Summary:
Adds two features to the generated rule disable option:
- '*' - Disable all rules
- '!<foo>' - Re-enable rule(s):
  - '!foo' - Enable the rule named 'foo'
  - '!5' - Enable rule five
  - '!4-9' - Enable rules four to nine
  - '!foo-bar' - Enable rules from 'foo' to (and including) 'bar'
(The '!' is recognized by the generated disable option but is not part of the
underlying API; it determines whether setRuleDisabled() or setRuleEnabled()
is called.)
This is intended to support unit testing of combine rules so
that you can do:
GeneratedCfg.setRuleDisabled("*")
GeneratedCfg.setRuleEnabled("foo")
to ensure only a specific rule is in effect. The rule is still
required to be included in a combiner, though.
Also added --...-only-enable-rule=X,Y which is effectively an
alias for --...-disable-rule=*,!X,!Y and as such interacts
properly with disable-rule.
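A sketch of the intended unit-test pattern (the rule-config class name below
is illustrative; each generated combiner emits its own config class):
```
// Illustrative class name; setRuleDisabled()/setRuleEnabled() are the
// generated entry points described above.
MyGenCombinerHelperRuleConfig GeneratedCfg;
GeneratedCfg.setRuleDisabled("*");   // '*': start with every rule off
GeneratedCfg.setRuleEnabled("foo");  // '!foo': turn rule 'foo' back on
GeneratedCfg.setRuleEnabled("bar");
// Equivalent command line, per the alias above:
//   --<combiner>-only-enable-rule=foo,bar
//   --<combiner>-disable-rule=*,!foo,!bar
```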
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81889
When selecting 32-bit -> 64-bit G_ZEXTs, we don't always have to emit the extend.
If the instruction feeding into the G_ZEXT implicitly zero extends the high
half of the register, we can just emit a SUBREG_TO_REG instead.
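A minimal sketch of the emission side (names are illustrative, not the exact
in-tree helper):
```
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

// If Src32 is already known to zero bits 32-63 of the full register,
// a SUBREG_TO_REG is enough to produce the 64-bit value.
static MachineInstr *emitZExtViaSubregToReg(Register Dst64, Register Src32,
                                            MachineIRBuilder &MIB) {
  return MIB.buildInstr(TargetOpcode::SUBREG_TO_REG, {Dst64}, {})
      .addImm(0)                 // high bits are defined to be zero
      .addUse(Src32)             // the implicitly-zero-extending 32-bit def
      .addImm(AArch64::sub_32);  // place Src32 in the low 32-bit subreg
}
```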
Differential Revision: https://reviews.llvm.org/D81897
This patch upstreams support for BFloat matrix multiplication intrinsics
and code generation from __bf16 to AArch64. This includes IR intrinsics;
unit tests are provided as needed. AArch32 intrinsics and CodeGen will come
in a subsequent patch.
This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type and its properties are specified in the Arm Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
- Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.
e.g.
https://godbolt.org/z/Bjc8cw
To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.
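Roughly, the matching side becomes the following (a sketch; the helper's
return type has varied across LLVM versions):
```
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

// Returns true and sets Cst if Reg is a constant, looking through
// copies/extends/truncs on the way to the G_CONSTANT.
static bool matchConstantLookThrough(Register Reg,
                                     const MachineRegisterInfo &MRI,
                                     int64_t &Cst) {
  auto ValAndVReg = getConstantVRegValWithLookThrough(Reg, MRI);
  if (!ValAndVReg)
    return false; // no G_CONSTANT behind Reg, even through a zext etc.
  Cst = ValAndVReg->Value;
  return true;
}
```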
Differential Revision: https://reviews.llvm.org/D81875
Add selection support for ext via a new opcode, G_EXT, and a post-legalizer
combine which matches it.
Add an `applyEXT` function, because the AArch64ext patterns require a register
for the immediate. So, we have to create a G_CONSTANT to match these without
writing new patterns or modifying the existing ones.
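The shape of `applyEXT` is roughly as follows (a sketch based on the
description above; the `ShuffleVectorPseudo` field names are assumptions):
```
// Materialize the lane offset as a G_CONSTANT so the imported AArch64ext
// patterns, which want a register operand, can match.
static void applyEXT(MachineInstr &MI, ShuffleVectorPseudo &MatchInfo) {
  MachineIRBuilder MIB(MI);
  auto Cst =
      MIB.buildConstant(LLT::scalar(32), MatchInfo.SrcOps[2].getImm());
  MIB.buildInstr(MatchInfo.Opc, {MatchInfo.Dst},
                 {MatchInfo.SrcOps[0], MatchInfo.SrcOps[1], Cst});
  MI.eraseFromParent();
}
```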
Tests are the same as arm64-ext.ll.
Also prevent ext from firing on the zip test. It has higher priority, so we
don't want it potentially getting in the way of mask tests.
Also fix up the shuffle-splat test, because ext is now selected there. The
test was incorrectly regbank selected before, which could cause a verifier
failure when you emit copies.
Differential Revision: https://reviews.llvm.org/D81436
This implements the following combines:
((0-A) + B) -> B-A
(A + (0-B)) -> A-B
Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.
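A hand-rolled sketch of the match for the first pattern (illustrative, not
the in-tree CombinerHelper code):
```
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

// Match G_ADD (G_SUB 0, A), B and report A and B so the caller can
// rewrite the add as G_SUB B, A.
static bool matchAddOfNeg(MachineInstr &Add, const MachineRegisterInfo &MRI,
                          Register &A, Register &B) {
  assert(Add.getOpcode() == TargetOpcode::G_ADD && "expected G_ADD");
  Register LHS = Add.getOperand(1).getReg();
  B = Add.getOperand(2).getReg();
  MachineInstr *Def = MRI.getVRegDef(LHS);
  if (!Def || Def->getOpcode() != TargetOpcode::G_SUB)
    return false;
  // The G_SUB must be a negation: its first source operand must be zero.
  auto Zero =
      getConstantVRegValWithLookThrough(Def->getOperand(1).getReg(), MRI);
  if (!Zero || Zero->Value != 0)
    return false;
  A = Def->getOperand(2).getReg();
  return true;
}
```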
I noticed that add combines are some of the most commonly hit across CTMark
(via print statements when they fire), so I'm porting over some of the obvious
ones.
This gives some minor code size improvements on CTMark at -O3 on AArch64.
Differential Revision: https://reviews.llvm.org/D77453
Summary:
SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with BTYPE 11
(see [1]).
This bit will be set to zero, so PACI[AB]SP are equivalent to the BTI C
instruction only.
[1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1
Reviewers: chill, tamas.petz, pbarrio, ostannard
Reviewed By: tamas.petz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81746
Summary:
Teach MachineVerifier to check that branches have MBB operands if they are
not declared indirect.
Add `isBarrier` and `isIndirectBranch` to `G_BRINDIRECT` and `G_BRJT`.
Without these, `MachineInstr.isConditionalBranch()` was giving a
false positive for those instructions.
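For reference, `MachineInstr::isConditionalBranch()` is defined in terms of
exactly these properties (paraphrased from MachineInstr.h):
```
// A branch counts as "conditional" iff it is a branch but neither a
// barrier nor an indirect branch, so missing isBarrier/isIndirectBranch
// flags made G_BRINDIRECT/G_BRJT look conditional.
bool isConditionalBranch(QueryType Type = AnyInBundle) const {
  return isBranch(Type) && !isBarrier(Type) && !isIndirectBranch(Type);
}
```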
Reviewers: aemerson, qcolombet, dsanders, arsenm
Reviewed By: dsanders
Subscribers: hiraditya, wdng, simoncook, s.egerton, arsenm, rovka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81587
We select all of these via patterns now, so there's no reason to disallow this.
Update select-dup.mir to show that we correctly select the smaller types.
Differential Revision: https://reviews.llvm.org/D81322
This was preventing the unused instructions in select-rev.mir and
select-trn.mir from being eliminated.
Update the tests accordingly.
Differential Revision: https://reviews.llvm.org/D81492
To make sure that no barrier gets placed on the architectural execution
path, each

  BLR x<N>

instruction gets transformed to a

  BL __llvm_slsblr_thunk_x<N>

instruction, where __llvm_slsblr_thunk_x<N> is a thunk that contains:

  __llvm_slsblr_thunk_x<N>:
      BR x<N>
      <speculation barrier>

Therefore, the BLR instruction gets split into two: one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.
The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.
As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.
As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.
Differential Revision: https://reviews.llvm.org/D81402
Summary:
Fix a crash when using -debug, caused by the GlobalISel observer trying to
print an incomplete DBG_VALUE instruction. This happened because the
MachineIRBuilder used buildInstr, which immediately inserts the instruction
(triggering the print), instead of using BuildMI to first build up the
instruction and then insertInstr once it is complete.
Add a RUN line with the -debug flag to the existing debug-insts.ll test to
make sure no crash occurs.
Also fix a missing %s in the second RUN line of the same test.
Reviewers: t.p.northover, aditya_nandakumar, aemerson, dsanders, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76934
Until we have a real need for computing known bits for scalable
vectors, I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.
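The bail-out itself is tiny; conceptually (a sketch, not necessarily the
exact in-tree lines):
```
// Early in SelectionDAG::computeKnownBits: there is no lane-by-lane
// model for scalable vectors yet, so claim that nothing is known.
if (Op.getValueType().isScalableVector()) {
  Known.resetAll();
  return;
}
```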
Differential Revision: https://reviews.llvm.org/D80437
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potentially mis-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.
Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.
On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing and potentially removing them.
Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D81400
Instead of loading from e.g. `<vscale x 16 x i8>*`, load from element
pointer `i8*`. This is more in line with the other load/store
intrinsics for SVE.
Reviewers: fpetrogalli, c-rhodes, rengolin, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81458
This ensures that we match SelectionDAG behaviour by waiting until the expand
pseudos pass to generate ADRP + ADD pairs. Doing this at selection time for the
G_ADD_LOW is fine because by the time we get to selecting the G_ADD_LOW,
previous attempts to fold it into loads/stores must have failed.
Differential Revision: https://reviews.llvm.org/D81512
Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps in understanding failures.
Remove `-dump-input-on-failure` and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.
Differential Revision: https://reviews.llvm.org/D81422
If a resource can be held for multiple cycles in the schedule model,
then an instruction can be placed into the available queue and another
instruction can then be scheduled, but the first will not be taken back
out if the two instructions create a hazard. To fix this, make sure that we update the
available queue even on the first MOp of a cycle, pushing available
instructions back into the pending queue if they now conflict.
This happens with some downstream schedules we have around MVE
instruction scheduling where we use ResourceCycles=[2] to show the
instruction executing over two beats. Apparently the test changes here
are OK too.
Differential Revision: https://reviews.llvm.org/D76909
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection.
Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
existing arm64-trn test.
Note that both of these tests contain things we currently don't legalize.
I figured it would be easier to test these now rather than later, since once
we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
the tests.
Differential Revision: https://reviews.llvm.org/D81182
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))
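Spelled out, the rewrite is plain reassociation of the underlying sum; it is
exact over the reals, and for floats the 'contract' flag discussed below
licenses the change in intermediate rounding:
```
\[
(A \cdot B + C \cdot D) + N_1 \;=\; A \cdot B + (C \cdot D + N_1)
\]
```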
The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.
This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.
'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.
Differential Revision: https://reviews.llvm.org/D80801
Summary:
This patch adds initial support for the following intrinsics:
* llvm.aarch64.sve.ld2
* llvm.aarch64.sve.ld3
* llvm.aarch64.sve.ld4
For loading two, three and four vectors worth of data. Basic codegen is
implemented with reg+reg and reg+imm addressing modes being addressed
in a later patch.
The types returned by these intrinsics have a number of elements that is a
multiple of the elements in a 128-bit vector for a given type and N, where N is
the number of vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements
the types are:
LD2 : <vscale x 8 x i32>
LD3 : <vscale x 12 x i32>
LD4 : <vscale x 16 x i32>
This is implemented with target-specific intrinsics for each variant that take
the same operands as the IR intrinsic but return N values, where the type of
each value is a full vector, i.e. <vscale x 4 x i32> in the above example.
These values are then concatenated using the standard concat_vectors
operation to maintain type legality with the IR.
These intrinsics are intended for use in the Arm C Language
Extension (ACLE).
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75751
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.
In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.
I added a test to
CodeGen/AArch64/build-one-lane.ll
that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.
Differential Revision: https://reviews.llvm.org/D80370
Currently aarch64-ldst-opt will incorrectly rename registers with
multiple disjunct subregisters (e.g. the result of LD3). This patch updates
canRenameUpToDef to bail out if it encounters such a register class
that contains the register to rename.
Fixes PR46105.
Reviewers: efriedma, dmgreen, paquette, t.p.northover
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D81108
This patch adds a test to check that we do not use an undef renamable
register for renaming the other operand in a LDP instruction, as
suggested in D81108.
When the input to a wide compare instruction is a DUP or SPLAT_VECTOR
node, we should deal with cases where the DUP/SPLAT_VECTOR input
operand is not an immediate value. I've fixed the code to return
SDValue() in such cases and added a couple of tests - one each to
represent the signed and unsigned cases.
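The guard amounts to something like this (an illustrative sketch; the
function name is hypothetical):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// If the DUP/SPLAT_VECTOR input operand is not a constant, the wide
// compare cannot use an immediate form; the combine returns SDValue()
// so generic lowering runs instead.
static bool getWideCompareImm(SDValue SplatInput, APInt &ImmVal) {
  auto *C = dyn_cast<ConstantSDNode>(SplatInput.getNode());
  if (!C)
    return false; // not an immediate: caller returns SDValue()
  ImmVal = C->getAPIntValue();
  return true;
}
```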
Differential Revision: https://reviews.llvm.org/D81167
Summary:
This patch adds the following intrinsics for creating two-tuple,
three-tuple and four-tuple scalable vectors:
* llvm.aarch64.sve.tuple.create2
* llvm.aarch64.sve.tuple.create3
* llvm.aarch64.sve.tuple.create4
As well as:
* llvm.aarch64.sve.tuple.get
* llvm.aarch64.sve.tuple.set
For extracting and inserting scalable vectors from vector tuples. These
intrinsics are intended to be used by the ACLE functions svcreate<n>,
svget and svset.
This patch also includes calling convention support for passing and
returning tuples of scalable vectors to/from functions.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75674
Scalable vectors cannot use 'BUILD_VECTOR', so it is necessary to
properly split and widen scalable vectors when passing them
to CopyToReg/CopyFromReg.
This functionality is added to TargetLoweringBase::getVectorTypeBreakdown().
This patch only adds support for 'splitting' scalable vectors that
are a multiple of some legal type, e.g.
<vscale x 6 x i64> -> 3 x <vscale x 2 x i64>
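A usage sketch of the query for the example above (assumes a target where
<vscale x 2 x i64> is legal, e.g. AArch64 with SVE):
```
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Ask the target how <vscale x 6 x i64> is split for register passing.
static unsigned breakdownExample(const TargetLowering &TLI,
                                 LLVMContext &Ctx) {
  EVT VT = EVT::getVectorVT(Ctx, MVT::i64, 6, /*IsScalable=*/true);
  EVT IntermediateVT;
  MVT RegisterVT;
  unsigned NumIntermediates;
  unsigned NumRegs = TLI.getVectorTypeBreakdown(Ctx, VT, IntermediateVT,
                                                NumIntermediates, RegisterVT);
  // With this patch: NumIntermediates == 3 and
  // IntermediateVT == <vscale x 2 x i64>.
  return NumRegs;
}
```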
Reviewers: efriedma, c-rhodes
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80139
Since all of the other G_SHUFFLE_VECTOR transforms are going there, let's do
this with dup as well. This is nice because it lets us split up the original
code into matching, register bank selection, and instruction selection.
- Create G_DUP, make it equivalent to AArch64dup
- Add a post-legalizer combine which is 90% a copy-and-paste from
tryOptVectorDup, except with shuffle matching closer to what SelectionDAG
does in `ShuffleVectorSDNode::isSplatMask`.
- Teach RegBankSelect about G_DUP. This is necessary because correct dup
selection relies on the register bank to choose between FP and GPR forms.
- Kill `tryOptVectorDup`, since it's now entirely handled by G_DUP.
- Add testcases for the combine, RegBankSelect, and selection. The selection
test gives the same selection results as the old test.
Differential Revision: https://reviews.llvm.org/D81221
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.
EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.
For example:
```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:
```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```
Reviewers: sdesmalen, efriedma, david-arm
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79587
This does three things:
1) Adds G_REV16, G_REV32, and G_REV64. These are equivalent to AArch64rev16,
AArch64rev32, and AArch64rev64 respectively.
2) Adds support for producing G_REV64 in the postlegalizer combiner.
We don't legalize any of the shuffles which could give us a G_REV32 or
G_REV16 yet. Since the function for detecting the rev mask is lifted from
AArch64ISelLowering, it should work for G_REV32 and G_REV16 when we get
there (a paraphrased sketch of that mask check follows this list).
3) Adds a selection test for a good portion of the patterns imported for the rev
family. The only ones which are not tested are the ones with bitconvert.
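The mask check lifted from AArch64ISelLowering looks roughly like this
(paraphrased; parameter spelling may differ from the in-tree helper):
```
#include "llvm/ADT/ArrayRef.h"
using namespace llvm;

// Returns true if M describes a REV<BlockSize> swap of EltSize-bit
// elements within BlockSize-bit blocks.
static bool isREVMask(ArrayRef<int> M, unsigned EltSize, unsigned NumElts,
                      unsigned BlockSize) {
  assert((BlockSize == 16 || BlockSize == 32 || BlockSize == 64) &&
         "Only possible block sizes for REV are: 16, 32, 64");
  if (EltSize == 64) // EltSize must be smaller than the block size.
    return false;
  unsigned BlockElts = M[0] + 1;
  if (M[0] < 0) // If the first index is UNDEF, be optimistic.
    BlockElts = BlockSize / EltSize;
  if (BlockSize <= EltSize || BlockSize != BlockElts * EltSize)
    return false;
  for (unsigned i = 0; i < NumElts; ++i) {
    if (M[i] < 0)
      continue; // ignore UNDEF indices
    if ((unsigned)M[i] !=
        (i - i % BlockElts) + (BlockElts - 1 - i % BlockElts))
      return false;
  }
  return true;
}
```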
This also does a little cleanup, and adds a struct for shuffle vector pseudo
matchdata. This lets us still use `applyShuffleVectorPseudo` rather than adding
a new function.
It should also make it a bit easier to port some of the other masks from
AArch64ISelLowering. (e.g. `isZIP_v_undef_Mask` and friends)
Differential Revision: https://reviews.llvm.org/D81112
Porting the mask stuff for uzp1 and uzp2 from AArch64ISelLowering.
Add two custom opcodes: G_UZP1 and G_UZP2.
Produce them in the post-legalizer combiner when the mask checks out.
Tests:
- postlegalizer-combiner-uzp.mir verifies that we create G_UZP1 and G_UZP2.
The testcases that check that we create them come from neon-perm.ll.
- select-uzp.mir verifies that we can select G_UZP1 and G_UZP2.
Differential Revision: https://reviews.llvm.org/D81049