Summary:
This patch adds initial support for the following intrinsics:
* llvm.aarch64.sve.st2
* llvm.aarch64.sve.st3
* llvm.aarch64.sve.st4
These store two, three, and four vectors' worth of data, respectively. Basic
codegen for the reg+immediate forms is implemented; reg+reg addressing modes
will be addressed in a later patch.
These intrinsics are intended for use in the Arm C Language Extension
(ACLE).
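As a rough illustration (hand-written, not a test from this patch), an st2 call in
LLVM IR looks like the following; the element type, predicate type, and pointer type
are assumptions chosen for the example:
```
call void @llvm.aarch64.sve.st2.nxv4i32(<vscale x 4 x i32> %v0,
                                        <vscale x 4 x i32> %v1,
                                        <vscale x 4 x i1> %pg,
                                        i32* %addr)
```
This stores the two data vectors interleaved to %addr under the predicate %pg.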
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75947
On Darwin these need to be selected into a function call for the TLS
address lookup. As a result, they can't be moved below a physreg write,
which happens in call sequences. In the long term, we should have some
mechanism in the localizer to prevent localizing into target-specific
atomic instruction sequences.
rdar://60056248
Differential Revision: https://reviews.llvm.org/D76652
When we find something like this:
```
%a:_(s32) = G_SOMETHING ...
...
%select:_(s32) = G_SELECT %cond(s1), %a, %a
```
We can remove the select and just replace it entirely with `%a` because it's
always going to result in `%a`.
The same applies if we have
```
%select:_(s32) = G_SELECT %cond(s1), %a, %b
```
where we can deduce that `%a == %b`.
This implements the following cases:
- `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a`
- `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b`
are defined by identical instructions
This gives a few minor code size improvements on CTMark at -O3 for AArch64.
Differential Revision: https://reviews.llvm.org/D76523
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
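As a sketch of the translation (not taken from the patch's tests, and using an i32
width chosen for the example), a call such as
```
%r = call i32 @llvm.sadd.sat.i32(i32 %a, i32 %b)
```
is now translated by IRTranslator into the corresponding generic instruction, along
the lines of `%r:_(s32) = G_SADDSAT %a, %b`.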
Reviewers: aemerson, dsanders, paquette, arsenm
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76600
Summary:
A vector type can be legal while the corresponding scalar type is not
(e.g. v16i8 is legal but i8 is not on AArch64). When the fold
truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
runs after the type legalizer, check that the scalar type it creates is legal.
This fixes https://github.com/android/ndk/issues/1207.
Reviewers: RKSimon, srhines
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76312
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
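For example (a gMIR sketch, not a test from this patch), a shuffle whose operands are
both undef:
```
%undef:_(<4 x s32>) = G_IMPLICIT_DEF
%shuf:_(<4 x s32>) = G_SHUFFLE_VECTOR %undef(<4 x s32>), %undef, shufflemask(0, 1, 2, 3)
```
can be replaced outright with an implicit_def of `%shuf`, regardless of the mask.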
Differential Revision: https://reviews.llvm.org/D76382
Support prefixing destructive operations with the MOVPRFX instruction to build constructive operations.
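As a rough illustration of the idea (hand-written, not from the patch's tests), a
destructive SVE ADD behaves constructively when prefixed:
```
movprfx z0, z1                    // copy z1 into the destination first
add     z0.s, p0/m, z0.s, z2.s    // destructive add; overall: z0 = z1 + z2 for active lanes
```
The architecture defines MOVPRFX so that hardware may fuse the pair into a single
constructive operation.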
Differential Revision: https://reviews.llvm.org/D75064
Summary:
In order to keep the names consistent with other SVE gather loads, the
intrinsics for gather prefetch are renamed as follows:
* @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather
Reviewed by: fpetrogalli
Differential Revision: https://reviews.llvm.org/D76421
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
Summary: This change allows machine outlining to be applied N times, where N is specified by a compiler option; by default N is 1. The motivation is that repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" from the 2019 LLVM Developers' Meeting.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
Summary:
This fixes a discrepancy between the non-temporal load/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate all of them.
Reviewers: andwar, kmclaughlin, rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76237
ISD::ROTL/ROTR rotation amounts are guaranteed to be interpreted modulo the bitwidth, so for power-of-2 bitwidths we only need the lowest bits of the amount (e.g. only the low 5 bits matter for an i32 rotate).
Differential Revision: https://reviews.llvm.org/D76201
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.
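A minimal sketch of what the annotation looks like in IR (the metadata payload shown
is a placeholder assumption; the real attachment is emitted by the pass):
```
%buf = alloca [16 x i8], align 4, !stack-safe !0

!0 = !{}
```
Allocas carrying this annotation are not instrumented by AArch64StackTagging.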
Reviewers: pcc, vitalybuka, ostannard
Reviewed By: vitalybuka
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73513
Follow-up for D74433
The function returns names that are almost the standard BFD names, except that "ELF"
is in uppercase instead of lowercase.
This patch changes "ELF" to "elf" and changes ARM/AArch64 to use their BFD names.
MIPS and PPC64 have endianness differences as well, but this patch does not intend to address them.
Advantages:
* llvm-objdump: the "file format " line matches GNU objdump on ARM/AArch64 objects
* "file format " line can be extracted and fed into llvm-objcopy -O literally.
(https://github.com/ClangBuiltLinux/linux/issues/779 has such a use case)
Affected tools: llvm-readobj, llvm-objdump, llvm-dwarfdump, MCJIT (internal implementation detail, not exposed)
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76046
Summary:
Truncating the result of a merge usually means we could have done without the merge in the first place and just used the merge inputs directly. This can be done in three cases:
1. If the truncation result is smaller than the merge source, we can use the source in the trunc directly
2. If the sizes are the same, we can replace the register or use a copy
3. If the truncation size is a multiple of the merge source size, we can build a smaller merge
This gets rid of most of the larger, hard-to-legalize merges.
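For example, case 2 as a gMIR sketch (not taken from the patch's tests):
```
%merge:_(s64) = G_MERGE_VALUES %lo(s32), %hi(s32)
%narrow:_(s32) = G_TRUNC %merge(s64)
```
The truncation result is the same size as each merge source, so `%narrow` can simply be
replaced by `%lo` (or a copy of it), and the merge often becomes dead.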
Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, arsenm, Petar.Avramovic
Reviewed By: arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75915
Summary:
This is part of a series of efforts to correct the alignment of memory operations.
(Other related bugs: https://bugs.llvm.org/show_bug.cgi?id=44388 , https://bugs.llvm.org/show_bug.cgi?id=44543 )
This fixes https://bugs.llvm.org/show_bug.cgi?id=43880 by giving loads a default alignment of 1.
The test CodeGen/AArch64/bcmp-inline-small.ll had to be changed; it was introduced by https://reviews.llvm.org/D64805 . I talked with @evandro and confirmed that the test is okay to change.
Two other tests from PowerPC needed changes as well, but the fixes were straightforward.
Reviewers: courbet
Reviewed By: courbet
Subscribers: nlopes, gchatelet, wuzish, nemanjai, kristof.beyls, hiraditya, steven.zhang, danielkiss, llvm-commits, evandro
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76113
Summary:
When merging add and sub instructions into memory-operand instructions (to
form pre/post-indexed accesses), aarch64-ldst-opt would prematurely pop the
stack pointer, placing the pop before memory instructions that still access
the stack using indirect loads.
e.g.
```
int foo(int offset) {
  int local[4] = {0};
  return local[offset];
}
```
would generate:
```
sub sp, sp, #16 ; Push the stack
mov x8, sp ; Save stack in register
stp xzr, xzr, [sp], #16 ; Zero initialize stack, and post-increment, making it invalid
------ If an exception goes here, the stack value might be corrupted
ldr w0, [x8, w0, sxtw #2] ; Access correct position, but it is not guarded by SP
```
Reviewers: fhahn, foad, thegameg, eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, danielkiss, llvm-commits, simon_tatham
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75755
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
* DUP <Zd>.<T>, #<imm>
* DUP <Zd>.<T>, <R><n|SP>
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D75900
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector types. This requires ConstantVector::getSplat() to take an
'ElementCount' instead of an 'unsigned' element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
* aarch64_sve_ldnt1_gather_index
* aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.
As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.
The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombine.
Once encountered, they are replaced with:
* GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
* SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75601
Summary:
callbr's indirect branches aren't expected to be taken, so reduce their
probabilities to 0 while increasing the probability of the default destination
to 1. This allows some code improvements through block placement.
Reviewers: nickdesaulniers
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72656
Summary:
The IR intrinsics are mapped to the following SVE2 instructions:
* WHILERW <Pd>.<T>, <Xn>, <Xm>
* WHILEWR <Pd>.<T>, <Xn>, <Xm>
The intrinsics introduced in this patch are the IR counterpart of the
SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type
variants).
Patch by Maciej Gąbka <maciej.gabka@arm.com>.
Reviewers: kmclaughlin, rengolin
Reviewed By: kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75862
This reverts commit 5583c2f2fb.
The lldb bot failure was caused by a test that was fragile and sensitive to
irrelevant changes in instruction ordering. Re-committing this, as the test
should now be skipped for AArch64.
Differential Revision: https://reviews.llvm.org/D75555
The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier to write.
`.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive.
Without `-LABEL`, the CHECK line can match the `Disassembly of section`
line and cause the next `CHECK-NEXT:` to fail.
```
Disassembly of section .foo:
0000000000001634 .foo:
```
Bdragon: <> has metalinguistic connotation. it just "feels right"
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D75713
Previously, for any copy from a register bigger than the destination, we:
1. Copied to a same-sized register in the destination register bank.
2. Did a subregister copy of that to the destination.
This fails for copies from 128-bit FPRs to GPRs because the GPR register bank
can't accommodate 128-bit values.
Instead of special-casing such copies to perform the truncation beforehand in
the source register bank, generalize this:
a) Perform a subregister copy straight from source register whenever possible.
This results in shorter MIR and fixes the above problem.
b) Perform a full copy to target bank and then do a subregister copy only if
source bank can't support target's size. E.g. GPR to 8-bit FPR copy.
Patch by Raul Tambre (tambre)!
Differential Revision: https://reviews.llvm.org/D75421
As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops.
The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is
poison, so we can produce an undef result.
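For example (an illustrative IR sketch, not a test from this patch):
```
%r = fadd nnan float %x, 0x7FF8000000000000
```
The constant operand is a NaN and the fadd is marked nnan, so the result is poison and
the node can be folded to undef.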
But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation
problem in SelectionDAGBuilder.
I've added DAGCombiner calls to enable these for the other cases because we've shown in other
patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications
like this if they are done only at node creation time.
Several potential follow-ups to expand on this patch are possible.
Differential Revision: https://reviews.llvm.org/D75576
This changes the localizer to attempt intra-block localization of instructions
that have local uses. This is useful because sometimes the entry block itself
has many uses of constant-like instructions, which would benefit from having
their live ranges shortened. Previously, if an instruction had no non-local uses,
we wouldn't add it to the list of instructions considered for further intra-block
localization.
This gives a 0.7% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D75555
CFI instructions can only safely be outlined when the outlined call is a tail
call, or when the outlined frame is fixed up.
For the sake of correctness, disable outlining of CFI instructions.
Add machine-outliner-cfi.mir to test this.
Summary:
This patch adds the following LLVM IR intrinsics for SVE:
1. non-temporal gather loads
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.ldnt1.gather.scalar.offset
2. non-temporal scatter stores
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.stnt1.scatter.uxtw
* @llvm.aarch64.sve.stnt1.scatter.scalar.offset
These intrinsics are mapped to the corresponding SVE instructions
(example for half-words, zero-extending):
* ldnt1h { z0.s }, p0/z, [z0.s, x0]
* stnt1h { z0.s }, p0, [z0.s, x0]
Note that for non-temporal gathers/scatters, the SVE spec defines only
one instruction type: "vector + scalar". For this reason, we swap the
arguments when processing intrinsics that implement the "scalar +
vector" addressing mode:
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.stnt1.scatter.uxtw
In other words, all intrinsics for gather-loads and scatter-stores
implemented in this patch are mapped to the same load and store
instruction, respectively.
The sve2_mem_gldnt_vs multiclass (and its counterpart for scatter
stores) from SVEInstrFormats.td was split into:
* sve2_mem_gldnt_vec_vs_32_ptrs (32-bit wide base addresses)
* sve2_mem_gldnt_vec_vs_64_ptrs (64-bit wide base addresses)
This is consistent with what we did for
@llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in
the spec and the implementation.
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D74858
We get the simple cases of this via demanded elements and other folds,
but that doesn't work if the values have >1 use, so add a dedicated
match for the pattern.
We already have this transform in IR, but it doesn't help the
motivating x86 tests (based on PR42024) because the shuffles don't
exist until after legalization and other combines have happened.
The AArch64 test shows a minimal IR example of the problem.
Differential Revision: https://reviews.llvm.org/D75348
According to Joseph Myers, a libm maintainer:
> They were only ever an ABI (selected by use of -ffinite-math-only or
> options implying it, which resulted in the headers using "asm" to redirect
> calls to some libm functions), not an API. The change means that ABI has
> turned into compat symbols (only available for existing binaries, not for
> anything newly linked, not included in static libm at all, not included in
> shared libm for future glibc ports such as RV32), so, yes, in any case
> where tools generate direct calls to those functions (rather than just
> following the "asm" annotations on function declarations in the headers),
> they need to stop doing so.
As a consequence, we should no longer assume these symbols are available on the
target system.
Still keep the TargetLibraryInfo for constant folding.
Differential Revision: https://reviews.llvm.org/D74712