Commit Graph

5110 Commits

Author SHA1 Message Date
Sanjay Patel 11522cfcad [DAGCombiner] add fold for vselect based on mask of signbit, part 3
(Cond0 s> -1) ? N1 : 0 --> ~(Cond0 s>> BW-1) & N1

https://alive2.llvm.org/ce/z/mGCBrd

This was suggested as a potential enhancement in D113212 (also 7e30404c3b ).
There's an improvement for AArch that could be generalized ( X > -1 --> X >= 0 ).
For x86, we have a counter-acting fold for most cases that turns the shift+not
back into a setcc, so that needs a work-around to get more cases to use "pandn":
D113603

Note that this pattern (and a previous one) are not currently canonical forms
in IR:
https://alive2.llvm.org/ce/z/e4o96b

Adding swapped variants is left as a TODO item here, but is planned as
a near-term follow-up patch.

Differential Revision: https://reviews.llvm.org/D113426
2021-11-11 10:27:37 -05:00
Florian Hahn c2ed9fd054
[AArch64] Use custom lowering for {U,S}INT_TO_FP with i8.
With fullfp16, it is cheaper to cast the {U,S}INT_TO_FP operand to i16
first, rather than promoting it to i32. The custom lowering for
{U,S}INT_TO_FP  already supports that, it just needs to be used.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D113601
2021-11-11 08:47:15 +00:00
David Green 703ded8dda [AArch64] Allow FP16 vector fixed point converts
This extends performFpToIntCombine to work on FP16 vectors as well as
the f32 and f64 vectors it already supported.

Differential Revision: https://reviews.llvm.org/D113297
2021-11-11 07:32:52 +00:00
Florian Hahn c6258a20ef
[AArch64] Add missing tests for i8 vector to half conversions. 2021-11-10 19:00:06 +00:00
David Green 509b397dd5 [AArch64] Combine vector fptoi.sat(fmul) to fixed point fcvtz
Similar to D113199 but dealing with the vector size, this extends the
fptosi+fmul to fixed point fold to handle fptosi.sat nodes that are
equally viable, so long as the saturation width matches the output
width.

Differential Revision: https://reviews.llvm.org/D113200
2021-11-10 16:12:48 +00:00
Amara Emerson af4dc633f8 [AArch64][GlobalISel] Fix atomic truncating stores from generating invalid copies.
If the source reg is a 64b vreg, then we need to emit a subreg copy to a 32b
gpr before we select sub-64b variants like STLRW.
2021-11-09 20:47:50 -08:00
Jessica Paquette 3eabcda814 [GlobalISel] Ensure that translateInvoke adds all successors for inlineasm
The existing code didn't add all necessary successors, which resulted in
disjoint basic blocks. These would end up not being legalized which, in the
best case, caused a fallback only in assert builds.

Here's an example:

https://godbolt.org/z/ndx15Enfj

We also end up getting weird codegen here as well.

Refactoring the code here allows us to correctly attach all successors. With
this patch, the above example gives correct codegen at -O0 with and without
asserts.

Also autogen the testcase to show that we add all the successors now.

Differential Revision: https://reviews.llvm.org/D113437
2021-11-09 16:20:34 -08:00
David Green 1f01b31755 [AArch64] Extend and regenerate fcvt_combine.ll. NFC
This adds half and fptoi.sat variants of the tests in fcvt_combine.ll,
and regenerates the resulting check lines.
2021-11-09 20:29:42 +00:00
Andrew Savonichev b702276ad0 [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant
This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.

Differential Revision: https://reviews.llvm.org/D99662
2021-11-09 15:30:19 +03:00
Sanjay Patel e2b1d3260a [AArch][x86] add tests for vselect; NFC
This is a potential follow-up suggested in D113212.
2021-11-08 15:21:19 -05:00
Quinn Pham ca21488eac [llvm] Inclusive language: replace master with main in file paths in LIT tests
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in file/directory paths in llvm LIT tests.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D113190
2021-11-08 12:39:50 -06:00
David Sherwood 8d38c24fb6 [SVE][CodeGen] Improve codegen for some FP insert_subvector cases
When inserting an unpacked FP subvector into a packed vector we
can simply cast the unpacked value into a packed value, since
both types are legal for SVE. We can then use this as the input
for the UZP instruction. This avoids us expanding the operation
by going through the stack.

Differential Revision: https://reviews.llvm.org/D113270
2021-11-08 13:45:55 +00:00
Simon Pilgrim 1f60302a37 [AArch64] Precommit i256 test from D111530 2021-11-08 10:47:57 +00:00
David Green a982940eb5 [AArch64] Combine fptoi.sat(fmul) to fixed point cvtf
We already have patterns for fptosi and fptoui plus fmul to fixed point
convert, this adds equivalent patterns for fptosi.sat and fptoui.sat,
which should apply equally well for the legal saturating variants.

Differential Revision: https://reviews.llvm.org/D113199
2021-11-08 10:07:34 +00:00
Andrew Wei bf3784b882 [AArch64] Canonicalize X*(Y+1) or X*(1-Y) to madd/msub
Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern

Reviewed By: dmgreen, sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D111862
2021-11-08 16:49:31 +08:00
David Green 17acd6d940 [AArch64] Rewrite and update fcvt-fixed.ll. NFC
This rewrites the fcvt-fixed.ll test case to be separate functions, not
one large function with volatile global stores. It also adds fp16 and
fptoi.sat testing at the same time.
2021-11-07 18:11:49 +00:00
Sanjay Patel 7e30404c3b [DAGCombiner] add fold for vselect based on mask of signbit, part 2
This is the 'or' sibling for the fold added with:
D113212

https://alive2.llvm.org/ce/z/tgnp7K

Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.

We do not have the scalar version of *this* fold,
however, so that is another follow-up.
2021-11-05 15:02:12 -04:00
Sanjay Patel 4d513f2527 [AArch] add tests for vselect; NFC
These are copy/pasted from the related test patterns in D113212.
2021-11-05 15:02:12 -04:00
Florian Hahn f64580f8d2
[AArch64][GISel] Optimize 8 and 16 bit variants of uaddo.
Try simplify G_UADDO with 8 or 16 bit operands to wide G_ADD and TBNZ if
result is only used in the no-overflow case. It is restricted to cases
where we know that the high-bits of the operands are 0. If there's an
overflow, then the the 9th or 17th bit must be set, which can be checked
using TBNZ.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D111888
2021-11-05 19:11:15 +01:00
Sanjay Patel 4fc1fc4005 [DAGCombiner] add fold for vselect based on mask of signbit
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y

We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.

Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.

Differential Revision: https://reviews.llvm.org/D113212
2021-11-05 10:06:16 -04:00
Sanjay Patel 1e7afa2a0d [AArch64] add tests for vector select; NFC 2021-11-05 10:06:16 -04:00
David Sherwood 657a1dcd0d [AArch64] Add target DAG combine for UUNPKHI/LO
When created a UUNPKLO/HI node with an undef input then the
output should also be undef. I've added a target DAG combine
function to ensure we avoid creating an unnecessary uunpklo/hi
instruction.

Differential Revision: https://reviews.llvm.org/D113266
2021-11-05 13:50:59 +00:00
Jingu Kang a7b1872593 [AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv
A pattern has selected wrong uaddlv MI. It should be as below.

uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8)

Differential Revision: https://reviews.llvm.org/D113263
2021-11-05 12:48:18 +00:00
Ben Shi 59c3b48d99 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 3de3ca3137.
2021-11-03 14:15:21 +08:00
Ben Shi 3de3ca3137 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-11-03 03:06:43 +00:00
Simon Pilgrim 325031786e [SelectionDAG] Optimize expansion for rotates/funnel shifts
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.

Also use the optimized funnel shift lowering for rotates.

Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept

(Branched from D108058 as getting this completed should help unlock some other WIP patches).

Original Patch: @efriedma (Eli Friedman)

Differential Revision: https://reviews.llvm.org/D112443
2021-11-02 11:38:25 +00:00
Cameron McInally 702fd3d323 [SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive.
For NEON, FMA matching is done in the MachineCombiner, and not the
DAGCombiner. That causes problems with VLS lowering, since the
vectors are fixed width at the DAGCombiner, but are scalable in
the MachineCombiner. This patch corrects it by matching FMAs for
VLS vectors in the DAGCombiner.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112557
2021-11-01 10:43:52 -07:00
Amara Emerson 5dd9e019dd [AArch64][GlobalISel] Fix an crash in RBS due to a new regclass being added.
rdar://84674985
2021-10-29 11:47:00 -07:00
Bradley Smith 86972f1114 [AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes
Add support for generating TargetFrameIndex in complex patterns for
indexed addressing modes in SVE. Additionally, add missing load/stores
to getMemOpInfo and getLoadStoreImmIdx.

Differential Revision: https://reviews.llvm.org/D112617
2021-10-29 14:44:16 +00:00
Max Kazantsev 8daf76935d [Test] Regenerate some of llc test checks using auto updater 2021-10-28 16:18:30 +07:00
Caroline Concatto 2186b011e9 [Driver][AArch64]Add driver support for neoverse-512tvb target
The support for  neoverse-512tvb mirrors the same option available in GCC[1].
There is no functional effect for this option yet.
This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough
plumbing is in place to allow the new option to be used in the future.

[1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

Differential Revision: https://reviews.llvm.org/D112406
2021-10-28 09:08:40 +01:00
Kerry McLaughlin f01fafdcd4 [SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads
PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which
results in a sign-extended load. If the type returned by getExtensionType()
for the load being promoted is something other than `NON_EXTLOAD`, we
should instead pass this to getMaskedLoad() as the extension type.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D112320
2021-10-27 14:15:41 +01:00
Caroline Concatto 1137b7207d [SelectionDAG] Widening the result of INSERT_SUBVECTOR.
Widens the result and first input vector because they have the same size.
The subvector to be inserted is widened in the operand widen function.

Differential Revision: https://reviews.llvm.org/D112187
2021-10-27 13:52:25 +01:00
Alexandros Lamprineas 8689f5e6e7 [AArch64] Add support for the 'R' architecture profile.
This change introduces subtarget features to predicate certain
instructions and system registers that are available only on
'A' profile targets. Those features are not present when
targeting a generic CPU, which is the default processor.

In other words the generic CPU now means the intersection of
'A' and 'R' profiles. To maintain backwards compatibility we
enable the features that correspond to -march=armv8-a when the
architecture is not explicitly specified on the command line.

References: https://developer.arm.com/documentation/ddi0600/latest

Differential Revision: https://reviews.llvm.org/D110065
2021-10-27 12:32:30 +01:00
Danila Malyutin 2d9ee590b6 [AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal
Before the code would crash with "unhandled opcode in
isAArch64FrameOffsetLegal" when there was a spill from extractelement.
Fixes pr52249

Differential Revision: https://reviews.llvm.org/D112311
2021-10-25 17:05:12 +03:00
Kerry McLaughlin 1f49b71fe5 [SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt
This patch enables the use of reciprocal estimates for SVE
when both the -Ofast and -mrecip flags are used.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111657
2021-10-25 11:30:44 +01:00
Jingu Kang a502436259 [AArch64] Remove redundant ORRWrs which is generated by zero-extend
%3:gpr32 = ORRWrs $wzr, %2, 0
%4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32

If AArch64's 32-bit form of instruction defines the source operand of ORRWrs,
we can remove the ORRWrs because the upper 32 bits of the source operand are
set to zero.

Differential Revision: https://reviews.llvm.org/D110841
2021-10-25 09:47:07 +01:00
Bradley Smith cfe22cd4ef [AArch64][SVE] Add new ld<n> intrinsics that return a struct of vscale types
This will allow us to reuse existing interleaved load logic in
lowerInterleavedLoad that exists for neon types, but for SVE fixed
types.

The goal eventually will be to replace the existing ld<n> intriniscs
with these, once a migration path has been sorted out.

Differential Revision: https://reviews.llvm.org/D112078
2021-10-22 14:13:17 +00:00
Simon Pilgrim d8e50c9dba [CodeGen] Add PR50197 AArch64/ARM/X86 test coverage
Pre-commit for D111530
2021-10-22 14:22:46 +01:00
Jessica Paquette 5dc339d982 [AArch64][GlobalISel] Fold 64-bit cmps with 64-bit adds
G_ICMP is selected to an arithmetic overflow op (ADDS/SUBS/etc) with a dead
destination + a CSINC instruction.

We have a fold which allows us to combine 32-bit adds with G_ICMP.

The problem with G_ICMP is that we model it as always having a 32-bit
destination even though it can be a 64-bit operation. So, we were missing some
opportunities for 64-bit folds.

This patch teaches the fold to recognize 64-bit G_ICMPs + refactors some of
the code surrounding CSINC accordingly.

(Later down the line, I think we should probably change the way we handle G_ICMP
in general.)

Differential Revision: https://reviews.llvm.org/D111088
2021-10-21 13:51:44 -07:00
Kerry McLaughlin 0d153df69e [SVE] Fix selection failure when splitting extended masked loads
When splitting a masked load, `GetDependentSplitDestVTs` is used to get the
MemVTs of the high and low parts. If the masked load is extended, this
may return VTs with different element types which are used to create the
high & low masked load instructions.
This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with
the same element type.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D111996
2021-10-21 13:04:38 +01:00
Jon Roelofs b046eb19b8 [AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111856
2021-10-20 12:11:52 -07:00
Sander de Smalen be6c8dc765 [SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors.
When inserting a scalable subvector into a scalable vector through
the stack, the index to store to needs to be scaled by vscale.
Before this patch, that didn't yet happen, so it would generate the
wrong offset, thus storing a subvector to the incorrect address
and overwriting the wrong lanes.

For some insert:
  nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2)

The offset was not scaled by vscale:
  orr     x8, x8, #0x4
  st1h    { z0.h }, p0, [sp]
  st1h    { z1.d }, p1, [x8]
  ld1h    { z0.h }, p0/z, [sp]

And is changed to:
  mov x8, sp
  st1h { z0.h }, p0, [sp]
  st1h { z1.d }, p1, [x8, #1, mul vl]
  ld1h { z0.h }, p0/z, [sp]

Differential Revision: https://reviews.llvm.org/D111633
2021-10-20 13:55:24 +01:00
Daniel Kiss f903c85055 [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these too.
In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication
in one step here we can't emit the . cfi_negate_ra_state because that would be point after
the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2021-10-20 11:03:52 +02:00
David Sherwood 5ea35791e6 [AArch64] Split out processor/tuning features
Following on from an earlier patch that introduced support for -mtune
for AArch64 backends, this patch splits out the tuning features
from the processor features. This gives us the ability to enable
architectural feature set A for a given processor with "-mcpu=A"
and define the set of tuning features B with "-mtune=B".

It's quite difficult to write a test that proves we select the
right features according to the tuning attribute because most
of these relate to scheduling. I have created a test here:

  CodeGen/AArch64/misched-fusion-addr-tune.ll

that demonstrates the different scheduling choices based upon
the tuning.

Differential Revision: https://reviews.llvm.org/D111551
2021-10-19 15:18:55 +01:00
Jon Roelofs 1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
2021-10-18 10:33:01 -07:00
Andrew Wei f5056c8c16 [AArch64] Improve shuffle vector by using wider types
Try to widen element type to get a new mask value for a better permutation
sequence, so that we can use NEON shuffle instructions, such as zip1/2,
UZP1/2, TRN1/2, REV, INS, etc.
For example:
  shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
is equivalent to:
  shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
Finally, we can get:
  mov     v0.d[0], v1.d[1]

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111619
2021-10-18 21:24:45 +08:00
Peter Waller c0782ba898 [AArch64][SVE][CodeGen] Add tests for RSHRN{T,B} instructions
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D111735
2021-10-18 11:00:01 +00:00
Florian Hahn e9ff7d250e
[AArch64][GISel] Add 8/16 bit uaddo lowering tests.
Precommit tests for D111888.
2021-10-18 09:48:43 +01:00
David Truby 2e0fb007d6 [llvm][AArch64][SVE] Fold literals into math instructions
SVE has predicated literal forms of some instructions for specific
literals, which currently are generated correctly when using ACLE
but not when those instructions are generated directly.

This adds the patterns to generate those instructions when
generating from standard LLVM IR instructions.

Differential Revision: https://reviews.llvm.org/D99074
2021-10-17 10:57:04 +00:00