Commit Graph

4505 Commits

Author SHA1 Message Date
Thomas Preud'homme 3ba246719b [test, AArch64] Fix use of var defined in CHECK-NOT
LLVM test CodeGen/AArch64/speculation-hardening.ll tries to check for
the absence of a sequence of instructions with several CHECK-NOT with
one of those directives using a variable defined in another. However
CHECK-NOT are checked independently so that is using a variable defined
in a pattern that should not occur in the input.

This commit removes the dependency between those CHECK-NOT by replacing
single occurence of the undefined variable by a regex match, and
multiple occurences by a definition followed by a use.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D99866
2021-04-06 21:15:15 +01:00
Thomas Preud'homme 638d70be6b [test, AArch64] Fix use of var defined in CHECK-NOT
LLVM test CodeGen/AArch64/aarch64-tbz.ll tries to check for the absence
of a sequence of instructions with several CHECK-NOT with one of those
directives using a variable defined in another. However CHECK-NOT are
checked independently so that is using a variable defined in a pattern
that should not occur in the input.

This commit removes the definition and uses of variable to check each
line independently, making the check stronger than the current one. It
also removes unnecessary regex match for labels.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99602
2021-04-06 10:45:08 +01:00
Sjoerd Meijer d5f1131c81 [AArch64] Default to zero-cycle-zeroing FP registers
It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this
is most efficient across all cores; it is recognised as a zeroing idiom. For
newer cores, fmov instructions can also be eliminated early and there is no
difference with movi, but some implementations lack this so is not true for
other/older cores. Thus this standardises on using movi as this should always
gives the same or better performance than the fmov with wzr.

Differential Revision: https://reviews.llvm.org/D99586
2021-04-06 09:47:50 +01:00
Sjoerd Meijer ef05b08c61 [AArch64] Use 64-bit movi for zeroing halfs/floats
This was using the .2d variant which zeros 128 bits, but using the .2s variant
that zeros 64 bits is faster on some cores.

This is a prep step for D99586 to always using movi for zeroing floats.

Differential Revision: https://reviews.llvm.org/D99710
2021-04-06 08:42:13 +01:00
Nikita Popov 665065821e [FastISel] Remove kill tracking
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.

As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.

Differential Revision: https://reviews.llvm.org/D98294
2021-04-03 15:50:13 +02:00
Brendon Cahoon 09a88278cb [GlobalISel] Allow different types for G_SBFX and G_UBFX operands
Change the definition of G_SBFX and G_UBFX so that the lsb and width
can have different types than the src and dst operands.

Differential Revision: https://reviews.llvm.org/D99739
2021-04-02 11:11:06 -04:00
Jun Ma 2dfa2c0ea0 [NFC][SVE] update sve-intrinsics-int-arith.ll under update_llc_test_checks.py 2021-04-02 20:17:11 +08:00
Jun Ma 274ac9d40e [AArch64][SVE] Lowering sve.dot to DOT node
Differential Revision: https://reviews.llvm.org/D99699
2021-04-02 20:05:17 +08:00
Mircea Trofin ce61def529 [regalloc] Ensure Query::collectInterferringVregs is called before interval iteration
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.

The rest of the patch offers support that is the case: instead of  clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).

Without the change in RegAllocGreedy.cpp, the compiler ices.

This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.

I will follow up with a subsequent patch to improve the usability and maintainability of Query.

Differential Revision: https://reviews.llvm.org/D98232
2021-04-01 08:33:28 -07:00
Bradley Smith 2f45e632c0 [AArch64][SVE] Improve codegen for select nodes with fixed types
Additionally, move the existing fixed vselect tests to *-vselect.ll.

Differential Revision: https://reviews.llvm.org/D99418
2021-04-01 15:54:37 +01:00
Bradley Smith 0934fa4f5d [AArch64][SVE] SVE functions should use the SVE calling convention for fast calls
When an SVE function calls another SVE function using the C calling
convention we use the more efficient SVE VectorCall PCS.  However,
for the Fast calling convention we're incorrectly falling back to
the generic AArch64 PCS.

This patch adds the same "can use SVE vector calling convention"
detection used by CallingConv::C to CallingConv::Fast.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99657
2021-04-01 15:52:08 +01:00
Simonas Kazlauskas 777a58e05b Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
Bradley Smith 4e52daa254 [AArch64][SVE] Add tests for UREM/SREM using fixed SVE types
Differential Revision: https://reviews.llvm.org/D99265
2021-03-31 16:09:55 +01:00
Florian Hahn 52e015081a
[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector.
Currently the code only checks for integer constants (ConstantSDNode)
and triggers an infinite cycle for single-element floating point
vector constants. We need to check for both FP and integer constants.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D99384
2021-03-31 10:17:36 +01:00
Amara Emerson a35c2c7942 [GlobalISel] Implement fewerElements legalization for vector reductions.
This patch adds 3 methods, one for power-of-2 vectors which use tree
reductions using vector ops, before a final reduction op. For non-pow-2
types it generates multiple narrow reductions and combines the values with
scalar ops.

Differential Revision: https://reviews.llvm.org/D97163
2021-03-30 11:19:21 -07:00
Amara Emerson 1bc90847ee [AArch64][GlobalISel] Define some legalization rules for G_ROTR and G_ROTL.
For imported pattern purposes, we have a custom rule that promotes the rotate
amount to 64b as well.

Differential Revision: https://reviews.llvm.org/D99463
2021-03-30 11:11:19 -07:00
Amara Emerson 91887cd4ec [AArch64][GlobalISel] Combine funnel shifts to rotates.
Differential Revision: https://reviews.llvm.org/D99388
2021-03-30 11:00:36 -07:00
Jessica Paquette 700431128e [GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.

This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.

Differential Revision: https://reviews.llvm.org/D99230
2021-03-30 10:14:30 -07:00
Joe Ellis a7dde4c5f7 [AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98496
2021-03-30 09:37:11 +00:00
Joe Ellis c4d39f64d0 [AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D98625
2021-03-30 09:35:44 +00:00
Jun Ma 1af373c673 [AArch64][SVE] Codegen dup_lane for dup(vector_extract)
Differential Revision: https://reviews.llvm.org/D99324
2021-03-30 10:35:08 +08:00
Jun Ma b0db2dbc29 [AArch64][SVEIntrinsicOpts] Optimize tbl+dup into dup+extractelement
Differential Revision: https://reviews.llvm.org/D99412
2021-03-30 10:35:08 +08:00
Jessica Paquette 247ff26a89 [AArch64][GlobalISel] NFC: Replace IR regbankselect test with MIR test
regbank-ceil.ll -> regbank-ceil.mir

The IR test was intended to only check register banks. This makes it brittle,
especially as we improve load/store combines in GlobalISel.

Rewriting this as a MIR test also makes it more consistent with the rest of
the testcases in GlobalISel.
2021-03-29 16:32:34 -07:00
Florian Hahn 482283042f
[AArch64] Remove custom zext/sext legalization code.
Currently performExtendCombine assumes that the src-element bitwidth * 2
is a valid MVT. But this is not the case for i1 and it causes a crash on
the v64i1 test cases added in this patch.

It turns out that this code appears to not be needed; the same patterns are
handled by other code and we end up with the same results, even without the
custom lowering. I also added additional test cases in a50037aaa6.

Let's just remove the unneeded code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99437
2021-03-29 22:22:05 +01:00
Florian Hahn a50037aaa6
[AArch64] Add a few more vector extension tests. 2021-03-29 18:56:00 +01:00
Bradley Smith 9745dce8c3 [SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.

As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.

Differential Revision: https://reviews.llvm.org/D98939
2021-03-29 15:32:25 +01:00
Matt Arsenault 2f779e79d5 AArch64/GlobalISel: Remove IR section from test 2021-03-28 11:12:59 -04:00
Amara Emerson 55533203d7 [GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates.
Differential Revision: https://reviews.llvm.org/D99383
2021-03-25 17:23:30 -07:00
Jessica Paquette 23f657c165 [AArch64][GlobalISel] Emit bzero on Darwin
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.

This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.

To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.

The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.

This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.

Code size improvements (Darwin):

- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)

Differential Revision: https://reviews.llvm.org/D99358
2021-03-25 17:14:25 -07:00
Amara Emerson 0d2c4db637 [GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF.
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.

rdar://75784009

Differential Revision: https://reviews.llvm.org/D99287
2021-03-24 23:08:51 -07:00
Jessica Paquette a141c7d06b [AArch64][GlobalISel] Select G_SBFX and G_UBFX
Add selection support for G_SBFX and G_UBFX and add a test.

These must always have a constant LSB and width.

Differential Revision: https://reviews.llvm.org/D99224
2021-03-24 11:15:57 -07:00
Jessica Paquette 1818dc394f [AArch64][GlobalISel] Mark G_SBFX/G_UBFX as legal for s32 and s64
This isn't perfect, since we should also verify that these only use constants.

Differential Revision: https://reviews.llvm.org/D99219
2021-03-24 11:08:41 -07:00
Nashe Mncube ac2a1e9596 [SVE] Suppress vselect warning from incorrect interface call
The VSelectCombine handler within AArch64ISelLowering,
uses an interface call which only expects fixed vectors.
This generates a warning when the call is made on a
scalable vector. This warning has been suppressed with this change,
by using the ElementCount interface, which supports both fixed and scalable vectors.
I have also added a regression test which recreates the warning.

Differential Revision: https://reviews.llvm.org/D98249
2021-03-24 14:34:34 +00:00
Amara Emerson 45a7fe1911 [AArch64][GlobalISel] Add test for G_FSHR legalization. 2021-03-23 16:11:45 -07:00
Amara Emerson 7bddf00581 [AArch64][GlobalISel] Lower G_FSHL and G_FSHR.
Codegen isn't as good as we need it, but that'll be done later.
2021-03-23 16:09:19 -07:00
Amara Emerson 75b6a47bd0 [AArch64][GlobalISel] Lower G_CTLZ_ZERO_UNDEF.
This adds some missing legalizer tests, which uncovered a v2s64 selection
test that wasn't working since there's no legalization or instruction for that.
2021-03-23 12:49:10 -07:00
David Sherwood 748ae5281d [IR][SVE] Add new llvm.experimental.stepvector intrinsic
This patch adds a new llvm.experimental.stepvector intrinsic,
which takes no arguments and returns a linear integer sequence of
values of the form <0, 1, ...>. It is primarily intended for
scalable vectors, although it will work for fixed width vectors
too. It is intended that later patches will make use of this
new intrinsic when vectorising induction variables, currently only
supported for fixed width. I've added a new CreateStepVector
method to the IRBuilder, which will generate a call to this
intrinsic for scalable vectors and fall back on creating a
ConstantVector for fixed width.

For scalable vectors this intrinsic is lowered to a new ISD node
called STEP_VECTOR, which takes a single constant integer argument
as the step. During lowering this argument is set to a value of 1.
The reason for this additional argument at the codegen level is
because in future patches we will introduce various generic DAG
combines such as

  mul step_vector(1), 2 -> step_vector(2)
  add step_vector(1), step_vector(1) -> step_vector(2)
  shl step_vector(1), 1 -> step_vector(2)
  etc.

that encourage a canonical format for all targets. This hopefully
means all other targets supporting scalable vectors can benefit
from this too.

I've added cost model tests for both fixed width and scalable
vectors:

  llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll
  llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll

as well as codegen lowering tests for fixed width and scalable
vectors:

  llvm/test/CodeGen/AArch64/neon-stepvector.ll
  llvm/test/CodeGen/AArch64/sve-stepvector.ll

See this thread for discussion of the intrinsic:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html
2021-03-23 10:43:35 +00:00
Joe Ellis 6dc32da1b0 [AArch64][SVE] Test more types in sve-fixed-length-subvector.ll
Previously only the i32 type was tested. Now, the {i,f}{16,32,64} types
are tested.

The v8{i,f}16 cases lower differently to the other cases, which is worth
defending. The lowering for the other cases is currently identical, but
probably worth having for the better coverage.

Differential Revision: https://reviews.llvm.org/D98690
2021-03-22 14:09:05 +00:00
Sjoerd Meijer 7515e81e8c [AArch64] Add some float -> int -> float conversion patterns
This adds some conversion match patterns for which we want to keep the int
values in FP registers using the corresponding NEON instructions (not the FP
instructions) to avoid more costly int <-> fp register transfers.

Differential Revision: https://reviews.llvm.org/D98956
2021-03-22 11:06:08 +00:00
Jessica Paquette 0ca83730cc Recommit "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE"
This reverts commit 962b73dd0f.

This commit was reverted because of some internal SPEC test failures.

It turns out that this wasn't actually relevant to anything in open source, so
it's safe to recommit this.
2021-03-18 16:01:02 -07:00
Peter Waller 0d6482a76a [llvm][AArch64][SVE] Lower fixed length vector fabs
Seemingly striaghtforward.

Differential Revision: https://reviews.llvm.org/D98434
2021-03-18 17:20:08 +00:00
Matt Arsenault 61f834cc09 GlobalISel: Insert memcpy for outgoing byval arguments
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.

Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).

This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).
2021-03-18 09:16:54 -04:00
Thomas Preud'homme b79044391e [test] Fix incorrect use of string variable use
LLVM test CodeGen/AArch64/machine-outliner-retaddr-sign-thunk.ll uses
a string substitution block that contains a regex matching block. This
seems like as a copy/paste from other similar test where the match also
defines a variable, hence the [[]] syntax. In this case however this is
a CHECK-NOT variable so nothing should match. No variable definition is
thus expected and the square brackets can be dropped.

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D98853
2021-03-18 12:19:51 +00:00
Sjoerd Meijer 90ecb862a0 [AArch64] Rewrite (add, csel) to cinc
Don't rewrite an add instruction with 2 SET_CC operands into a csel
instruction. The total instruction sequence uses an extra instruction and
register. Preventing this allows us to match a `(add, csel)` pattern and
rewrite this into a `cinc`.

Differential Revision: https://reviews.llvm.org/D98704
2021-03-18 08:49:27 +00:00
Amara Emerson 28963d895b [GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers.
These are pseudos without any users, so DCE was killing them in the combiner.

Marking them as having side effects doesn't seem quite right since they don't.

Gives a nice 0.3% geomean size win on CTMark -Os.

Differential Revision: https://reviews.llvm.org/D98811
2021-03-17 18:02:08 -07:00
Amara Emerson d7fed7b899 [AArch64][GlobalISel] Fall back if disabling neon/fp in the translator.
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some have legacy rules defined like G_ZEXT
which can try continue, but then crash.

We should fall back earlier, in the translator, to avoid this issue.

Differential Revision: https://reviews.llvm.org/D98730
2021-03-17 15:08:08 -07:00
Pavel Iliin bd79b565e3 [NFC][AArch64] Add codegen tests for various csinc-cmp sequences. 2021-03-17 20:17:40 +00:00
Bradley Smith cf0da91ba5 [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.

Differential Revision: https://reviews.llvm.org/D98487
2021-03-17 11:41:22 +00:00
Joe Ellis ff2dd8a212 [AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible
This commit folds sxtw'd or uxtw'd offsets into gather loads where
possible with a DAGCombine optimization.

As an example, the following code:

     1	#include <arm_sve.h>
     2
     3	svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
     4	  return svld1sw_gather_s64offset_u64(
     5	    pred, base, svextw_s64_x(pred, offsets)
     6	  );
     7	}

would previously lower to the following assembly:

    sxtw	z0.d, p0/m, z0.d
    ld1sw	{ z0.d }, p0/z, [x0, z0.d]
    ret

but now lowers to:

    ld1sw   { z0.d }, p0/z, [x0, z0.d, sxtw]
    ret

Differential Revision: https://reviews.llvm.org/D97858
2021-03-16 15:09:46 +00:00
Joe Ellis 14bd44edc6 [AArch64][SVEIntrinsicOpts] Factor out redundant SVE mul/fmul intrinsics
This commit implements an IR-level optimization to eliminate idempotent
SVE mul/fmul intrinsic calls. Currently, the following patterns are
captured:

    fmul  pg  (dup_x  1.0)  V  =>  V
    mul   pg  (dup_x  1)    V  =>  V

    fmul  pg  V  (dup_x  1.0)  =>  V
    mul   pg  V  (dup_x  1)    =>  V

    fmul  pg  V  (dup  v  pg  1.0)  =>  V
    mul   pg  V  (dup  v  pg  1)    =>  V

The result of this commit is that code such as:

    1  #include <arm_sve.h>
    2
    3  svfloat64_t foo(svfloat64_t a) {
    4    svbool_t t = svptrue_b64();
    5    svfloat64_t b = svdup_f64(1.0);
    6    return svmul_m(t, a, b);
    7  }

will lower to a nop.

This commit does not capture all possibilities; only the simple cases
described above. There is still room for further optimisation.

Differential Revision: https://reviews.llvm.org/D98033
2021-03-16 14:50:17 +00:00