Commit Graph

4158 Commits

Author SHA1 Message Date
Simon Pilgrim 8562d2c040 [AArch64] Regenerate min/max tests and add vXi64 umin/umax test coverage 2020-11-26 15:33:39 +00:00
Kerry McLaughlin 4bee3197f6 [SVE][CodeGen] Extend isConstantSplatValue to support ISD::SPLAT_VECTOR
Updated the affected scalable_of_scalable tests in sve-gep.ll, as isConstantSplatValue now returns true in DAGCombiner::visitMUL and folds `(mul x, 1) -> x`

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91363
2020-11-26 11:19:40 +00:00
Craig Topper aea130f736 [LegalizerTypes] Add support for scalarizing the operand of an FP_EXTEND when the result type is legal. 2020-11-25 20:30:21 -08:00
Paul Robinson cf1c774d6a [FastISel] Flush local value map on every instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

Differential Revision: https://reviews.llvm.org/D91734
2020-11-25 13:05:00 -05:00
Mark Murray 2b6691894a [ARM][AArch64] Adding Neoverse N2 CPU support
Add support for the Neoverse N2 CPU to the ARM and AArch64 backends.

Differential Revision: https://reviews.llvm.org/D91695
2020-11-25 11:42:54 +00:00
Kerry McLaughlin 603d40da9d [SVE][CodeGen] Add a DAG combine to extend mscatter indices
This patch adds a target-specific DAG combine for mscatter to promote indices
with element types i8 or i16 before legalisation, plus various tests with illegal types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90945
2020-11-25 11:18:22 +00:00
Kai Luo 5931be60b5 [DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops
This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen.
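
The identity behind this rewrite can be checked with a small sketch. The Python below models 32-bit two's-complement arithmetic; the 32-bit width and the helper names (`to_signed`, `sra`, `neg_abs_reference`, `neg_abs_folded`) are assumptions made for illustration and are not taken from the patch.

```python
BITS = 32                      # assumed width; the identity holds for any width
MASK = (1 << BITS) - 1

def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v

def sra(x, amount):
    """Arithmetic shift right of a BITS-wide signed value."""
    return to_signed(x) >> amount

def neg_abs_reference(x):
    """0 - abs(x) with wrap-around on overflow."""
    return to_signed(0 - abs(to_signed(x)))

def neg_abs_folded(x):
    """Y = sra(X, size(X)-1); sub(Y, xor(X, Y))."""
    y = sra(x, BITS - 1)       # 0 for non-negative x, -1 for negative x
    return to_signed(y - (to_signed(x) ^ y))
```

For non-negative X, Y is 0 and the result is `0 - X`; for negative X, Y is -1, `xor(X, Y)` is `~X`, and `-1 - ~X` equals X, which is already `-abs(X)`.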

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91120
2020-11-24 09:43:35 +00:00
Amara Emerson ca7fdf7ce0 [AArch64][GlobalISel] Add pre-isel lowering to convert p0 G_DUPs to use s64.
This uses the same reasoning as other similar conversions just before selection,
without it we miss out on selection because the importer considers s64 and p0
distinct types.
2020-11-23 22:59:35 -08:00
Amara Emerson 0fb76b9035 [AArch64][GlobalISel] Make <2 x p0> of G_SHUFFLE_VECTOR legal. 2020-11-23 22:59:35 -08:00
Martin Storsjö 6f792041a5 Reapply "[CodeGen] [WinException] Only produce handler data at the end of the function if needed"
This reapplies 36c64af9d7 in updated
form.

Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)

The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.

Differential Revision: https://reviews.llvm.org/D87448
2020-11-23 23:17:03 +02:00
Craig Topper 4252f7773a [SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets.
X86 was already specially marking fma as commutable, which allowed
tablegen to autogenerate commuted patterns. This moves it to the
target-independent definition and fixes up the targets to remove
now-unneeded patterns.

Unfortunately, the tests change because the commuted versions of
the patterns generate operands in a different order than the
explicit patterns.

Differential Revision: https://reviews.llvm.org/D91842
2020-11-23 10:09:20 -08:00
Matt Arsenault 1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00
Matt Arsenault 20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Amara Emerson c58df88886 [AArch64][GlobalISel] Make G_EXTRACT_VECTOR_ELT of <2 x p0> legal.
Also fix a selection issue for this, which was using LLT::isScalar() when it
should have been using !isVector(). Add a test for that too.
2020-11-20 14:07:45 -08:00
Matt Arsenault 06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
Andrew Wei 1cd19fc568 [DeadMachineInstructionElim] Post order visit all blocks and iteratively run DeadMachineInstructionElim pass until nothing is dead
Patched by: guopeilin
Reviewed By: hliao,rampitec

Differential Revision: https://reviews.llvm.org/D91513
2020-11-21 00:43:23 +08:00
Pavel Iliin 4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements out-of-line atomics as an LSE deployment
mechanism. Details of how it works can be found in llvm/docs/Atomics.rst.
The options -moutline-atomics and -mno-outline-atomics, which enable and
disable it, were added to the clang driver. This is the clang and llvm part
of the out-of-line atomics interface; the library part is already supported
by libgcc. Compiler-rt support is provided in a separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
QingShan Zhang 1b5921f4d8 [NFC][Test] Update test for IEEE Long Double 2020-11-20 09:57:45 +00:00
Adhemerval Zanella 807320119f [AArch64] Lower fptrunc/fpext from/to FP128 to/from FP16
The compiler-rt part which adds the emitted symbols is handled in
a subsequent patch.

Differential Revision: https://reviews.llvm.org/D91731
2020-11-19 15:14:50 -03:00
Florian Hahn 1983acce7c [SelDAGBuilder] Do not require simple VTs for constraints.
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.

using vec = float __attribute__((ext_vector_type(9)));
void f1 (vec m) {
  asm volatile("" : "+r,m"(m) : : "memory");
}

One case that uses "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.

This patch updates visitInlineAsm so that it uses MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.

And we still report an error if there are no registers to satisfy the
constraint.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D91710
2020-11-19 09:31:54 +00:00
Kai Luo 5f0ae23e71 [X86][AArch64][RISCV] Pre-commit negated abs test case. NFC. 2020-11-19 02:31:45 +00:00
Florian Hahn a9adb62a64 [AsmPrinter] Use getMnemonic for instruction-mix remark.
This patch uses the new `getMnemonic` helper from D90039
to display mnemonics instead of the internal opcodes.

The main motivation behind using the mnemonics is that they
are more user-friendly and more directly related to the assembly
the users will be presented with.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D90040
2020-11-17 12:12:47 +00:00
Jessica Paquette 5bc0bd05e6 [AArch64][GlobalISel] Fold G_XOR x, -1 into G_SELECT and select CSINV
When we see

```
xor = G_XOR xor_lhs, -1
select = G_SELECT cc, tval, xor
```

Fold this into

```
select = CSINV tval, xor_lhs, cc
```

Update select-select.mir to reflect the changes.

For now, only handle the case where the G_XOR is the false-value for the
G_SELECT. It may make more sense to handle the true-value case in post-legalizer
lowering.

Differential Revision: https://reviews.llvm.org/D90774
2020-11-16 14:14:14 -08:00
Amara Emerson 0b6090699a [AArch64][GlobalISel] Look through a G_ZEXT when trying to match shift-extended register offsets.
The G_ZEXT in these cases seems to actually come from a combine that we do but
SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing
modes.

Differential Revision: https://reviews.llvm.org/D91475
2020-11-16 10:50:46 -08:00
David Green 2104783d02 [AArch64] Remove unused check prefixes. NFC 2020-11-14 18:30:17 +00:00
Jessica Paquette 9a8bfe3835 [AArch64][GlobalISel] Select G_SELECT cc, t, (G_SUB 0, x) -> CSNEG t, x, cc
When we see

```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```

Fold away the G_SUB by producing

```
%select = CSNEG %t, %x, cc
```

Simple IR example: https://godbolt.org/z/K8TEnh

This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.
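
As a rough check of why this fold is sound, the sketch below models the two forms on 64-bit values. It assumes the architectural meaning of CSNEG (`Rd = Rn` if the condition holds, otherwise `Rd = -Rm`) and, as a simplification, treats the condition as a plain boolean; the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1

def g_sub(lhs, rhs):
    """64-bit wrapping subtraction."""
    return (lhs - rhs) & MASK64

def g_select(cc, tval, fval):
    """Pick tval when cc is true, fval otherwise."""
    return tval if cc else fval

def csneg(rn, rm, cond):
    """CSNEG Rd, Rn, Rm, cond: Rn if cond holds, else the negation of Rm."""
    return (rn if cond else -rm) & MASK64

def before_fold(cc, t, x):
    # %sub = G_SUB 0, %x ; %select = G_SELECT %cc, %t, %sub
    return g_select(cc, t & MASK64, g_sub(0, x))

def after_fold(cc, t, x):
    # %select = CSNEG %t, %x, cc
    return csneg(t, x, cc)
```

Both functions return the same value for every input, so the separate G_SUB is not needed once CSNEG is selected.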

Differential Revision: https://reviews.llvm.org/D90723
2020-11-13 10:12:51 -08:00
Kerry McLaughlin 306c8ab208 [SVE][CodeGen] Improve codegen of scalable masked scatters
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90942
2020-11-13 11:19:36 +00:00
Jessica Paquette d0ba6c4002 [AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants
Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)
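
The selections listed above can be sanity-checked with a toy model. The sketch assumes the architectural meaning of the two instructions (CSINC yields `Rn` if the condition holds and `Rm + 1` otherwise; CSINV yields `Rn` or `~Rm`), works on 32-bit values, and treats conditions as booleans. The helper `selection_table` is hypothetical and exists only for this illustration.

```python
MASK32 = 0xFFFFFFFF
ZREG = 0                        # the zero register always reads as 0
MINUS_ONE = MASK32              # -1 as an unsigned 32-bit pattern

def g_select(cc, tval, fval):
    return (tval if cc else fval) & MASK32

def csinc(rn, rm, cond):
    """CSINC: Rn if cond holds, else Rm + 1."""
    return (rn if cond else rm + 1) & MASK32

def csinv(rn, rm, cond):
    """CSINV: Rn if cond holds, else bitwise NOT of Rm."""
    return (rn if cond else ~rm) & MASK32

def selection_table(cc, t, f):
    """Return (G_SELECT result, selected-instruction result) for each case."""
    inv_cc = not cc
    return [
        (g_select(cc, 0, 1),          csinc(ZREG, ZREG, cc)),
        (g_select(cc, 0, MINUS_ONE),  csinv(ZREG, ZREG, cc)),
        (g_select(cc, 1, f),          csinc(f, ZREG, inv_cc)),
        (g_select(cc, MINUS_ONE, f),  csinv(f, ZREG, inv_cc)),
        (g_select(cc, t, 1),          csinc(t, ZREG, cc)),
        (g_select(cc, t, MINUS_ONE),  csinv(t, ZREG, cc)),
    ]
```

In this model a false value of -1 comes from CSINV (`~0`), whereas CSINC with the zero register produces 1.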

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701
2020-11-12 14:44:01 -08:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct numbers of legal loads and stores.
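
A heavily simplified model of steps 1 to 3 is sketched below. It is not the AArch64 calling-convention code: it assumes eight SVE argument registers (z0-z7), describes each argument only by how many registers it needs, and hands out GPRs only for indirect pointers. The function name `assign_sve_args` is hypothetical.

```python
NUM_SVE_ARG_REGS = 8    # assumption: z0-z7 carry SVE arguments

def assign_sve_args(arg_sizes):
    """Place each argument either in consecutive z-registers or indirectly.

    arg_sizes lists how many SVE registers each argument needs
    (1 for a single vector, 2-4 for a tuple).
    """
    next_z = 0          # first unallocated SVE register
    next_x = 0          # first unallocated GPR (toy: only pointers use GPRs)
    locations = []
    for size in arg_sizes:
        if next_z + size <= NUM_SVE_ARG_REGS:
            regs = [f"z{next_z + i}" for i in range(size)]
            locations.append(("direct", regs))
            next_z += size
            continue
        # Step 1: the whole block does not fit, so temporarily mark the
        # remaining SVE registers as allocated.
        saved_z = next_z
        next_z = NUM_SVE_ARG_REGS
        # Step 2: with no SVE registers left, the tuple as a whole is
        # passed indirectly through a pointer held in a GPR.
        locations.append(("indirect", f"x{next_x}"))
        next_x += 1
        # Step 3: release the registers that were only marked in step 1.
        next_z = saved_z
    return locations
```

For example, `assign_sve_args([4, 3, 2, 1])` keeps the first two tuples in z0-z6, passes the two-element tuple indirectly instead of splitting it across z7 and the stack, and still places the final single vector in z7.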

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson ad376657c1 [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00
Jessica Paquette 7a70a2f04d [AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.

Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.

(All other 16-bit G_FCONSTANTS are not yet selected.)

Differential Revision: https://reviews.llvm.org/D89164
2020-11-11 13:25:11 -08:00
David Green 3e5b8d83f7 [AArch64] Regenerate test checks for f16-imm.ll. NFC
Jessica Paquette c42053f79b [AArch64][GlobalISel] Select arith extended add/sub in manual selection code
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).

As a result, we could never select things like

```
cmp x1, w0, uxtw #2
```

Because we don't import any patterns for compares.

This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91207
2020-11-11 09:26:03 -08:00
Jessica Paquette f0580c73bb [AArch64][GlobalISel] Select negative arithmetic immediates in manual selector
Previously, we only handled negative arithmetic immediates in the imported
selector code.

Since we don't import code for, say, compares, we were missing opportunities
for things like

```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```

Instead, we would have to materialize the constant and emit a SUBS.

This adds support for selection like above for SUB, SUBS, ADD, and ADDS.
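
The equality case from the example can be sketched as follows. Only the Z flag is modeled, which is all an `eq` compare needs; the other flags are deliberately ignored. The 12-bit immediate range check is a simplification, and the helper names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1

def z_flag_subs(lhs, rhs):
    """Z flag after SUBS: set when lhs - rhs wraps to zero."""
    return ((lhs - rhs) & MASK64) == 0

def z_flag_adds(lhs, rhs):
    """Z flag after ADDS: set when lhs + rhs wraps to zero."""
    return ((lhs + rhs) & MASK64) == 0

def icmp_eq_materialized(x, cst):
    """Compare by materializing the constant and emitting a SUBS."""
    return z_flag_subs(x & MASK64, cst & MASK64)

def icmp_eq_negated_imm(x, cst):
    """Compare against a negative constant using ADDS with the negated immediate."""
    imm = -cst
    assert 0 < imm < 4096, "toy model: unshifted 12-bit immediates only"
    return z_flag_adds(x & MASK64, imm)
```

Comparing against -10 with `icmp_eq_negated_imm` gives the same answer as `icmp_eq_materialized` without needing a register to hold the constant.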

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91108
2020-11-11 09:20:05 -08:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Amara Emerson 2262393090 [AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG.
These do things like turn a multiply of a pow-2+1 into a shift and an add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
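
As an illustration of the pow-2+1 case, the sketch below rewrites `x * (2^n + 1)` as `(x << n) + x` on 64-bit values. The function name and the choice to return `None` for other constants are assumptions for the example; the actual combines cover more cases than this.

```python
MASK64 = (1 << 64) - 1

def mul_by_pow2_plus_one(x, c):
    """Return x * c computed as a shift and an add, or None if c is not 2^n + 1."""
    m = c - 1
    if m < 1 or (m & (m - 1)) != 0:     # m must be a power of two
        return None
    shift = m.bit_length() - 1          # m == 1 << shift
    return ((x << shift) + x) & MASK64
```

For instance, a multiply by 9 becomes `(x << 3) + x`.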

I've added check lines to an existing codegen test since the code being ported
is almost identical, however the mul by negative pow2 constant tests don't generate
the same code because we're missing some generic G_MUL combines still.

Differential Revision: https://reviews.llvm.org/D91125
2020-11-10 22:21:13 -08:00
Francesco Petrogalli 9f61931e07 [llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests.
For example, if the sign extension is only used for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D90606
2020-11-09 18:27:48 +00:00
Paul Robinson 920befb337 [FastISel] Reduce spills around mem-intrinsic calls
FastISel generates instructions to materialize "local values" at the
top of a block, in the hope that these values could be reused within
the block.  To reduce spills and restores, FastISel treats calls as
sub-block boundaries, flushing the "local value map" at each call.

This patch treats the mem* intrinsics as if they were calls, because
at O0 generally they are calls.  Eliminating these spills/restores is
actually better for debugging (especially a "continue at this line"
command), code size, stack frame size, and maybe even performance.

Differential Revision: https://reviews.llvm.org/D90877
2020-11-09 09:45:14 -08:00
David Zarzycki d631e5240c [testing] Add exhaustive ULT/UGT vector CTPOP to AArch64 and PPC
This is to help review the impact of https://reviews.llvm.org/D89952 which
allows targets to fine tune what SelectionDAG does when vector CTPOP is
not legal.
2020-11-09 10:34:01 -05:00
Lucas Prates c2c2cc1360 [ARM][AArch64] Adding Neoverse V1 CPU support
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.

This is based on patches from Mark Murray and Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D90765
2020-11-09 13:15:40 +00:00
Elvina Yakubova 93b99728b1 [AArch64] Add pipeline model for HiSilicon's TSV110
This patch adds the scheduling and cost model for TSV110.

Reviewed by: SjoerdMeijer, bryanpkc

Differential Revision: https://reviews.llvm.org/D89972
2020-11-07 01:23:00 +03:00
Amara Emerson f347d78cca [AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates.
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).

The pattern matching code has been simply moved to postlegalize lowering.

Differential Revision: https://reviews.llvm.org/D90820
2020-11-05 11:18:11 -08:00
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Michael Liao 4b11201592 [MachineInstr] Add support for instructions with multiple memory operands.
- Basically, iterate over each pair of memory operands from both instructions
  and return true if any pair may alias.
- The exception is memory instructions without any memory operand. They
  may touch everything and could alias with any memory instruction.
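
The pairwise rule described above can be sketched in a toy form. Here a memory operand is modeled as a hypothetical `(start, size)` byte range and "may alias" as range overlap; the real check consults alias information rather than raw addresses, so this only illustrates the control flow.

```python
def ranges_overlap(a, b):
    """True if two (start, size) byte ranges share at least one byte."""
    a_start, a_size = a
    b_start, b_size = b
    return a_start < b_start + b_size and b_start < a_start + a_size

def may_alias(memops_a, memops_b):
    """Conservative alias query for two instructions' memory operand lists."""
    # An instruction with no memory operands may touch anything.
    if not memops_a or not memops_b:
        return True
    # Otherwise the instructions may alias if any pair of operands does.
    return any(ranges_overlap(a, b) for a in memops_a for b in memops_b)
```

With two operands on one side, a single overlapping pair is enough for the answer to be true.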

Differential Revision: https://reviews.llvm.org/D89447
2020-11-03 20:44:40 -05:00
Amara Emerson 393b55380a [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Hans Wennborg cbf25fbed5 Revert "[CodeGen] [WinException] Only produce handler data at the end of the function if needed"
This caused an explosion in ICF times during linking on Windows when libfuzzer
instrumentation is enabled. For a small binary we see ICF time go from ~0 to
~10 s. For a large binary it goes from ~1 s to forever (I gave up after 30
minutes).

See comment on the code review.

> If we are going to write handler data (that is written as variable
> length data following after the unwind info in .xdata), we need to
> emit the handler data immediately, but for cases where no such
> info is going to be written, skip emitting it right away. (Unwind
> info for all remaining functions that hasn't gotten it emitted
> directly is emitted at the end.)
>
> This does slightly change the ordering of sections (triggering a
> bunch of updates to DebugInfo/COFF tests), but the change should be
> benign.
>
> This also matches GCC's assembly output, which doesn't output
> .seh_handlerdata unless it actually is needed.
>
> For ARM64, the unwind info can be packed into the runtime function
> entry itself (leaving no data in the .xdata section at all), but
> that can only be done if there's no follow-on data in the .xdata
> section. If emission of the unwind info is triggered via
> EmitWinEHHandlerData (or the .seh_handlerdata directive), which
> implicitly switches to the .xdata section, there's a chance of the
> caller wanting to pass further data there, so the packed format
> can't be used in that case.
>
> Differential Revision: https://reviews.llvm.org/D87448

This reverts commit 36c64af9d7.
2020-11-03 13:12:10 +01:00
Sander de Smalen ba10c514c9 [AArch64][SVE] NFC: Guard all SVE tests for TypeSize warnings.
This patch adds a bunch of CHECK lines to guard against implicit
conversions of TypeSize -> uint64_t occurring in code-paths that previously
were safe for scalable vectors.
2020-11-03 11:29:36 +00:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceding a long multiply,
and generates a single umull/smull instruction instead.
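
For the unsigned case, the reason such a mask is redundant can be sketched as follows: UMULL reads only the low 32 bits of each source register, so clearing the upper bits beforehand does not change the product. The register model and function names below are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def read_w(xreg):
    """Read the 32-bit W view of a 64-bit X register."""
    return xreg & MASK32

def umull(xn, xm):
    """UMULL Xd, Wn, Wm: unsigned 32x32 -> 64-bit multiply."""
    return (read_w(xn) * read_w(xm)) & MASK64

def masked_then_umull(xn, xm):
    """The same multiply preceded by explicit 0xffffffff masks."""
    return umull(xn & MASK32, xm & MASK32)
```

`umull` and `masked_then_umull` agree on every input, which is why a single instruction suffices.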

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
QingShan Zhang 1d178d600a [Scheduling] Fall back to the fast cluster algorithm if the DAG is too complex
We added a new load/store cluster algorithm in D85517. However, AArch64 sees
some compile-time degradation with the new algorithm, as IsReachable() is not
cheap (O(M+N)) if the DAG is complex. See https://bugs.llvm.org/show_bug.cgi?id=47966
So this patch adds a heuristic to switch to the old cluster algorithm if the DAG is too complex.

Reviewed By: Owen Anderson

Differential Revision: https://reviews.llvm.org/D90144
2020-11-02 02:11:52 +00:00