Commit Graph

38632 Commits

Baptiste Saleil caf1294d95 [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
2021-04-26 17:21:49 -04:00
Craig Topper e2cd92cb9b [RISCV] Match splatted load to scalar load + splat. Form strided load during isel.
This modifies my previous patch to push the strided load formation
to isel. This gives us opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.
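As a rough illustration (not code from the patch), the kind of source this
targets is a loop-invariant scalar loaded from memory and used by every
vector lane; keeping it in a scalar register lets it feed a .vx-style
operation instead of occupying a whole vector register:
```
// Hypothetical example, assuming the loop gets vectorized for RVV.
#include <cstdint>

void scale_add(int32_t *y, const int32_t *x, const int32_t *k, int n) {
  int32_t s = *k;            // load that ends up splatted across lanes
  for (int i = 0; i < n; ++i)
    y[i] += s * x[i];        // s can stay in a GPR and feed a .vx operation
}
```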

If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101138
2021-04-26 13:32:03 -07:00
Sebastian Neubauer 9579af2bd7 [AMDGPU] Fix autogenerated wwm-reserved-spill.ll
Due to a bug in update_llc_test_checks.py, the test is wrongly
coalesced between run lines. Remove common check prefix to fix that.
NFC.
2021-04-26 19:09:09 +02:00
Sebastian Neubauer fcc40d9c17 [AMDGPU] Use MapVector for WWMReservedRegs
Use MapVector instead of SmallDenseMap because it has a deterministic
iteration order.
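As a hedged, self-contained sketch of why this matters (the names below are
illustrative, not from the patch): MapVector keeps insertion order, so
iterating it yields the same sequence on every run, unlike a hash-based
container such as SmallDenseMap.
```
#include "llvm/ADT/MapVector.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::MapVector<unsigned, const char *> Regs;
  Regs.insert({42, "vgpr42"});
  Regs.insert({7, "vgpr7"});
  Regs.insert({13, "vgpr13"});
  // Iteration follows insertion order (42, 7, 13), so any code or test
  // output derived from this loop is deterministic across runs.
  for (const auto &Entry : Regs)
    llvm::outs() << Entry.first << " -> " << Entry.second << "\n";
  return 0;
}
```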

Differential Revision: https://reviews.llvm.org/D101299
2021-04-26 17:43:00 +02:00
Tim Northover 8705399d01 AArch64: support atomics in GISel 2021-04-26 14:38:06 +01:00
Bradley Smith 2040d20df2 [AArch64][SVE] Add missing patterns for scalar versions of SQSHL/UQSHL
Differential Revision: https://reviews.llvm.org/D101058
2021-04-26 13:07:12 +01:00
David Green 94c7bd7eb2 [ARM] Expand VMOVRRD simplification pattern
This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern,
to also handle insert_vectors. Providing we can find the correct insert,
this helps further simplify patterns by removing the redundant VMOVRRD.

Differential Revision: https://reviews.llvm.org/D100245
2021-04-26 12:27:38 +01:00
David Green b1a919d51c [ARM] Additional soft float BE test. NFC 2021-04-26 11:44:10 +01:00
David Green 258e2e9a0b [ARM] Ensure loop invariant active.lane.mask operands
CGP can move instructions like a ptrtoint into a loop, but
MVETailPredication will currently assume invariant trip counts when
converting them. This patch tries to ensure the operands are loop
invariant, and bails if they are not.

Differential Revision: https://reviews.llvm.org/D100550
2021-04-26 10:04:33 +01:00
Ben Shi 60ed86d350 [RISCV] Optimize addition with immediate
Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D101244
2021-04-26 13:26:17 +08:00
Craig Topper 8f5cd49405 [RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs.
This teaches DAG combine that the shift amount operands for grev, gorc,
shfl and unshfl only read a few bits.

This also teaches DAG combine that grevw, gorcw, shflw, unshflw,
bcompressw, bdecompressw only consume the lower 32 bits of their
inputs.
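For a rough feel of the demanded-bits idea (a generic example, not code from
the patch): once DAG combine knows an instruction only reads some low bits of
an operand, an explicit mask of those bits on the input becomes redundant and
can be removed.
```
#include <cstdint>

// The mask keeps the C++ shift amounts well-defined; on a target whose
// rotate/permute instruction only reads the low 5 bits of the amount,
// the backend can delete the corresponding AND node during DAG combine.
uint32_t rotl32(uint32_t x, uint32_t amt) {
  amt &= 31;
  return (x << amt) | (x >> ((32 - amt) & 31));
}
```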

In the future we can teach SimplifyDemandedBits to also propagate
demanded bits of the output to the inputs in some cases.
2021-04-25 21:54:06 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Roman Lebedev 7b312e228c
[NFC][X86][AVX2] Add baseline CodeGen/CostModel tests for interleaved loads/stores of i16 w/ strides 2/3/4
`X86TTIImpl::getInterleavedMemoryOpCostAVX2()` currently contains data
only for a handful of tuples. For now, at least add tests for a few more.

I'm guessing that we care how well the patterns codegen since
we use their presumed cost for vectorization decisions,
so I've added codegen tests too.

There's one really easy caveat for these codegen tests:
for interleaved load tests, we really have to ensure that the
deinterleaved vectors are escaped separately. Similarly for stores.
2021-04-26 01:13:07 +03:00
Simon Pilgrim 535df472b0 Revert rG2149aa73f640c96 "[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)"
This might be the cause of some msan build failures - I don't have access to a msan build right now, so this is a speculative revert.
2021-04-25 12:45:07 +01:00
Simon Pilgrim 2149aa73f6 [X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)
XADD has the same EFLAGS behaviour as ADD
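A hedged example of the kind of source this helps (illustrative only): the
locked xadd produced for the fetch_sub already sets ZF when the decrement
reaches zero, so the separate compare against the old value can fold into
the branch.
```
#include <atomic>

// Returns true when this call dropped the last reference: the old value was
// 1, i.e. the result of the locked xadd is 0 and ZF is already set.
bool release_ref(std::atomic<int> &refcount) {
  return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```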
2021-04-25 12:02:33 +01:00
Simon Pilgrim 5dd5859c42 [X86] Add PR20841 test cases showing failure to reuse ZF from XADD ops 2021-04-25 11:50:18 +01:00
Simon Pilgrim 72471c656e [X86] Regenerate atomic-flags.ll test file 2021-04-25 11:50:18 +01:00
Xiang1 Zhang c3f95e9197 [X86] Refine AMX fast register allocation 2021-04-25 14:20:53 +08:00
Xiang1 Zhang 3b8ec86fd5 [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-04-25 09:45:41 +08:00
Dávid Bolvanský ef2dc7ed9f [Analysis] Attribute alignment should not prevent tail call optimization
Fixes tail folding issue mentioned in D100879.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D101230
2021-04-24 19:57:42 +02:00
David Green af342f7240 [AArch64] Enable UseAA globally in the AArch64 backend
This is similar to D69796 from the ARM backend. We remove the UseAA
feature, enabling it globally in the AArch64 backend. This should in
general be an improvement, allowing the backend to reorder more
instructions in scheduling and codegen, and enabling it by default,
rather than making it CPU-specific, helps improve the testing of the
feature. A debugging option is added instead for testing.

Differential Revision: https://reviews.llvm.org/D98781
2021-04-24 17:51:50 +01:00
Michael Kitzan 59f2dd5f1a [MachineCSE] Prevent CSE of non-local convergent instrs
At the moment, MachineCSE allows CSE-ing convergent instrs which are
non-local to each other. This can cause illegal codegen as convergent
instrs are control flow dependent. The patch prevents non-local CSE of
convergent instrs by adding a check in isProfitableToCSE and rejecting
CSE-ing if we're considering CSE-ing non-local convergent instrs. We
can still CSE convergent instrs which are in the same control flow
scope, so the patch purposely does not make all convergent instrs
non-CSE candidates in isCSECandidate.
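A minimal sketch of the kind of check described above (assuming LLVM's
MachineInstr API; this is not the literal patch and the helper name is
hypothetical):
```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Reject CSE when the two copies live in different blocks and the
// instruction is convergent, since convergent instructions are control
// flow dependent and must not be merged across control flow.
static bool allowCSE(const MachineInstr &MI, const MachineBasicBlock *CSBB,
                     const MachineBasicBlock *BB) {
  if (MI.isConvergent() && CSBB != BB)
    return false;
  return true; // fall through to the usual profitability heuristics
}
```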

https://reviews.llvm.org/D101187
2021-04-23 16:44:48 -07:00
Mitch Phillips caea37b37e Revert "[X86][AMX] Try to hoist AMX shapes' def"
This reverts commit 90118563ad.

Reason: Broke the MSan buildbots.
https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio

More details can be found in the original phabricator review:
https://reviews.llvm.org/D101067
2021-04-23 10:42:26 -07:00
Craig Topper 3064a63b2b [RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions.
These instructions are allowed to write v0 when they are masked.
We'll still never use v0 because of the earlyclobber constraint so
this doesn't really help anything. It just makes the definitions
correct.

While I was there, I removed an unused multiclass I noticed.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D101118
2021-04-23 09:33:29 -07:00
Sebastian Neubauer 3366d81153 [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes need to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429

Reapply with fixed tests on Windows.
2021-04-23 18:09:24 +02:00
Nemanja Ivanovic 6725b90a02 [PowerPC] Add vec_ctsl and vec_ctul to altivec.h
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
2021-04-23 11:03:38 -05:00
Joe Ellis c19c0ad681 [AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides
The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE
previously assumed the operands were full vectors, but this is not
always true. This function would produce bogus output if the division
operands are not full vectors, resulting in miscompiles when dividing
8-bit or 16-bit vectors.

The fix is to perform an extend + div + truncate for non-full vectors,
instead of the usual unpacking and unzipping logic. This is an additive
change which reduces the non-full integer vector divisions to a pattern
recognised by the existing lowering logic.
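A scalar analogy of the fix (illustrative only, not the SVE lowering code):
widen the elements, divide at the wider type, then narrow back.
```
#include <cstdint>

int8_t widened_sdiv(int8_t a, int8_t b) {
  int32_t wide = int32_t(a) / int32_t(b); // extend + divide
  return int8_t(wide);                    // truncate back to the element type
}
```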

For future reference, an example of code that would miscompile before
this patch is below:

```
int8_t foo(unsigned N, int8_t *a, int8_t *b, int8_t *c) {
    int8_t result = 0;
    for (int i = 0; i < N; ++i) {
        result += (a[i] / b[i]) / c[i];
    }
    return result;
}
```

Differential Revision: https://reviews.llvm.org/D100370
2021-04-23 14:55:10 +00:00
Jay Foad 5802cbefc1 [AMDGPU] Fix typo in implicit operand lists
Several tests had a typo where they mentioned sgpr17 twice instead of
sgpr17 and sgpr27. This had a significant effect on the
"scavenge_sgpr_pei_no_sgprs" tests because there was actually an sgpr
available, namely sgpr27.

Differential Revision: https://reviews.llvm.org/D100960
2021-04-23 15:44:17 +01:00
Sebastian Neubauer 22d99cb63f Revert "[AMDGPU] Save WWM registers in functions"
This reverts commit 91464c30bf.

Seems to break tests on Windows.
2021-04-23 16:38:50 +02:00
Piotr Sobczak 83a3395b30 [AMDGPU][NFC] Update auto-gen test
Most likely the "glc" was not added to the test when
the volatile loads started generating those bits.
2021-04-23 16:33:16 +02:00
Sebastian Neubauer 91464c30bf [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes need to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429
2021-04-23 16:09:31 +02:00
Simon Pilgrim c2da0cdff5 [X86] Add Win32/64 mulo test coverage
Part of an investigation to solve the windows regressions caused by rG13ec913bdf50
2021-04-23 14:51:42 +01:00
Matt Arsenault b58332774f AMDGPU: Fix assert on inline asm on gfx90a
This was assuming all mayLoad instructions have one def.
2021-04-23 09:00:25 -04:00
Fraser Cormack 83b8f8da82 [RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max)
This patch adds support for both scalable- and fixed-length vector code
lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent
RVV instructions.
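As a hedged example of code that reaches this lowering (assuming fmin is
recognised as llvm.minnum and the loop is vectorized for RVV):
```
#include <cmath>

void elementwise_min(float *out, const float *a, const float *b, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = std::fmin(a[i], b[i]); // llvm.minnum -> vfmin.vv after this patch
}
```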

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101035
2021-04-23 12:22:15 +01:00
Daniel Kiss b1f463dcae [AArch64] Fix for BTI landing pad insertion with PAC-RET+bkey.
EMITBKEY is emitted for PAC-RET+bkey, which is not a machine instruction.

PR: 49957

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D100996
2021-04-23 10:07:25 +02:00
Wang, Pengfei 90118563ad [X86][AMX] Try to hoist AMX shapes' def
We require that there be no intersections between AMX instructions and
their shapes' defs when we insert ldtilecfg. However, this is not always
true, both because users don't follow the AMX API model and because of
optimizations.

This patch adds a mechanism that tries to hoist the AMX shapes' defs as
well. It only hoists shapes within a BB; we can improve it to handle cases
across BBs in the future. Currently, it only hoists shapes whose sources
are all defined above the first AMX instruction. We can later improve the
case where the only source below the AMX instruction is one that moves an
immediate value into a register.

Differential Revision: https://reviews.llvm.org/D101067
2021-04-23 12:17:00 +08:00
Wang, Pengfei e8bce83996 [X86] Enable compilation of user interrupt handlers.
Add __uintr_frame structure and use UIRET instruction for functions with
x86 interrupt calling convention when UINTR is present.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D99708
2021-04-23 11:43:57 +08:00
Matt Arsenault ed633a1daa AMDGPU: Restore atomic fp feature on FP atomic instruction definitions
9931b1f7a4 switched this to checking for
the two specific subtargets, instead of the dedicated feature. This
broke support for functions which force-added the feature when emitting
code for targets that do not actually support them. This still does not
work for the targets that use the gfx6/7 or gfx10 encodings.
2021-04-22 21:32:01 -04:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Heejin Ahn c390621aeb [WebAssembly] Fix fixEndsAtEndOfFunction for delegate
Background:
CFGStackify's `fixEndsAtEndOfFunction` (in WebAssemblyCFGStackify.cpp) fixes block/loop/try's return
type when the end of function is unreachable and the function return
type is not void. So if a function returns i32 and `block`-`end` wraps the
whole function, i.e., the `block`'s `end` is the last instruction of the
function, the `block`'s return type should be i32 too:
```
block i32
  ...
end
end_function
```

If there are consecutive `end`s, this signature has to be propagated to
those blocks too, like:
```
block i32
  ...
  block i32
    ...
  end
end
end_function
```

This applies to `try`-`end` too:
```
try i32
  ...
catch
  ...
end
end_function
```

In case of `try`, we not only follow consecutive `end`s but also follow
`catch`, because for the type of the whole `try` to be i32, both `try`
and `catch` parts have to be i32:
```
try i32
  ...
  block i32
    ...
  end
catch
  ...
  block i32
    ...
  end
end
end_function
```

---

Previously we only handled consecutive `end`s or `end` before a `catch`.
But now we have `delegate`, which serves like `end` for
`try`-`delegate`. So we have to follow `delegate` too and mark its
corresponding `try` as i32 (the function's return type):
```
try i32
  ...
catch
  ...
  try i32    ;; Here
    ...
  delegate N
end
end_function
```

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D101036
2021-04-22 15:32:00 -07:00
Heejin Ahn b3e88ccba7 [WebAssembly] Serialize params/results in MachineFunctionInfo
This adds support for YAML serialization of `Params` and `Results`
fields in `WebAssemblyMachineFunctionInfo`. Types are printed as `MVT`'s
string representation. This makes writing MIR tests easier.

The tests added are testing simple parsing and printing of `params` /
`results` fields under `machineFunctionInfo`.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D101029
2021-04-22 15:31:09 -07:00
Craig Topper 5185b52988 [RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.

Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.
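For context, a hedged scalar model of what llvm.fptosi.sat computes for i32
(per the LangRef semantics: NaN maps to zero, out-of-range values clamp):
```
#include <cmath>
#include <cstdint>
#include <limits>

int32_t fptosi_sat_i32(double x) {
  if (std::isnan(x))
    return 0;
  if (x <= double(std::numeric_limits<int32_t>::min()))
    return std::numeric_limits<int32_t>::min();
  if (x >= double(std::numeric_limits<int32_t>::max()))
    return std::numeric_limits<int32_t>::max();
  return int32_t(x); // in range, truncates toward zero
}
```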

Fixes PR50083
2021-04-22 15:18:15 -07:00
Krzysztof Parzyszek 06234f758e [Hexagon] Improve lowering of returns of i1
Emit explicit any-extend to avoid weird tstbit sequences.
2021-04-22 16:47:52 -05:00
Krzysztof Parzyszek ab9521aaeb [Hexagon] Use 'vnot' instead of 'not' in patterns with vectors
'not' expands to checking for an xor with a -1 constant. Since
this looks for a ConstantSDNode it will never match for a vector.

Co-authored-by: Craig Topper <craig.topper@sifive.com>

Differential Revision: https://reviews.llvm.org/D100687
2021-04-22 15:36:20 -05:00
David Green c0bf5929ee [AArch64] Improve vector reverse lowering
This improves the lowering of v8i16 and v16i8 vector reverse shuffles.
Instead of going via a generic tbl it uses a rev64; ext pair, as already
happens for v4i32.
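A hedged NEON-intrinsics sketch of the v16i8 case (illustrative; the patch
itself changes the shuffle lowering):
```
#include <arm_neon.h>

// Full byte reverse of a 128-bit vector using the rev64 + ext pair the
// patch now selects, instead of a generic table lookup (tbl).
uint8x16_t reverse_v16i8(uint8x16_t v) {
  uint8x16_t halves = vrev64q_u8(v);  // reverse bytes within each 64-bit half
  return vextq_u8(halves, halves, 8); // swap the two halves
}
```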

Differential Revision: https://reviews.llvm.org/D100882
2021-04-22 21:01:25 +01:00
Craig Topper e01c419ecd [RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.
These instructions don't really exist, but we have ways we can
emulate them.

.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where mask and maskedoff are the same value, we have
vmsge{u}.vx v0, va, x, v0.t. This is the vd==v0 case, which
requires a temporary, so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925
2021-04-22 10:44:38 -07:00
Craig Topper d77d56acfd [RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics.
Refactor to use new multiclass instead of individual patterns.

We already supported this due to SEW=64 on RV32, but we didn't have
test cases for all the types we supported.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper 9524a0553d [RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics.
We don't have instructions for these, but can swap the operands
to use vmle/vmflt. This makes the IR interface more consistent and
simplifies the frontend implementation.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper 70254ccb69 [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
fewer memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper 77f14c96e5 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than splatting each GPR separately and using bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.
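A rough scalar analogy of the approach (not the actual isel code): write the
two 32-bit halves to a stack slot and read the combined 64-bit value back,
much like the existing i64-to-f64 move on RV32 (little-endian, so the low
half goes first).
```
#include <cstdint>
#include <cstring>

uint64_t combine_gprs(uint32_t lo, uint32_t hi) {
  uint32_t slot[2] = {lo, hi};      // stack temporary holding both halves
  uint64_t v;
  std::memcpy(&v, slot, sizeof(v)); // reload as a single 64-bit value
  return v;                         // this value is what gets splatted
}
```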

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00