Commit Graph

37149 Commits

Author SHA1 Message Date
Simon Pilgrim 4214ca9614 [X86][AVX] Attempt to fold vpermf128(op(x,i),op(y,i)) -> op(vpermf128(x,y),i)
If vpermf128/vpermi128 is acting on 2 similar 'inlane' ops, then try to perform the vpermf128 first which will allow us to merge the ops.

This will help us fix one of the regressions in D56387
2021-01-11 16:59:25 +00:00
Paul Robinson c161775dec [FastISel] Flush local value map on every instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when the
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction.  This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block.  Neither of those consequences is good
for debugging.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

This reapplies commits cf1c774d and dc35368c, and adds the
modification to PHI handling, which should avoid problems
with debugging under gdb.

Differential Revision: https://reviews.llvm.org/D91734
2021-01-11 08:32:36 -08:00
Paul Robinson e5eb5c8a7f NFC: Use -LABEL more
There were a number of tests needing updates for D91734, and I added a
bunch of LABEL directives to help track down where those had to go.
These directives are an improvement independent of the functional
patch, so I'm committing them as their own separate patch.
2021-01-11 08:14:58 -08:00
Simon Pilgrim a0f82749f4 [X86] Extend lzcnt-cmp tests to test on non-lzcnt targets 2021-01-11 15:27:08 +00:00
Simon Pilgrim a46982a255 [X86] Add nounwind to lzcnt-cmp tests
Remove unnecessary cfi markup
2021-01-11 15:06:38 +00:00
Joe Ellis 007358239d [DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR
This avoids TypeSize-/ElementCount-related warnings.

Differential Revision: https://reviews.llvm.org/D92747
2021-01-11 14:15:11 +00:00
Jay Foad 6dcf9207df [AMDGPU] Fix a urem combine test to test what it was supposed to 2021-01-11 13:32:34 +00:00
Simon Pilgrim 8112a2598c [X86][SSE] Add 'vectorized sum' test patterns
These are often generated when building a vector from the reduction sums of independent vectors.

I've implemented some typical patterns from various v4f32/v4i32 based off current codegen emitted from the vectorizers, although these tests are more about tweaking some hadd style backend folds to handle whatever the vectorizers/vectorcombine throws at us...
2021-01-11 12:51:18 +00:00
Kazushi (Jam) Marukawa d02de13932 [VE] Support additional VMRGW and VMV intrinsic instructions
Support missing VMRGW and VMV intrinsic instructions and add regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D94300
2021-01-11 20:50:31 +09:00
Kazushi (Jam) Marukawa b72ca79982 [VE] Support intrinsic to isnert/extract_subreg of v512i1
Support insert/extract_subreg intrinsic instructions for v512i1
registers and add regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D94298
2021-01-11 20:40:10 +09:00
Simon Pilgrim 5963229266 [X86][SSE] Add missing SSE test coverage for permute(hop,hop) folds
Should help avoid bugs like reported in rG80dee7965dff
2021-01-11 11:29:04 +00:00
Simon Pilgrim 41bf338dd1 Revert rGd43a264a5dd3 "Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())""
This reapplies commit rG80dee7965dffdfb866afa9d74f3a4a97453708b2.

[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())

UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.

REAPPLIED with a fix for unary unpacks of HOP.
2021-01-11 11:29:04 +00:00
Kerry McLaughlin c37f68a888 [SVE][CodeGen] Fix legalisation of floating-point masked gathers
Changes in this patch:
- When lowering floating-point masked gathers, cast the result of the
  gather back to the original type with reinterpret_cast before returning.
- Added patterns for reinterpret_casts from integer to floating point, and
  concat_vector patterns for bfloat16.
- Tests for various legalisation scenarios with floating point types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D94171
2021-01-11 10:57:46 +00:00
Luo, Yuanke c5be0e0cc0 [X86] Fix tile register spill issue.
The tile register spill need 2 instructions.
%46:gr64_nosp = MOV64ri 64
TILESTORED %stack.2, 1, killed %46:gr64_nosp, 0, $noreg, %43:tile
The first instruction load the stride to a GPR, and the second
instruction store tile register to stack slot. The optimization of merge
spill instruction is done after register allocation. And spill tile
register need create a new virtual register to for stride, so we can't
hoist tile spill instruction in postOptimization() of register
allocation. We can't hoist TILESTORED alone and we can't hoist the 2
instuctions together because MOV64ri will clobber some GPR. This patch
is to disble the spill merge for any spill which need 2 instructions.

Differential Revision: https://reviews.llvm.org/D93898
2021-01-11 18:35:09 +08:00
QingShan Zhang 7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Nico Weber d43a264a5d Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())"
This reverts commit 80dee7965d.
Makes clang sometimes hang forever. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1164786#c6 for a
stand-alone repro.
2021-01-10 20:22:53 -05:00
Fraser Cormack b02eab9058 [RISCV] Add scalable vector icmp ISel patterns
Original patch by @rogfer01.

The RVV integer comparison instructions are defined in such a way that
many LLVM operations are defined by using the "opposite" comparison
instruction and swapping the operands. This is done in this patch in
most cases, except for the mappings where the immediate range must be
adjusted to accomodate:

    va < i --> vmsle{u}.vi vd, va, i-1, vm
    va >= i --> vmsgt{u}.vi vd, va, i-1, vm

That is left for future optimization; this patch supports all operations
but in the case of the missing mappings the immediate will be moved to
a scalar register first.

Since there are so many condition codes and operand cases to check, it
was decided to reduce the test burden by only testing the "vscale x 8"
vector types.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94168
2021-01-09 20:54:34 +00:00
Fraser Cormack 41d06095b0 [SelectionDAG] Teach isConstOrConstSplat about ISD::SPLAT_VECTOR
This improves llvm::isConstOrConstSplat by allowing it to analyze
ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of
operations using scalable vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94168
2021-01-09 20:54:34 +00:00
Mircea Trofin 75c04327a5 [NFC] Disallow unused prefixes in CodeGen/X86 tests.
Also fixed remaining tests that featured unused prefixes.

Differential Revision: https://reviews.llvm.org/D94330
2021-01-09 11:43:32 -08:00
Roger Ferrer Ibanez 524d8fa9a5 [RISCV] Do not grow the stack a second time when we need to realign the stack
This is a first change needed to fix a crash in which the emergency
spill splot ends being out of reach. This happens when we run the
register scavenger after we have eliminated the frame indexes. The fix
for the actual crash will come in a later change.

This change removes an extra stack size increase we do in
RISCVFrameLowering::determineFrameLayout.

We don't have to change the size of the stack here as
PEI::calculateFrameObjectOffsets is already doing this with the right
size accounting the extra alignment.

Differential Revision: https://reviews.llvm.org/D89237
2021-01-09 16:51:09 +00:00
Heejin Ahn 4e4df1e38d [WebAssembly] Remove unreachable EH pads
This removes unreachable EH pads in LateEHPrepare. This is not for
optimization but for preparation for CFGStackify. In CFGStackify, we
determine where to place `try` marker by computing the nearest common
dominator of all predecessors of an EH pad, but when an EH pad does not
have a predecessor, it becomes tricky. We can insert an empty dummy BB
before the EH pad and place the `try` there, but removing unreachable EH
pads is simpler.

This moves an existing exception label test from eh-label.mir to
exception.mir and adds a new test there.

This also adds some comments to existing methods.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94044
2021-01-09 03:42:38 -08:00
Fraser Cormack 2c442629f0 [RISCV] Add tests for scalable constant-folding (NFC) 2021-01-09 11:31:22 +00:00
Heejin Ahn 0d8dfbb42a [WebAssembly] Update InstPrinter support for EH
- Updates InstPrinter to handle `catch_all`.
- Makes `rethrow` condition an early exit from the function to make the
  rest simpler.
- Unify label and catch counters. They don't need to be counted
  separately and this will help `delegate` instruction later.
- Removes `LastSeenEHInst` field. This was first introduced to handle
  when there are more than one `catch` blocks per `try`, but this was
  not implemented correctly and not being used at the moment anyway.
- Reenables all tests in cfg-stackify-eh.ll that don't deal with unwind
  destination mismatches, which will be handled in a later CL.

Reviewed By: dschuff, tlively, aardappel

Differential Revision: https://reviews.llvm.org/D94043
2021-01-09 02:42:35 -08:00
Heejin Ahn 52e240a072 [WebAssembly] Remove exnref and br_on_exn
This removes `exnref` type and `br_on_exn` instruction. This is
effectively NFC because most uses of these were already removed in the
previous CLs.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94041
2021-01-09 02:02:54 -08:00
Heejin Ahn 9e4eadeb13 [WebAssembly] Update basic EH instructions for the new spec
This implements basic instructions for the new spec.

- Adds new versions of instructions: `catch`, `catch_all`, and `rethrow`
- Adds support for instruction selection for the new instructions
 - `catch` needs a custom routine for the same reason `throw` needs one,
   to encode `__cpp_exception` tag symbol.
- Updates `WebAssembly::isCatch` utility function to include `catch_all`
  and Change code that compares an instruction's opcode with `catch` to
  use that function.
- LateEHPrepare
  - Previously in LateEHPrepare we added `catch` instruction to both
    `catchpad`s (for user catches) and `cleanuppad`s (for destructors).
    In the new version `catch` is generated from `llvm.catch` intrinsic
    in instruction selection phase, so we only need to add `catch_all`
    to the beginning of cleanup pads.
  - `catch` is generated from instruction selection, but we need to
    hoist the `catch` instruction to the beginning of every EH pad,
    because `catch` can be in the middle of the EH pad or even in a
    split BB from it after various code transformations.
  - Removes `addExceptionExtraction` function, which was used to
    generate `br_on_exn` before.
- CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this
  function on the new instruction causes crashes, and the new version
  will be added in a later CL, whose contents will be completely
  different. So deleting the whole function will make the diff easier to
  read.
- Reenables all disabled tests in exception.ll and eh-lsda.ll and a
  single basic test in cfg-stackify-eh.ll.
- Updates existing tests to use the new assembly format. And deletes
  `br_on_exn` instructions from the tests and FileCheck lines.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94040
2021-01-09 01:48:06 -08:00
Heejin Ahn 9724c3cff4 [WebAssembly] Update WasmEHPrepare for the new spec
Clang generates `wasm.get.exception` and `wasm.get.ehselector`
intrinsics, which respectively return a caught exception value (a
pointer to some C++ exception struct) and a selector (an integer value
that tells which C++ `catch` clause the current exception matches, or
does not match any).

WasmEHPrepare is a pass that does some IR-level preparation before
instruction selection. Previously one of things we did in this pass was
to convert `wasm.get.exception` intrinsic calls to
`wasm.extract.exception` intrinsics. Their semantics were the same
except `wasm.extract.exception` did not have a token argument. We
maintained these two separate intrinsics with the same semantics because
instruction selection couldn't handle token arguments. This
`wasm.extract.exception` intrinsic was later converted to
`extract_exception` instruction in instruction selection, which was a
pseudo instruction to implement `br_on_exn`. Because `br_on_exn` pushed
an extracted value onto the value stack after the `end` instruction of a
`block`, but LLVM does not have a way of modeling that kind of behavior,
so this pseudo instruction was used to pull an extracted value out of
thin air, like this:
```
block $l0
  ...
  br_on_exn $cpp_exception $l0
  ...
end
extract_exception ;; pushes values onto the stack
```

In the new spec, we don't need this pseudo instruction anymore because
`catch` itself returns a value and we don't have `br_on_exn` anymore. In
the spec `catch` returns multiple values (like `br_on_exn`), but here we
assume it only returns a single i32, which is sufficient to support C++.

So this renames `wasm.get.exception` intrinsic to `wasm.catch`. Because
this CL does not yet contain instruction selection for `wasm.catch`
intrinsic, all `RUN` lines in exception.ll, eh-lsda.ll, and
cfg-stackify-eh.ll, and a single `RUN` line in wasm-eh.cpp (which is an
end-to-end test from C++ source to assembly) fail. So this CL
temporarily disables those `RUN` lines, and for those test files without
any valid remaining `RUN` lines, adds a dummy `RUN` line to make them
pass. These tests will be reenabled in later CLs.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94039
2021-01-08 23:38:26 -08:00
Ben Shi 55f0a1b066 [RISCV] Optimize multiplication with constant
1. Break MUL with specific constant to a SLLI and an ADD/SUB on riscv32
   with the M extension.
2. Break MUL with specific constant to two SLLI and an ADD/SUB, if the
   constant needs a pair of LUI/ADDI to construct.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D93619
2021-01-09 10:37:21 +08:00
Tony 2f499b9aff [AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214
2021-01-09 00:52:33 +00:00
Mircea Trofin a8bda3df42 [NFC] Disallow unused prefixes in CodeGen/AMDGPU
This adds the lit config, and cleans up remaining tests.

Differential Revision: https://reviews.llvm.org/D94245
2021-01-08 11:49:23 -08:00
David Green 024af42c60 [ARM] Custom lower i1 vector truncates
The ISel patterns we have for truncating to i1's under MVE do not seem
to be correct. Instead custom lower to icmp(ne, and(x, 1), 0).

Differential Revision: https://reviews.llvm.org/D94226
2021-01-08 18:21:00 +00:00
Simon Pilgrim 80dee7965d [X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())
UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.
2021-01-08 15:22:17 +00:00
Heejin Ahn 7be271537e [WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
  foo();
} catch (int n) {
  ...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94038
2021-01-08 06:55:04 -08:00
David Green 1ae762469f [ARM] Update and regenerate test checks. NFC 2021-01-08 14:54:16 +00:00
Simon Pilgrim 4a582d766a [X86][SSE] Add vphaddd/vphsubd unpack(hop(),hop()) tests 2021-01-08 14:39:37 +00:00
Simon Pilgrim 7b9f541c1e [X86][SSE] Add tests for unpack(hop(),hop())
We should be able to convert these to permute(hop()) as we only ever use one of the ops from each hop.
2021-01-08 14:11:37 +00:00
Kazushi (Jam) Marukawa 5ead757f1d [VE] Support pack_f32p and pack_f32a intrinsic instructions
Support pack_f32p and pack_f32a intrinsic instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D94296
2021-01-08 22:59:11 +09:00
Heejin Ahn d012430eee [WebAssembly] Change label numbers to variables in test
cfg-stackify-eh.ll contains many `CHECK` lines specifying label / catch
comments with numbers. These numbers are subject to change every time
any block/loop/try is added in the middle in existing functions or any
other function is added in the middle of the file, generating a large
number of lines in diffs. This change converts them to variables so they
can be more resistent to future changes.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94037
2021-01-08 05:49:59 -08:00
Simon Moll 611d3c63f3 [VP] ISD helper functions [VE] isel for vp_add, vp_and
This implements vp_add, vp_and for the VE target by lowering them to the
VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode,
getVPMaskIdx, getVPExplicitVectorLengthIdx).

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D93766
2021-01-08 14:29:45 +01:00
Nicholas Guy ed23229a64 [AArch64] Fix crash caused by invalid vector element type
Fixes a crash caused by D91255, when LLVMTy is null when
calling changeExtendedVectorElementType.

Differential Revision: https://reviews.llvm.org/D94234
2021-01-08 12:02:54 +00:00
Simon Moll eeba70a463 [VE] Expand single-element BUILD_VECTOR to INSERT_VECTOR_ELT
We do this mostly to be able to test the insert_vector_elt isel
patterns. As long as we don't, most single element insertions show up as
`BUILD_VECTOR` in the backend.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D93759
2021-01-08 11:48:01 +01:00
Simon Moll d1b606f897 [VE] Extract & insert vector element isel
Isel and tests for extract_vector_elt and insert_vector_elt.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D93687
2021-01-08 11:46:59 +01:00
Qiu Chaofan 6175fcf01f [NFC] Update some PPC tests marked as auto-generated
Update CodeGen regression tests with marker at first line telling it's
auto-generated by the script, under PowerPC directory. For some reason,
these tests are generated but manually written, which makes things
unclear when someone's change affecting them.

However, some tests only show simple change after re-generated, like
extra blank lines, disappearing '.localentry', etc. Besides, some tests
are generated but added checks for debug output. This commit doesn't try
updating them.
2021-01-08 17:59:13 +08:00
Kazushi (Jam) Marukawa 12167632bc [VE] Add SVOB intrinsic instruction
Add SVOB intrinsic instruction and a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D94279
2021-01-08 18:49:17 +09:00
David Sherwood d1bf26fd94 [AArch64][SVE] Add lowering for llvm abs intrinsic
Add functionality to permit lowering of the abs and neg intrinsics
using the passthru variants.

Differential Revision: https://reviews.llvm.org/D94160
2021-01-08 08:55:25 +00:00
Christudasan Devadasan ae25a397e9 AMDGPU/GlobalISel: Enable sret demotion 2021-01-08 10:56:35 +05:30
Evandro Menezes 946bc50e4c [RISCV] Define the vfsqrt RVV intrinsics
Define the `vfsqrt` IR intrinsics for the respective V instructions.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>

Differential Revision: https://reviews.llvm.org/D93745
2021-01-07 17:29:29 -06:00
Arthur Eubanks 9ccf13c36d [NewPM][NVPTX] Port NVPTX opt passes
There are only two used in the IR optimization pipeline.
Port these and add them to the default pipeline.

Similar to https://reviews.llvm.org/D93863.

I added -mtriple to some tests since under the new PM, the passes are
only available when the TargetMachine is specified.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93930
2021-01-07 15:12:35 -08:00
Matt Arsenault 2cbbc6e87c GlobalISel: Fail legalization on narrowing extload below memory size 2021-01-07 17:40:34 -05:00
Matt Arsenault 1f9b6ef91f GlobalISel: Add combine for G_UREM by power of 2
Really I want this in the legalizer, but this is a start.
2021-01-07 16:36:35 -05:00
Wouter van Oortmerssen 5c38ae36c5 [WebAssembly] Fixed byval args missing DWARF DW_AT_LOCATION
A struct in C passed by value did not get debug information. Such values are currently
lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes
a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations
in for TI's specifically if the location is a single range (not a list).

In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did
not update the associated DBG_VALUEs, and couldn't even find these values since the code
assumed such instructions are adjacent, which is not the case here.

Also fixed asm printing of TIs needed by a test.

Differential Revision: https://reviews.llvm.org/D94140
2021-01-07 10:31:38 -08:00