Commit Graph

36786 Commits

Author SHA1 Message Date
Craig Topper a1ae3c6ac9 [RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal.
If SETUNE isn't legal, UO can use the NOT of the SETO expansion.

Removes some complex isel patterns. Most of the test changes are
from using XORI instead of SEQZ.

Differential Revision: https://reviews.llvm.org/D92008
2020-12-10 09:15:52 -08:00
Kerry McLaughlin abe7775f5a [SVE][CodeGen] Extend index of masked gathers
This patch changes performMSCATTERCombine to also promote the indices of
masked gathers where the element type is i8 or i16, and adds various tests
for gathers with illegal types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91433
2020-12-10 13:54:45 +00:00
Kazushi (Jam) Marukawa 4b1e329255 [VE] Add vector reduce intrinsic instructions
Add vrmax, vrmin, vfrmax, vfrmin, vrand, vror, and vrxor intrinsic
instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92941
2020-12-10 22:21:17 +09:00
David Green 0447f3508f [ARM][RegAlloc] Add t2LoopEndDec
We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.

There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.

Differential Revision: https://reviews.llvm.org/D91358
2020-12-10 12:14:23 +00:00
Mirko Brkusanin 0c7cce54eb [AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2
Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned
loads but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.

While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardles of chosen access mode.

To resolve this we use two patterns for selecting ds_read_b128. One
requires alignment of 16-byte and the other requires
unaligned-access-mode option.

Same goes for ds_write2_b64 and ds_write_b128.

Differential Revision: https://reviews.llvm.org/D92767
2020-12-10 12:40:49 +01:00
David Green 5abbf20f0f [ARM] Additional test for Min loop. NFC 2020-12-10 10:49:00 +00:00
David Green b0ce615b2d [ARM] Remove copies from low overhead phi inductions.
The phi created in a low overhead loop gets created with a default
register class it seems. There are then copied inserted between the low
overhead loop pseudo instructions (which produce/consume GPRlr
instructions) and the phi holding the induction. This patch removes
those as a step towards attempting to make t2LoopDec and t2LoopEnd a
single instruction, and appears useful in it's own right as shown in the
tests.

Differential Revision: https://reviews.llvm.org/D91267
2020-12-10 10:30:31 +00:00
David Green eec5b99901 [ARM] MVE vcreate tests, for dual lane moves. NFC 2020-12-10 09:17:34 +00:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements amx programming model that discussed in llvm-dev
 (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
 Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet.
 This patch implemeted 7 components.

1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The Lowering from AMX intrinsics to AMX pseudo instruction.
5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx
   intruction.
6. The register allocation for tile register.
7. Morph AMX pseudo instruction to AMX real instruction.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Kazushi (Jam) Marukawa e954ba28bc [VE][NFC] Disable VP tests
VP tests recently added don't work on Release mode.  They work on
Debug mode, so I disable them on Release mode to make tests work.
2020-12-10 15:13:05 +09:00
Stanislav Mekhanoshin 4617cc68f6 [AMDGPU] Fix expansion of 192 bit spills in PEI
Differential Revision: https://reviews.llvm.org/D92979
2020-12-09 16:36:29 -08:00
Krzysztof Parzyszek f5d07a05bb [Hexagon] Realign HVX vectors wherever possible
Introduce HexagonVectorCombine as a helper class for vector-related
optimizations.
2020-12-09 17:11:25 -06:00
Craig Topper 5ff5cf8e05 [X86] Use APInt::isSignedIntN instead of isIntN for 64-bit ANDs in X86DAGToDAGISel::IsProfitableToFold
Pretty sure we meant to be checking signed 32 immediates here
rather than unsigned 32 bit. I suspect I messed this up because
in MathExtras.h we have isIntN and isUIntN so isIntN differs in
signedness depending on whether you're using APInt or plain integers.

This fixes a case where we didn't fold a constant created
by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically
sort constants it creates, we can fail to convert the Constant
to a TargetConstant. This leads to very strange behavior later.

Fixes PR48458.
2020-12-09 13:39:07 -08:00
Mircea Trofin 55ea639d3c [NFC] Removed unused prefixes in llvm/test/CodeGen/AArch64
Differential Revision: https://reviews.llvm.org/D92943
2020-12-09 12:47:51 -08:00
Florian Hahn 77fd12a66e
[AArch64] Add aarch64_neon_vcmla{_rot{90,180,270}} intrinsics.
Add builtins required to implement vcmla and rotated variants from
the ACLE

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D92929
2020-12-09 19:46:49 +00:00
Michael Munday e28b6a60bc [RISCV][NFC] Regenerate RISCV CodeGen tests
Regenerated using:

./llvm/utils/update_llc_test_checks.py -u llvm/test/CodeGen/RISCV/*.ll

This has added comments to spill-related instructions and added @plt to
some symbols.

Differential Revision: https://reviews.llvm.org/D92841
2020-12-09 19:42:49 +00:00
Kazushi (Jam) Marukawa 1a2147fead [VE] Add vsum and vfsum intrinsic instructions
Add vsum and vfsum intrinsic instructions and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92938
2020-12-10 01:11:53 +09:00
Kazushi (Jam) Marukawa 398f29fbb0 [VE] Add vfmk intrinsic instructions
Add vfmk intrinsic instructions, a few pseudo instructions to expand
vfmk intrinsic using VM512 correctly, and regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D92758
2020-12-10 00:08:20 +09:00
Simon Pilgrim 24184dbb82 [X86] Fold CONCAT(VPERMV3(X,Y,M0),VPERMV3(Z,W,M1)) -> VPERMV3(CONCAT(X,Z),CONCAT(Y,W),CONCAT(M0,M1))
Further prep work toward supporting different subvector sizes in combineX86ShufflesRecursively
2020-12-09 14:29:32 +00:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Simon Moll 3ffbc79357 [VP] Build VP SDNodes
Translate VP intrinsics to VP_* SDNodes.  The tests check whether a
matching vp_* SDNode is emitted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91441
2020-12-09 11:36:51 +01:00
Tim Northover 45de42116e AArch64: use correct operand for ubsantrap immediate.
I accidentally pushed the wrong patch originally.
2020-12-09 10:17:16 +00:00
David Green 384383e15c [ARM] Common inverse constant predicates to VPNOT
This scans through blocks looking for constants used as predicates in
MVE instructions. When two constants are found which are the inverse of
one another, the second can be replaced by a VPNOT of the first,
potentially allowing that not to be folded away into an else predicate
of a vpt block.

Differential Revision: https://reviews.llvm.org/D92470
2020-12-09 07:56:44 +00:00
David Green 8254d70a38 [ARM] Constant Mask VPT block tests. NFC 2020-12-09 07:44:49 +00:00
Jinsong Ji 02b2c02419 [PowerPC] Precommit testcases for regpressure compute fix 2020-12-09 03:37:00 +00:00
Sam Clegg 1b6d879ec1 [WebAssembly] Fix code generated for atomic operations in PIC mode
The main this this test does is to add the `IsNotPIC` predicate to the
all the atomic instructions pattern that directly refer to
`tglobaladdr`.

This is because in PIC mode we need to generate separate instruction
sequence (either a direct global.get, or __memory_base + offset) for
accessing global addresses.

As part of this change I noticed that many of the `Requires` attributes
added to the instruction in `WebAssemblyInstrAtomics.td` were being
honored.  This is because the wrapped in a `let Predicates =
[HasAtomics]` block and it seems that that outer wrapping overrides any
`Requires` on defs within it.   As a workaround I removed the outer
`let` and added `HasAtomics` to all the inner `Requires`.  I believe
that all the instrucitons that don't have `Requires` explicit bottom out
in `ATOMIC_I` and `ATOMIC_NRI` which have `HasAtomics` so this should
not remove this predicate from any patterns (at least that is the idea).

The alternative to this approach looks like implementing something
like `PredicateControl` in `Mips.td` where we can split the predicates
into groups so they don't clobber each other.

Differential Revision: https://reviews.llvm.org/D92744
2020-12-08 18:41:32 -08:00
Chen Zheng 66a03d1022 [PowerPC] prepare more dq form for P10 pair load/store
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92393
2020-12-08 21:01:40 -05:00
Ilya Leoshkevich d58f112ce0 Prevent FENTRY_CALL reordering
FEntryInserter prepends FENTRY_CALL to the first basic block. In case
there are other instructions, PostRA Machine Instruction Scheduler can
move FENTRY_CALL call around. This actually occurs on SystemZ (see the
testcase). This is bad for the following reasons:

* FENTRY_CALL clobbers registers.
* Linux Kernel depends on whatever FENTRY_CALL expands to to be the very
  first instruction in the function.

Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering
by making it a scheduling boundary for PostRA Machine Instruction
Scheduler.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D91218
2020-12-09 00:59:01 +01:00
Jessica Paquette 40d1fb2229 [AArch64][GlobalISel] Swap select operands when inverting condition code
This was not obvious when reading the imported tablegen patterns in
AArch64GenDAGISel.

Update select-select.mir.
2020-12-08 14:17:26 -08:00
Anna Thomas 29356e3279 [ScalarizeMaskedMemIntrin] Add new PM support
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.

Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
2020-12-08 17:15:22 -05:00
Jessica Paquette 21308c2b4c [AArch64][GlobalISel] Check if G_SELECT has been optimized when folding binops
`TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could
end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB
on the false side)

Add in the missing `if` and a test.
2020-12-08 13:47:08 -08:00
Kazushi (Jam) Marukawa 95ea50e4ad [VE] Correct LVLGen (LVL instruction insert pass)
SX Aurora VE uses an intermediate representation similar to VP as its MIR.
VE itself uses invidiual VL register as its own vector length register at
the hardware level.  So, LLVM needs to insert load VL (LVL) instruction just
before vector instructions if the value of VL is changed.  This LVLGen pass
generates LVL instructions for such purpose.  Previously, a bug is pointed
out in D91416.  This patch correct this bug and add a regression test.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D92716
2020-12-09 06:33:53 +09:00
Harald van Dijk 29c8ea6f1a
[X86] Handle localdynamic TLS model in x32 mode
D92346 added TLS_(base_)addrX32 to handle TLS in x32 mode, but missed the
different TLS models. This diff fixes the logic for the local dynamic model
where `RAX` was used when `EAX` should be, and extends the tests to cover
all four TLS models.

Fixes https://bugs.llvm.org/show_bug.cgi?id=26472.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92737
2020-12-08 21:06:00 +00:00
Austin Kerbow 4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Craig Topper 98bca0a605 [RISCV] Add isel patterns for SBCLRI/SBSETI/SBINVI(W) instruction
We can use these instructions for single bit immediates that are too large for ANDI/ORI/CLRI.

The _10 test cases are to make sure that we still use ANDI/ORI/CLRI for small immediates.

Differential Revision: https://reviews.llvm.org/D92262
2020-12-08 12:22:40 -08:00
Jessica Paquette 5b5d3fa9d9 [AArch64][GlobalISel] Fold G_SELECT cc, %t, (G_ADD %x, 1) -> CSINC %t, %x, cc
This implements

```
G_SELECT cc, %true, (G_ADD %x, 1) -> CSINC %true, %x, cc
G_SELECT cc, (G_ADD %x, 1), %false -> CSINC %x, %false, inv_cc
```

Godbolt example: https://godbolt.org/z/eoPqKq

Differential Revision: https://reviews.llvm.org/D92868
2020-12-08 10:53:37 -08:00
Jessica Paquette cd9a52b99e [AArch64][GlobalISel] Fold binops on the true side of G_SELECT
This implements the following folds:

```
G_SELECT cc, (G_SUB 0, %x), %false -> CSNEG %x, %false, inv_cc
G_SELECT cc, (G_XOR x, -1), %false -> CSINV %x, %false, inv_cc
```

This is similar to the folds introduced in
5bc0bd05e6.

In 5bc0bd05e6 I mentioned that we may prefer to do
this in AArch64PostLegalizerLowering.

I think that it's probably better to do this in the selector. The way we select
G_SELECT depends on what register banks end up being assigned to it. If we did
this in AArch64PostLegalizerLowering, then we'd end up checking *every* G_SELECT
to see if it's worth swapping operands. Doing it in the selector allows us to
restrict the optimization to only relevant G_SELECTs.

Also fix up some comments in `TryFoldBinOpIntoSelect` which are kind of
confusing IMO.

Example IR: https://godbolt.org/z/3qPGca

Differential Revision: https://reviews.llvm.org/D92860
2020-12-08 10:42:59 -08:00
Jessica Paquette ce199667f6 [AArch64][GlobalISel] Don't explicitly write to the zero register in emitCMN
This case was missed in 78ccb0359d.

Differential Revision: https://reviews.llvm.org/D92438
2020-12-08 10:42:05 -08:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
LLVM intrinsic llvm.maxnum|minnum is overloaded intrinsic, can be used on any
floating-point or vector of floating-point type.
This patch extends current infrastructure to support scalable vector type.

This patch also fix a warning message of incorrect use of EVT::getVectorNumElements()
for scalable type, when DAGCombiner trying to split scalable vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
Stefan Pintilie 2812c15156 [PowerPC] Fix missing nop after call to weak callee.
Weak functions can be replaced by other functions at link time. Previously it
was assumed that no matter what the weak callee function was replaced with it
would still share the same TOC as the caller. This is no longer true as a weak
callee with a TOC setup can be replaced by another function that was compiled
with PC Relative and does not have a TOC at all.

This patch makes sure that all calls to functions defined as weak from a caller
that has a valid TOC have a nop after the call to allow a place for the linker
to restore the TOC.

Reviewed By: NeHuang

Differential Revision: https://reviews.llvm.org/D91983
2020-12-08 09:38:44 -06:00
Simon Pilgrim 45878ede7e [X86] Regenerate vector-shift-*.ll tests
Replace X32 check prefixes with X86 - X32 is generally used for gnux triple tests
2020-12-08 15:36:50 +00:00
Simon Pilgrim e18f8d63bd [X86] Regenerate store-narrow.ll tests
Replace X32 check prefixes with X86 - X32 is generally used for gnux triple tests
2020-12-08 15:36:49 +00:00
Simon Pilgrim 0785f12e6e [X86] Regenerate bmi-intrinsics-fast-isel.ll tests
Replace X32 check prefixes with X86 - X32 is generally used for gnux triple tests
2020-12-08 15:36:48 +00:00
Simon Pilgrim 3204282a98 [X86] Regenerate addcarry2.ll tests
Replace X32 check prefixes with X86 - X32 is generally used for gnux triple tests
2020-12-08 15:36:48 +00:00
Simon Pilgrim dcff846f4d [X86] Regenerate sttni.ll tests
Replace X32 check prefixes with X86 - X32 is generally used for gnux triple tests
2020-12-08 15:36:47 +00:00
Simon Pilgrim bbc5c4bf40 [X86] Regenerate clzero.ll tests
Replace X32 check prefixes with X86 - X32 is generally used for gnux triple tests
2020-12-08 15:36:47 +00:00
David Green 03e675fd12 [ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1)
This folds a not (an xor -1) though a predicate_cast, so that it can be
turned into a VPNOT and potentially be folded away as an else predicate
inside a VPT block.

Differential Revision: https://reviews.llvm.org/D92235
2020-12-08 15:22:46 +00:00
Jay Foad 03663e4130 [AMDGPU] Add occupancy level tests for GFX10.3. NFC.
getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and
they both affect the occupancy calculation.

Differential Revision: https://reviews.llvm.org/D92839
2020-12-08 14:15:01 +00:00
David Green 91fb9eac0b [ARM] Remove dead instructions before creating VPT block bundles
We remove VPNOT instructions in VPT blocks as we create them, turning
them into else predicates. We don't remove the dead instructions until
after the block has been created though. Because the VPNOT will have
killed the vpr register it used, this makes finalizeBundle add internal
flags to the vpr uses of any instructions after the VPNOT. These
incorrect flags can then confuse what is alive and what is not, leading
to machine verifier problems.

This patch removes them earlier instead, before the bundle is finalized
so that kill flags remain valid.

Differential Revision: https://reviews.llvm.org/D92227
2020-12-08 14:05:07 +00:00