Commit Graph

61012 Commits

Author SHA1 Message Date
Hsiangkai Wang 97e33feb08 [RISCV] Implement vloxseg/vluxseg intrinsics.
Define vloxseg/vluxseg intrinsics and pseudo instructions.
Lower vloxseg/vluxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94903
2021-01-23 08:54:56 +08:00
Stanislav Mekhanoshin ca904b81e6 [AMDGPU] Fix FP materialization/resolve with flat scratch
Differential Revision: https://reviews.llvm.org/D95266
2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin 607bec0bb9 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Craig Topper d65e8ee507 [RISCV] Add more cmov isel patterns to handle seteq/ne with a small non-zero immediate.
Similar to our free standing setcc patterns, we can use ADDI to
subtract the immediate from the other operand. Then the cmov
can check if the result is zero or non-zero.

Reviewed By: mundaym

Differential Revision: https://reviews.llvm.org/D95169
2021-01-22 14:51:22 -08:00
Mitch Phillips 19ec559c66 Revert "[AArch64][GlobalISel] Make G_USUBO legal and select it."
This reverts commit 3dedad475d.

Broke UBSan on Android:
http://lab.llvm.org:8011/#/builders/77/builds/3082

More details at: https://reviews.llvm.org/D95032
2021-01-22 14:32:12 -08:00
Craig Topper 095e245e16 [RISCV] Add isel patterns for SH*ADD(.UW)
This adds an initial set of patterns for these instructions. Its
more complicated that I would like for the sh*add.uw instructions
because there is no guaranteed canonicalization for shl/and with
constants.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D95106
2021-01-22 13:28:41 -08:00
Craig Topper 20f2e32d2c [RISCV] Update B extension version to 0.93.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D95002
2021-01-22 12:49:10 -08:00
Craig Topper f25f7e8ecd [RISCV] Add xperm.* instructions to Zbp extension.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94999
2021-01-22 12:49:10 -08:00
Craig Topper 4d5aa760a7 [RISCV] Add support for rev8 and orc.b to Zbb.
These instructions use a portion of the encodings for grevi and
gorci. The full encodings are only supported with Zbp. Note,
rev8 has a different encoding between rv32 and rv64.

Zbb is closer to being finalized that Zbp which has motivated
some decisions in this patch.

I'm treating rev8 and orc.b as separate instructions when
either Zbb or Zbp is enabled. This allows us to print to suggest
that either feature needs to be enabled to support these mnemonics.
I had tried to put HasStdExtZbbAndNotZbp on the Zbb instructions,
but that caused a diagnostic that said Zbp is required if neither
feature is enabled. We should really mention Zbb since its closer
to final.

This does require extra isel patterns for the different cases so
that bswap will always print as rev8 in assembly listing since
we can't use an InstAlias.

llvm-objdump disassembling should always pick the rev8 or orc.b
instructions. llvm-mc parsing and printing text will not convert
the grevi/gorci spellings to rev8/gorc.b. We could probably fix
this with a special case in processInstruction in the assembly
parser if it its important.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94944
2021-01-22 12:49:10 -08:00
Craig Topper 3c94cee63b [RISCV] Add zext.h instruction to Zbb.
zext.h uses the same encoding as pack rd, rs, x0 in rv32 and
packw rd, rs, x0 in rv64. Encodings without x0 as the second source
are not valid in Zbb.

I've added two new instructions with these specific encodings with
predicates that enable them when either Zbb or Zbp is enabled.

The pack spelling will only be accepted with Zbp. The disassembler
will use the zext.h instruction when either feature is enabled.

Using the pack spelling will print as pack when llvm-mc is
emitting text. We could fix this with some custom code in
processInstruction if this is important, but I'm not sure it is.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94818
2021-01-22 12:49:10 -08:00
Craig Topper 83c92fdeda [RISCV] Move pack instructions to Zbp extension only.
Zext.h will need to come back to Zbb, but that only uses specific
encodings of pack.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94742
2021-01-22 12:49:10 -08:00
Craig Topper 5ae92f1e11 [RISCV] Change zext.w to be an alias of add.uw rd, rs1, x0 instead of pack.
This didn't make it into the published 0.93 spec, but it was the
intention.

But it is in the tex source as of this commit
d172f029c0

This means zext.w now requires Zba. Not sure if we should still use
pack if Zbp is enabled and Zba isn't. I'll leave that for the future
when pack is closer to being final.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94736
2021-01-22 12:49:10 -08:00
Craig Topper 9d499e037e [RISCV] Modify add.uw patterns to put the masked operand in rs1 to match 0.93 bitmanip spec.
The 0.93 spec has this implementation for add.uw

uint_xlen_t adduw(uint_xlen_t rs1, uint_xlen_t rs2) {
  uint_xlen_t rs1u = (uint32_t)rs1;
  return rs1u + rs2;
}

The 0.92 spec had the usages of rs1 and rs2 swapped.

Reviewed By: frasercrmck, asb

Differential Revision: https://reviews.llvm.org/D95090
2021-01-22 12:49:10 -08:00
Craig Topper efbcd66861 [RISCV] Rename Zbs instructions to start with just 'b' instead of 'sb' to match 0.93 bitmanip spec.
Also renamed Zbe instructions to resolve name conflict even though
that change is in the 0.94 draft.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94653
2021-01-22 12:49:10 -08:00
Craig Topper 1355458ef6 [RISCV] Move Shift Ones instructions from Zbb to Zbp to match 0.93 bitmanip spec.
It's not really clear in the spec that these are in Zbp now, but
that's what I've gather from previous commits to the spec. I've
file an issue to get it documented properly.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94652
2021-01-22 12:49:10 -08:00
Craig Topper 83a93ae63b [RISCV] Add SH*ADD(.UW) instructions to Zba extension based on 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94637
2021-01-22 12:49:10 -08:00
Craig Topper 4e6ad11bc6 [RISCV] Add Zba feature and move add.uw and slli.uw to it.
Still need to add SH*ADD instructions.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94617
2021-01-22 12:49:10 -08:00
Craig Topper b825278364 [RISCV] Rename mnemonics slliu.w->slli.uw and addu.w->add.uw to match 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94582
2021-01-22 12:49:10 -08:00
Craig Topper d985c7321f [RISCV] Swap encodings of max and minu to match 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94580
2021-01-22 12:49:10 -08:00
Craig Topper b2f859500f [RISCV] Remove addiwu, addwu, subwu, subuw, clmulw, clmulrw, clmulhw to match 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94577
2021-01-22 12:49:10 -08:00
Craig Topper 6aced6bf39 [RISCV] Rename pcnt->cpop to match 0.93 bitmanip spec.
This is the first of multiple patches to bring our 0.92
implementation up to 0.93.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94568
2021-01-22 12:49:10 -08:00
Arthur Eubanks 42d682a217 [NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0
The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0.

Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95250
2021-01-22 12:29:39 -08:00
Yaxun (Sam) Liu 622eaa4a4c [HIP] Support __managed__ attribute
This patch implements codegen for __managed__ variable attribute for HIP.

Diagnostics will be added later.

Differential Revision: https://reviews.llvm.org/D94814
2021-01-22 11:43:58 -05:00
Simon Pilgrim bd122f6d21 [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases
Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))
2021-01-22 16:05:19 +00:00
Simon Pilgrim c33d36e066 [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases
Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)
2021-01-22 15:47:23 +00:00
Simon Pilgrim 4846f6ab81 [X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching
Simplify vperm2x128(concat(X,Y),concat(Z,W)) folding.

Use collectConcatOps / ISD::INSERT_SUBVECTOR to find the source subvectors instead of hardcoded immediate matching.
2021-01-22 15:47:22 +00:00
David Green af03324984 [ARM] Disable sign extended SSAT pattern recognition.
I may have given bad advice, and skipping sext_inreg when matching SSAT
patterns is not valid on it's own. It at least needs to sext_inreg the
input again, but as far as I can tell is still only valid based on
demanded bits. For the moment disable that part of the combine,
hopefully reimplementing it in the future more correctly.
2021-01-22 14:07:48 +00:00
Simon Pilgrim b1166e1317 [X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs
combineX86ShufflesConstants/canonicalizeShuffleMaskWithHorizOp can both handle/earlyout shuffles with inputs of different widths, so delay widening as late as possible to make it easier to match constant folds etc.

The plan is to eventually move the widening inside combineX86ShuffleChain so that we don't create any new nodes unless we successfully combine the shuffles.
2021-01-22 13:19:35 +00:00
Simon Pilgrim ffe72f987f [X86][SSE] Don't fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) if the shuffle are splats
rGbe69e66b1cd8 added the fold, but DAGCombiner.visitVECTOR_SHUFFLE doesn't merge shuffles if the inner shuffle is a splat, so we need to bail.

The non-fast-horiz-ops paths see some minor regressions, we might be able to improve on this after lowering to target shuffles.

Fix PR48823
2021-01-22 11:31:38 +00:00
David Green 9ae73cdbc1 [ARM] Adjust isSaturatingConditional to return a new SDValue. NFC
This replaces the isSaturatingConditional function with
LowerSaturatingConditional that directly returns a new SSAT or
USAT SDValue, instead of returning true and the components of it.
2021-01-22 11:11:36 +00:00
Sebastian Neubauer 8214982b50 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dcd8542, which had memory leaks.

Differential Revision: https://reviews.llvm.org/D95215
2021-01-22 11:24:08 +01:00
David Sherwood 2e080eb00a [SVE] Add support for scalable vectorization of loops with selects and cmps
I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost
that prevented a cost being calculated for select instructions when using
scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost
to only do special cost calculations for fixed width vectors and fall
back to the base version for scalable vectors.

I have added a simple cost model test for cmps and selects:

  test/Analysis/CostModel/sve-cmpsel.ll

and some simple tests that show we vectorize loops with cmp and select:

  test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

Differential Revision: https://reviews.llvm.org/D95039
2021-01-22 09:48:13 +00:00
Christudasan Devadasan ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. With certain early passes, this value is
getting modified and that brought additional fixup during
eliminateFrameIndex to work for all cases. This whole transformation
looks trivial and can be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also, did some code clean up and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
Hsiangkai Wang 5d354220d4 [RISCV] Correct DWARF number for vector registers.
The DWARF numbers of vector registers are already defined in
riscv-elf-psabi. The DWARF number for vector is start from 96.
Correct the DWARF numbers of vector registers.

Differential Revision: https://reviews.llvm.org/D94749
2021-01-22 11:33:42 +08:00
Craig Topper f8f1b20e6b [RISCV] Don't create LMUL=8 pseudo instructions for ternary widening arithmetic instructions
These instructions produce 2*SEW result so the input can't have
an LMUL=8 or the result would need a non-existant LMUL=16. So
only create pseudos for LMUL up to 4.

Differential Revision: https://reviews.llvm.org/D95189
2021-01-21 19:29:02 -08:00
Cassie Jones 3dedad475d [AArch64][GlobalISel] Make G_USUBO legal and select it.
The expansion for wide subtractions includes G_USUBO.

Differential Revision: https://reviews.llvm.org/D95032
2021-01-21 18:53:33 -08:00
ShihPo Hung 9667750331 [RISCV] Add intrinsics for RVV1.0 VFRSQRTE7 & VFRECE7
Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D95113
2021-01-21 18:38:49 -08:00
ShihPo Hung 976cf53cc7 [RISCV] Add intrinsics for vector unordered indexed load in RVV 1.0
Add unordered indexed load: vluxei

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95028
2021-01-21 18:38:49 -08:00
ShihPo Hung bea661d9a5 [RISCV] Add intrinsics for RVV 1.0 vrgatherei16
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95014
2021-01-21 18:38:49 -08:00
Qiu Chaofan 449f2f7140 [PowerPC] Duplicate inherited heuristic from base scheduler
PowerPC has its custom scheduler heuristic. It calls parent classes'
tryCandidate in override version, but the function returns void, so this
way doesn't actually help. This patch duplicates code from base scheduler
into PPC machine scheduler class, which does what we wanted.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D94464
2021-01-22 10:11:03 +08:00
Craig Topper 3b5430eb0d [RISCV] Add a VL output to vleff intrinsics.
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.

This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.

By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.

The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.

I've only ran the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94286
2021-01-21 17:19:58 -08:00
Hsiangkai Wang 6e360460f1 [RISCV] Use v8-v23 as argument registers to conform to the proposal.
The maximum LMUL is 8. We need 16 vector registers for two LMUL-8
arguments. The modification follows the proposal of psABI in
https://github.com/riscv/riscv-elf-psabi-doc/pull/171

Differential Revision: https://reviews.llvm.org/D95134
2021-01-22 07:55:24 +08:00
Hsiangkai Wang b7ab6726b6 [RISCV] New vector load/store in V extension v1.0
Upgrade RISC-V V extension to v1.0-08a0b46.
Indexed load/store have ordered and unordered form.
New whole vector load/store.

Differential Revision: https://reviews.llvm.org/D93614
2021-01-22 07:30:09 +08:00
David Green 39db5753f9 [LV][ARM] Inloop reduction cost modelling
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to works OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:

  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
  ->
  R = VMLADAV A, B

There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.

Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in and outer loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.

This patch attempt to improve the costmodelling of in-loop reductions
by:
 - Adding some pattern matching in the loop vectorizer cost model to
   match extended reduction patterns that are optionally extended and/or
   MLA patterns. This marks the cost of the reduction instruction correctly
   and the sext/zext/mul leading up to it as free, which is otherwise
   difficult to tell and may get a very high cost. (In the long run this
   can hopefully be replaced by vplan producing a single node and costing
   it correctly, but that is not yet something that vplan can do).
 - getExtendedAddReductionCost is added to query the cost of these
   extended reduction patterns.
 - Expanded the ARM costs to account for these expanded sizes, which is a
   fairly simple change in itself.
 - Some minor alterations to allow inloop reduction larger than the highest
   vector width and i64 MVE reductions.
 - An extra InLoopReductionImmediateChains map was added to the vectorizer
   for it to efficiently detect which instructions are reductions in the
   cost model.
 - The tests have some updates to show what I believe is optimal
   vectorization and where we are now.

Put together this can greatly improve performance for reduction loop
under MVE.

Differential Revision: https://reviews.llvm.org/D93476
2021-01-21 21:03:41 +00:00
Michael Munday 4ab0f51a75 Recommit "[RISCV] Legalize select when Zbt extension available"
This recommits 71ed4b6ce5 with
the polarity of some of the pattern corrected.

Original commit message:
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D93767
2021-01-21 12:07:44 -08:00
Duncan P. N. Exon Smith f2fd41d789 X86: Fix use-after-realloc in X86AsmParser::ParseIntelExpression
`X86AsmParser::ParseIntelExpression` has a while loop. In the body,
calls to MCAsmLexer::UnLex can force a reallocation in the MCAsmLexer's
`CurToken` SmallVector, invalidating saved references to
`MCAsmLexer::getTok()`.

`const MCAsmToken &Tok` is such a saved reference, and this moves it
from outside the while loop to inside the body, fixing a
use-after-realloc.

`Tok` will still be reused across calls to `Lex()`, each of which
effectively destroys and constructs the pointed-to token. I'm a bit
skeptical of this usage pattern, but it seems broadly used in the
X86AsmParser (and others) so I'm leaving it alone (for now).

Somehow this bug was exposed by https://reviews.llvm.org/D94739,
resulting in test failures in dot-operator related tests in
llvm/test/tools/llvm-ml. I suspect the exposure path is related to
optimizer changes from splitting up the grow operation, but I haven't
dug all the way in. Regardless, there are already tests in tree that
cover this; they might fail consistently if we added ASan
instrumentation to SmallVector.

Differential Revision: https://reviews.llvm.org/D95112
2021-01-21 11:24:35 -08:00
Hsiangkai Wang b8921af63b [RISCV] Update V instructions constraints to conform to v1.0
Upgrade RISC-V V extension to v1.0-08a0b46.
Update instruction constraints to conform to v1.0.

Differential Revision: https://reviews.llvm.org/D93612
2021-01-22 01:15:55 +08:00
Sebastian Neubauer 4dbdff66fe Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542.

(caused memory leaks)
2021-01-21 18:11:48 +01:00