llvm-project

Commit Graph

Author	SHA1	Message	Date
Hsiangkai Wang	97e33feb08	[RISCV] Implement vloxseg/vluxseg intrinsics. Define vloxseg/vluxseg intrinsics and pseudo instructions. Lower vloxseg/vluxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94903	2021-01-23 08:54:56 +08:00
Stanislav Mekhanoshin	ca904b81e6	[AMDGPU] Fix FP materialization/resolve with flat scratch Differential Revision: https://reviews.llvm.org/D95266	2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin	607bec0bb9	Change materializeFrameBaseRegister() to return register The only caller of this function is in the LocalStackSlotAllocation and it creates base register of class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here so let materializeFrameBaseRegister to just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268	2021-01-22 15:51:06 -08:00
Craig Topper	d65e8ee507	[RISCV] Add more cmov isel patterns to handle seteq/ne with a small non-zero immediate. Similar to our free standing setcc patterns, we can use ADDI to subtract the immediate from the other operand. Then the cmov can check if the result is zero or non-zero. Reviewed By: mundaym Differential Revision: https://reviews.llvm.org/D95169	2021-01-22 14:51:22 -08:00
Mitch Phillips	19ec559c66	Revert "[AArch64][GlobalISel] Make G_USUBO legal and select it." This reverts commit `3dedad475d`. Broke UBSan on Android: http://lab.llvm.org:8011/#/builders/77/builds/3082 More details at: https://reviews.llvm.org/D95032	2021-01-22 14:32:12 -08:00
Craig Topper	095e245e16	[RISCV] Add isel patterns for SHADD(.UW) This adds an initial set of patterns for these instructions. Its more complicated that I would like for the shadd.uw instructions because there is no guaranteed canonicalization for shl/and with constants. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D95106	2021-01-22 13:28:41 -08:00
Craig Topper	20f2e32d2c	[RISCV] Update B extension version to 0.93. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D95002	2021-01-22 12:49:10 -08:00
Craig Topper	f25f7e8ecd	[RISCV] Add xperm.* instructions to Zbp extension. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94999	2021-01-22 12:49:10 -08:00
Craig Topper	4d5aa760a7	[RISCV] Add support for rev8 and orc.b to Zbb. These instructions use a portion of the encodings for grevi and gorci. The full encodings are only supported with Zbp. Note, rev8 has a different encoding between rv32 and rv64. Zbb is closer to being finalized that Zbp which has motivated some decisions in this patch. I'm treating rev8 and orc.b as separate instructions when either Zbb or Zbp is enabled. This allows us to print to suggest that either feature needs to be enabled to support these mnemonics. I had tried to put HasStdExtZbbAndNotZbp on the Zbb instructions, but that caused a diagnostic that said Zbp is required if neither feature is enabled. We should really mention Zbb since its closer to final. This does require extra isel patterns for the different cases so that bswap will always print as rev8 in assembly listing since we can't use an InstAlias. llvm-objdump disassembling should always pick the rev8 or orc.b instructions. llvm-mc parsing and printing text will not convert the grevi/gorci spellings to rev8/gorc.b. We could probably fix this with a special case in processInstruction in the assembly parser if it its important. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94944	2021-01-22 12:49:10 -08:00
Craig Topper	3c94cee63b	[RISCV] Add zext.h instruction to Zbb. zext.h uses the same encoding as pack rd, rs, x0 in rv32 and packw rd, rs, x0 in rv64. Encodings without x0 as the second source are not valid in Zbb. I've added two new instructions with these specific encodings with predicates that enable them when either Zbb or Zbp is enabled. The pack spelling will only be accepted with Zbp. The disassembler will use the zext.h instruction when either feature is enabled. Using the pack spelling will print as pack when llvm-mc is emitting text. We could fix this with some custom code in processInstruction if this is important, but I'm not sure it is. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94818	2021-01-22 12:49:10 -08:00
Craig Topper	83c92fdeda	[RISCV] Move pack instructions to Zbp extension only. Zext.h will need to come back to Zbb, but that only uses specific encodings of pack. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94742	2021-01-22 12:49:10 -08:00
Craig Topper	5ae92f1e11	[RISCV] Change zext.w to be an alias of add.uw rd, rs1, x0 instead of pack. This didn't make it into the published 0.93 spec, but it was the intention. But it is in the tex source as of this commit `d172f029c0` This means zext.w now requires Zba. Not sure if we should still use pack if Zbp is enabled and Zba isn't. I'll leave that for the future when pack is closer to being final. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94736	2021-01-22 12:49:10 -08:00
Craig Topper	9d499e037e	[RISCV] Modify add.uw patterns to put the masked operand in rs1 to match 0.93 bitmanip spec. The 0.93 spec has this implementation for add.uw uint_xlen_t adduw(uint_xlen_t rs1, uint_xlen_t rs2) { uint_xlen_t rs1u = (uint32_t)rs1; return rs1u + rs2; } The 0.92 spec had the usages of rs1 and rs2 swapped. Reviewed By: frasercrmck, asb Differential Revision: https://reviews.llvm.org/D95090	2021-01-22 12:49:10 -08:00
Craig Topper	efbcd66861	[RISCV] Rename Zbs instructions to start with just 'b' instead of 'sb' to match 0.93 bitmanip spec. Also renamed Zbe instructions to resolve name conflict even though that change is in the 0.94 draft. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94653	2021-01-22 12:49:10 -08:00
Craig Topper	1355458ef6	[RISCV] Move Shift Ones instructions from Zbb to Zbp to match 0.93 bitmanip spec. It's not really clear in the spec that these are in Zbp now, but that's what I've gather from previous commits to the spec. I've file an issue to get it documented properly. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94652	2021-01-22 12:49:10 -08:00
Craig Topper	83a93ae63b	[RISCV] Add SH*ADD(.UW) instructions to Zba extension based on 0.93 bitmanip spec. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94637	2021-01-22 12:49:10 -08:00
Craig Topper	4e6ad11bc6	[RISCV] Add Zba feature and move add.uw and slli.uw to it. Still need to add SH*ADD instructions. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94617	2021-01-22 12:49:10 -08:00
Craig Topper	b825278364	[RISCV] Rename mnemonics slliu.w->slli.uw and addu.w->add.uw to match 0.93 bitmanip spec. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94582	2021-01-22 12:49:10 -08:00
Craig Topper	d985c7321f	[RISCV] Swap encodings of max and minu to match 0.93 bitmanip spec. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94580	2021-01-22 12:49:10 -08:00
Craig Topper	b2f859500f	[RISCV] Remove addiwu, addwu, subwu, subuw, clmulw, clmulrw, clmulhw to match 0.93 bitmanip spec. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94577	2021-01-22 12:49:10 -08:00
Craig Topper	6aced6bf39	[RISCV] Rename pcnt->cpop to match 0.93 bitmanip spec. This is the first of multiple patches to bring our 0.92 implementation up to 0.93. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D94568	2021-01-22 12:49:10 -08:00
Arthur Eubanks	42d682a217	[NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0 The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0. Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95250	2021-01-22 12:29:39 -08:00
Yaxun (Sam) Liu	622eaa4a4c	[HIP] Support __managed__ attribute This patch implements codegen for __managed__ variable attribute for HIP. Diagnostics will be added later. Differential Revision: https://reviews.llvm.org/D94814	2021-01-22 11:43:58 -05:00
Simon Pilgrim	bd122f6d21	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))	2021-01-22 16:05:19 +00:00
Simon Pilgrim	c33d36e066	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)	2021-01-22 15:47:23 +00:00
Simon Pilgrim	4846f6ab81	[X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching Simplify vperm2x128(concat(X,Y),concat(Z,W)) folding. Use collectConcatOps / ISD::INSERT_SUBVECTOR to find the source subvectors instead of hardcoded immediate matching.	2021-01-22 15:47:22 +00:00
David Green	af03324984	[ARM] Disable sign extended SSAT pattern recognition. I may have given bad advice, and skipping sext_inreg when matching SSAT patterns is not valid on it's own. It at least needs to sext_inreg the input again, but as far as I can tell is still only valid based on demanded bits. For the moment disable that part of the combine, hopefully reimplementing it in the future more correctly.	2021-01-22 14:07:48 +00:00
Simon Pilgrim	b1166e1317	[X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs combineX86ShufflesConstants/canonicalizeShuffleMaskWithHorizOp can both handle/earlyout shuffles with inputs of different widths, so delay widening as late as possible to make it easier to match constant folds etc. The plan is to eventually move the widening inside combineX86ShuffleChain so that we don't create any new nodes unless we successfully combine the shuffles.	2021-01-22 13:19:35 +00:00
Simon Pilgrim	ffe72f987f	[X86][SSE] Don't fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) if the shuffle are splats rGbe69e66b1cd8 added the fold, but DAGCombiner.visitVECTOR_SHUFFLE doesn't merge shuffles if the inner shuffle is a splat, so we need to bail. The non-fast-horiz-ops paths see some minor regressions, we might be able to improve on this after lowering to target shuffles. Fix PR48823	2021-01-22 11:31:38 +00:00
David Green	9ae73cdbc1	[ARM] Adjust isSaturatingConditional to return a new SDValue. NFC This replaces the isSaturatingConditional function with LowerSaturatingConditional that directly returns a new SSAT or USAT SDValue, instead of returning true and the components of it.	2021-01-22 11:11:36 +00:00
Sebastian Neubauer	8214982b50	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Relands `ba7dcd8542`, which had memory leaks. Differential Revision: https://reviews.llvm.org/D95215	2021-01-22 11:24:08 +01:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
Arthur Eubanks	a11bf9a7fb	[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook Having a custom inliner doesn't really fit in with the new PM's pipeline. It's also extra technical debt. amdgpu-inline only does a couple of custom things compared to the normal inliner: 1) It disables inlining if the number of BBs in a function would exceed some limit 2) It increases the threshold if there are pointers to private arrays(?) These can all be handled as TTI inliner hooks. There already exists a hook for backends to multiply the inlining threshold. This way we can remove the custom amdgpu-inline pass. This caused inline-hint.ll to fail, and after some investigation, it looks like getInliningThresholdMultiplier() was previously getting applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not applying at all, so some later inliner change must have fixed something), so I had to change the threshold in the test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94153	2021-01-21 20:29:17 -08:00
Kazu Hirata	cfa241680f	[llvm] Don't include StringSwitch.h where unnecessary (NFC)	2021-01-21 19:59:48 -08:00
Hsiangkai Wang	5d354220d4	[RISCV] Correct DWARF number for vector registers. The DWARF numbers of vector registers are already defined in riscv-elf-psabi. The DWARF number for vector is start from 96. Correct the DWARF numbers of vector registers. Differential Revision: https://reviews.llvm.org/D94749	2021-01-22 11:33:42 +08:00
Craig Topper	f8f1b20e6b	[RISCV] Don't create LMUL=8 pseudo instructions for ternary widening arithmetic instructions These instructions produce 2*SEW result so the input can't have an LMUL=8 or the result would need a non-existant LMUL=16. So only create pseudos for LMUL up to 4. Differential Revision: https://reviews.llvm.org/D95189	2021-01-21 19:29:02 -08:00
Cassie Jones	3dedad475d	[AArch64][GlobalISel] Make G_USUBO legal and select it. The expansion for wide subtractions includes G_USUBO. Differential Revision: https://reviews.llvm.org/D95032	2021-01-21 18:53:33 -08:00
ShihPo Hung	9667750331	[RISCV] Add intrinsics for RVV1.0 VFRSQRTE7 & VFRECE7 Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D95113	2021-01-21 18:38:49 -08:00
ShihPo Hung	976cf53cc7	[RISCV] Add intrinsics for vector unordered indexed load in RVV 1.0 Add unordered indexed load: vluxei Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D95028	2021-01-21 18:38:49 -08:00
ShihPo Hung	bea661d9a5	[RISCV] Add intrinsics for RVV 1.0 vrgatherei16 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D95014	2021-01-21 18:38:49 -08:00
Qiu Chaofan	449f2f7140	[PowerPC] Duplicate inherited heuristic from base scheduler PowerPC has its custom scheduler heuristic. It calls parent classes' tryCandidate in override version, but the function returns void, so this way doesn't actually help. This patch duplicates code from base scheduler into PPC machine scheduler class, which does what we wanted. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D94464	2021-01-22 10:11:03 +08:00
Craig Topper	3b5430eb0d	[RISCV] Add a VL output to vleff intrinsics. The fault-only-first-load instructions can reduce VL if an element other than element 0 triggers a memory fault. This can be used to vectorize loops with data dependent exit conditions like strcmp or strlen. This patch adds a VL output to these intrinsics so that the new VL value can be captured by software. This will be expanded to 'csrr gpr, vl' after the vleff instruction during SelectionDAG. By doing this with one intrinsic we are able to guarantee that the csrr reads the VL value produced by the vleff instruction. Having it as a separate intrinsic would make it impossible to guarantee ordering without making every other vector intrinsic have side effects. The intrinsics are expanded during lowering into two ISD nodes that are glued together. These ISD nodes will go through isel separately, but should maintain the glue so that they get emitted adjacently by InstrEmitter. I've only ran the chain through the vleff instruction, allowing the READ_VL to be deleted if it is unused. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D94286	2021-01-21 17:19:58 -08:00
Hsiangkai Wang	6e360460f1	[RISCV] Use v8-v23 as argument registers to conform to the proposal. The maximum LMUL is 8. We need 16 vector registers for two LMUL-8 arguments. The modification follows the proposal of psABI in https://github.com/riscv/riscv-elf-psabi-doc/pull/171 Differential Revision: https://reviews.llvm.org/D95134	2021-01-22 07:55:24 +08:00
Hsiangkai Wang	b7ab6726b6	[RISCV] New vector load/store in V extension v1.0 Upgrade RISC-V V extension to v1.0-08a0b46. Indexed load/store have ordered and unordered form. New whole vector load/store. Differential Revision: https://reviews.llvm.org/D93614	2021-01-22 07:30:09 +08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Michael Munday	4ab0f51a75	Recommit "[RISCV] Legalize select when Zbt extension available" This recommits `71ed4b6ce5` with the polarity of some of the pattern corrected. Original commit message: The custom expansion of select operations in the RISC-V backend interferes with the matching of cmov instructions. Legalizing select when the Zbt extension is available solves that problem. Reviewed By: luismarques, craig.topper Differential Revision: https://reviews.llvm.org/D93767	2021-01-21 12:07:44 -08:00
Duncan P. N. Exon Smith	f2fd41d789	X86: Fix use-after-realloc in X86AsmParser::ParseIntelExpression `X86AsmParser::ParseIntelExpression` has a while loop. In the body, calls to MCAsmLexer::UnLex can force a reallocation in the MCAsmLexer's `CurToken` SmallVector, invalidating saved references to `MCAsmLexer::getTok()`. `const MCAsmToken &Tok` is such a saved reference, and this moves it from outside the while loop to inside the body, fixing a use-after-realloc. `Tok` will still be reused across calls to `Lex()`, each of which effectively destroys and constructs the pointed-to token. I'm a bit skeptical of this usage pattern, but it seems broadly used in the X86AsmParser (and others) so I'm leaving it alone (for now). Somehow this bug was exposed by https://reviews.llvm.org/D94739, resulting in test failures in dot-operator related tests in llvm/test/tools/llvm-ml. I suspect the exposure path is related to optimizer changes from splitting up the grow operation, but I haven't dug all the way in. Regardless, there are already tests in tree that cover this; they might fail consistently if we added ASan instrumentation to SmallVector. Differential Revision: https://reviews.llvm.org/D95112	2021-01-21 11:24:35 -08:00
Hsiangkai Wang	b8921af63b	[RISCV] Update V instructions constraints to conform to v1.0 Upgrade RISC-V V extension to v1.0-08a0b46. Update instruction constraints to conform to v1.0. Differential Revision: https://reviews.llvm.org/D93612	2021-01-22 01:15:55 +08:00
Sebastian Neubauer	4dbdff66fe	Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue" This reverts commit `ba7dcd8542`. (caused memory leaks)	2021-01-21 18:11:48 +01:00

1 2 3 4 5 ...

61012 Commits