Commit Graph

60995 Commits

Craig Topper b825278364 [RISCV] Rename mnemonics slliu.w->slli.uw and addu.w->add.uw to match 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94582
2021-01-22 12:49:10 -08:00
Craig Topper d985c7321f [RISCV] Swap encodings of max and minu to match 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94580
2021-01-22 12:49:10 -08:00
Craig Topper b2f859500f [RISCV] Remove addiwu, addwu, subwu, subuw, clmulw, clmulrw, clmulhw to match 0.93 bitmanip spec.
Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94577
2021-01-22 12:49:10 -08:00
Craig Topper 6aced6bf39 [RISCV] Rename pcnt->cpop to match 0.93 bitmanip spec.
This is the first of multiple patches to bring our 0.92
implementation up to 0.93.

Reviewed By: asb, frasercrmck

Differential Revision: https://reviews.llvm.org/D94568
2021-01-22 12:49:10 -08:00
Arthur Eubanks 42d682a217 [NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0
The legacy PM's EP_CGSCCOptimizerLate was only used when not compiling at O0.

Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95250
2021-01-22 12:29:39 -08:00
Yaxun (Sam) Liu 622eaa4a4c [HIP] Support __managed__ attribute
This patch implements codegen for __managed__ variable attribute for HIP.

Diagnostics will be added later.
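
For illustration, a minimal sketch of the attribute in use (the names are
hypothetical; __managed__ memory is accessible from both host and device
without explicit copies, mirroring CUDA managed variables):

  // hypothetical sketch of a __managed__ variable shared by host and device
  #include <hip/hip_runtime.h>

  __managed__ int counter = 0;

  __global__ void bump() {
    counter += 1;  // device code writes the managed allocation directly
  }

  int main() {
    hipLaunchKernelGGL(bump, dim3(1), dim3(1), 0, 0);
    hipDeviceSynchronize();
    return counter;  // host reads the same allocation, no explicit copy
  }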

Differential Revision: https://reviews.llvm.org/D94814
2021-01-22 11:43:58 -05:00
Simon Pilgrim bd122f6d21 [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases
Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))
2021-01-22 16:05:19 +00:00
Simon Pilgrim c33d36e066 [X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases
Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)
2021-01-22 15:47:23 +00:00
Simon Pilgrim 4846f6ab81 [X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching
Simplify vperm2x128(concat(X,Y),concat(Z,W)) folding.

Use collectConcatOps / ISD::INSERT_SUBVECTOR to find the source subvectors instead of hardcoded immediate matching.
2021-01-22 15:47:22 +00:00
David Green af03324984 [ARM] Disable sign extended SSAT pattern recognition.
I may have given bad advice, and skipping sext_inreg when matching SSAT
patterns is not valid on its own. It at least needs to sext_inreg the
input again, and as far as I can tell it is still only valid based on
demanded bits. For the moment, disable that part of the combine, with
the hope of reimplementing it more correctly in the future.
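
For context, SSAT saturates a value to a signed bit-width. A scalar sketch
of the semantics (not the backend code; valid for 1 <= n <= 31):

  // sketch: ssat(x, n) clamps x to the signed n-bit range
  #include <stdint.h>

  int32_t ssat(int32_t x, int n) {
    const int32_t hi = (1 << (n - 1)) - 1;  // 2^(n-1) - 1
    const int32_t lo = -hi - 1;             // -2^(n-1)
    return x > hi ? hi : (x < lo ? lo : x);
  }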
2021-01-22 14:07:48 +00:00
Simon Pilgrim b1166e1317 [X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs
combineX86ShufflesConstants/canonicalizeShuffleMaskWithHorizOp can both handle, or early-out on, shuffles with inputs of different widths, so delay widening as late as possible to make it easier to match constant folds etc.

The plan is to eventually move the widening inside combineX86ShuffleChain so that we don't create any new nodes unless we successfully combine the shuffles.
2021-01-22 13:19:35 +00:00
Simon Pilgrim ffe72f987f [X86][SSE] Don't fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) if the shuffles are splats
rGbe69e66b1cd8 added the fold, but DAGCombiner.visitVECTOR_SHUFFLE doesn't merge shuffles if the inner shuffle is a splat, so we need to bail.

The non-fast-horiz-ops paths see some minor regressions, we might be able to improve on this after lowering to target shuffles.

Fix PR48823
2021-01-22 11:31:38 +00:00
David Green 9ae73cdbc1 [ARM] Adjust isSaturatingConditional to return a new SDValue. NFC
This replaces the isSaturatingConditional function with
LowerSaturatingConditional, which directly returns a new SSAT or
USAT SDValue instead of returning true plus its components.
2021-01-22 11:11:36 +00:00
Sebastian Neubauer 8214982b50 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dcd8542, which was reverted due to memory leaks.

Differential Revision: https://reviews.llvm.org/D95215
2021-01-22 11:24:08 +01:00
David Sherwood 2e080eb00a [SVE] Add support for scalable vectorization of loops with selects and cmps
I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost
that prevented a cost being calculated for select instructions when using
scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost
to only do special cost calculations for fixed width vectors and fall
back to the base version for scalable vectors.

I have added a simple cost model test for cmps and selects:

  test/Analysis/CostModel/sve-cmpsel.ll

and some simple tests that show we vectorize loops with cmp and select:

  test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
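
For reference, a hand-written sketch (not taken from those tests) of the
kind of cmp+select loop this enables for scalable vectors:

  // sketch: an icmp feeding a select, now costed for scalable vectorization
  void max_inplace(int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i)
      a[i] = (a[i] > b[i]) ? a[i] : b[i];  // cmp + select per element
  }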

Differential Revision: https://reviews.llvm.org/D95039
2021-01-22 09:48:13 +00:00
Christudasan Devadasan ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. Certain early passes modify this value,
which required additional fixups during eliminateFrameIndex to make
all cases work. This whole transformation looks trivial and can be
handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also did some code cleanup and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
Hsiangkai Wang 5d354220d4 [RISCV] Correct DWARF number for vector registers.
The DWARF numbers of vector registers are already defined in the
riscv-elf-psabi: they start from 96. Correct the DWARF numbers of
vector registers accordingly.

Differential Revision: https://reviews.llvm.org/D94749
2021-01-22 11:33:42 +08:00
Craig Topper f8f1b20e6b [RISCV] Don't create LMUL=8 pseudo instructions for ternary widening arithmetic instructions
These instructions produce a 2*SEW result, so the input can't have
LMUL=8 or the result would need a non-existent LMUL=16 (the
destination's effective LMUL is twice the source's). So only create
pseudos for LMUL up to 4.

Differential Revision: https://reviews.llvm.org/D95189
2021-01-21 19:29:02 -08:00
Cassie Jones 3dedad475d [AArch64][GlobalISel] Make G_USUBO legal and select it.
The expansion for wide subtractions includes G_USUBO.
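
For illustration, the scalar shape of a wide subtraction (a generic
sketch; GlobalISel models the borrow chain with G_USUBO and G_USUBE):

  // sketch: 128-bit subtract in 64-bit limbs; the unsigned-overflow
  // (borrow) flag of the low limb is what G_USUBO produces
  #include <stdint.h>

  void sub128(uint64_t a[2], const uint64_t b[2]) {
    uint64_t borrow = a[0] < b[0];  // borrow out of the low limb
    a[0] -= b[0];
    a[1] -= b[1] + borrow;          // consume the borrow in the high limb
  }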

Differential Revision: https://reviews.llvm.org/D95032
2021-01-21 18:53:33 -08:00
ShihPo Hung 9667750331 [RISCV] Add intrinsics for RVV1.0 VFRSQRTE7 & VFRECE7
Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D95113
2021-01-21 18:38:49 -08:00
ShihPo Hung 976cf53cc7 [RISCV] Add intrinsics for vector unordered indexed load in RVV 1.0
Add unordered indexed load: vluxei

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95028
2021-01-21 18:38:49 -08:00
ShihPo Hung bea661d9a5 [RISCV] Add intrinsics for RVV 1.0 vrgatherei16
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95014
2021-01-21 18:38:49 -08:00
Qiu Chaofan 449f2f7140 [PowerPC] Duplicate inherited heuristic from base scheduler
PowerPC has its own custom scheduler heuristic. Its override calls the
parent class's tryCandidate, but that function returns void, so the
call doesn't actually help. This patch duplicates the code from the
base scheduler into the PPC machine scheduler class, which does what we wanted.
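
A reduced C++ sketch of the problem (simplified, not the actual
MachineScheduler API):

  // sketch: a void-returning base heuristic gives the override no way to
  // know whether a candidate was already chosen, so the base logic ends
  // up duplicated in the subclass instead
  struct Candidate { int Reason = 0; };

  struct BaseScheduler {
    virtual ~BaseScheduler() = default;
    virtual void tryCandidate(Candidate &Cand, Candidate &Try) {
      // ... generic heuristics that may or may not pick Try ...
    }
  };

  struct PPCScheduler : BaseScheduler {
    void tryCandidate(Candidate &Cand, Candidate &Try) override {
      BaseScheduler::tryCandidate(Cand, Try);  // returns void: no result
      // PPC-specific tie-breakers cannot tell what the base decided here
    }
  };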

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D94464
2021-01-22 10:11:03 +08:00
Craig Topper 3b5430eb0d [RISCV] Add a VL output to vleff intrinsics.
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.
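
For illustration, a sketch of such a data-dependent-exit loop
(hand-written; fault-only-first lets VL shrink instead of faulting when
a vector load runs past the end of the string):

  // sketch: a strlen-style loop whose exit depends on the loaded data
  #include <stddef.h>

  size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')  // exit condition depends on loaded data
      ++n;
    return n;
  }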

This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.

By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.

The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.

I've only run the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94286
2021-01-21 17:19:58 -08:00
Hsiangkai Wang 6e360460f1 [RISCV] Use v8-v23 as argument registers to conform to the proposal.
The maximum LMUL is 8, so we need 16 vector registers for two LMUL-8
arguments. The modification follows the psABI proposal in
https://github.com/riscv/riscv-elf-psabi-doc/pull/171

Differential Revision: https://reviews.llvm.org/D95134
2021-01-22 07:55:24 +08:00
Hsiangkai Wang b7ab6726b6 [RISCV] New vector load/store in V extension v1.0
Upgrade RISC-V V extension to v1.0-08a0b46.
Indexed loads/stores now have ordered and unordered forms.
Add new whole-register vector loads/stores.

Differential Revision: https://reviews.llvm.org/D93614
2021-01-22 07:30:09 +08:00
David Green 39db5753f9 [LV][ARM] Inloop reduction cost modelling
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to work OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:

  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
  ->
  R = VMLADAV A, B
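
At the source level, the pattern above corresponds to a dot-product loop
of roughly this shape (a sketch, not taken from the tests):

  // sketch: sign-extending multiply-accumulate; under MVE each vector
  // iteration of this body can become a single VMLADAV
  #include <stdint.h>

  int dot(const int8_t *a, const int8_t *b, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (int)a[i] * (int)b[i];  // sext + sext + mul + reduce-add
    return sum;
  }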

There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.

Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in and outer loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.

This patch attempts to improve the cost modelling of in-loop reductions
by:
 - Adding some pattern matching in the loop vectorizer cost model to
   match extended reduction patterns that are optionally extended and/or
   MLA patterns. This marks the cost of the reduction instruction correctly
   and the sext/zext/mul leading up to it as free, which is otherwise
   difficult to tell and may get a very high cost. (In the long run this
   can hopefully be replaced by vplan producing a single node and costing
   it correctly, but that is not yet something that vplan can do).
 - getExtendedAddReductionCost is added to query the cost of these
   extended reduction patterns.
 - Expanded the ARM costs to account for these expanded sizes, which is a
   fairly simple change in itself.
 - Some minor alterations to allow in-loop reductions larger than the highest
   vector width and i64 MVE reductions.
 - An extra InLoopReductionImmediateChains map was added to the vectorizer
   for it to efficiently detect which instructions are reductions in the
   cost model.
 - The tests have some updates to show what I believe is optimal
   vectorization and where we are now.

Put together this can greatly improve performance for reduction loops
under MVE.

Differential Revision: https://reviews.llvm.org/D93476
2021-01-21 21:03:41 +00:00
Michael Munday 4ab0f51a75 Recommit "[RISCV] Legalize select when Zbt extension available"
This recommits 71ed4b6ce5 with
the polarity of some of the patterns corrected.

Original commit message:
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.
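
For example, a source-level select of this shape can now lower to a
conditional-move sequence instead of a branch (sketch):

  // sketch: with Zbt, this select can match cmov rather than branching
  int pick(int c, int a, int b) {
    return c ? a : b;
  }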

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D93767
2021-01-21 12:07:44 -08:00
Duncan P. N. Exon Smith f2fd41d789 X86: Fix use-after-realloc in X86AsmParser::ParseIntelExpression
`X86AsmParser::ParseIntelExpression` has a while loop. In the body,
calls to MCAsmLexer::UnLex can force a reallocation in the MCAsmLexer's
`CurToken` SmallVector, invalidating saved references to
`MCAsmLexer::getTok()`.

`const MCAsmToken &Tok` is such a saved reference, and this moves it
from outside the while loop to inside the body, fixing a
use-after-realloc.
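
The failure mode, reduced to a generic C++ sketch (none of this is the
actual parser code):

  // sketch: a saved reference into a growable buffer dangles after growth
  #include <vector>

  int main() {
    std::vector<int> toks{1, 2, 3};
    const int &tok = toks.back();  // saved reference into the buffer
    toks.push_back(4);             // may reallocate; 'tok' now dangles
    return tok;                    // use-after-realloc (undefined behavior)
  }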

`Tok` will still be reused across calls to `Lex()`, each of which
effectively destroys and constructs the pointed-to token. I'm a bit
skeptical of this usage pattern, but it seems broadly used in the
X86AsmParser (and others) so I'm leaving it alone (for now).

Somehow this bug was exposed by https://reviews.llvm.org/D94739,
resulting in test failures in dot-operator related tests in
llvm/test/tools/llvm-ml. I suspect the exposure path is related to
optimizer changes from splitting up the grow operation, but I haven't
dug all the way in. Regardless, there are already tests in tree that
cover this; they might fail consistently if we added ASan
instrumentation to SmallVector.

Differential Revision: https://reviews.llvm.org/D95112
2021-01-21 11:24:35 -08:00
Hsiangkai Wang b8921af63b [RISCV] Update V instructions constraints to conform to v1.0
Upgrade RISC-V V extension to v1.0-08a0b46.
Update instruction constraints to conform to v1.0.

Differential Revision: https://reviews.llvm.org/D93612
2021-01-22 01:15:55 +08:00
Sebastian Neubauer 4dbdff66fe Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542.

(caused memory leaks)
2021-01-21 18:11:48 +01:00
Hsiangkai Wang 266820be35 [RISCV] Add new V instructions in v1.0-08a0b46.
Add new V instructions.
vfrsqrte7.v
vfrece7.v
vrgatherei16.vv
vneg.v
vncvt.x.x.w
vfneg.v
2021-01-22 00:59:58 +08:00
Hsiangkai Wang 9dd5aea1e0 [RISCV] Make LMUL field in VTYPE continuous.
Upgrade RISC-V V extension to v1.0-08a0b46.
Update the VTYPE encoding. Make the LMUL encoding a continuous field.
2021-01-22 00:47:32 +08:00
Jay Foad c0b3c5a064 [AMDGPU][GlobalISel] Run SIAddImgInit
This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132
2021-01-21 15:54:54 +00:00
Matt Arsenault 94375d1083 AMDGPU: Remove v_rsq_f64 patterns
This isn't accurate enough without correction
2021-01-21 10:51:36 -05:00
Matt Arsenault 2a0db8d70e AMDGPU: Use more accurate fast f64 fdiv
A raw v_rcp_f64 isn't accurate enough, so start applying correction.
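
The usual shape of such a correction is one or more Newton-Raphson steps
refining the raw estimate (a generic sketch, not the exact sequence the
backend emits):

  // sketch: refine r0 ~ 1/d; each Newton-Raphson step roughly doubles
  // the number of accurate bits
  double refine_rcp(double d, double r0) {
    double r1 = r0 * (2.0 - d * r0);  // first correction step
    return r1 * (2.0 - d * r1);       // second correction step
  }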
2021-01-21 10:51:36 -05:00
Matt Arsenault 35c535a7df AArch64/GlobalISel: Factor out parametersInCSRMatch
Make this look more like the DAG handling and move to common code.

I also noticed AArch64 seems not to be properly adding the
physreg:virtreg mapping to the function live ins.
2021-01-21 10:32:48 -05:00
Sebastian Neubauer ba7dcd8542 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential Revision: https://reviews.llvm.org/D94768
2021-01-21 16:32:17 +01:00
David Green dfac521da1 [ARM] Fix vector saddsat costs.
It turns out the vectorizer calls the getIntrinsicInstrCost functions
with a scalar return type and a vector VF. This updates the cost model
to handle that, still producing the correct vector costs.

A vectorizer test is added to show it vectorizing at the correct factor
again.
2021-01-21 15:30:39 +00:00
Matt Arsenault 20566a2ed8 AMDGPU: Add occupancy to serialized MachineFunctionInfo
Not sure about the default value handling, but also not sure about
defaulting to a theoretically subtarget-dependent value.
2021-01-21 09:21:00 -05:00
Simon Pilgrim 69bc0990a9 [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED).
Add DemandedElts support inside the TRUNCATE analysis.

REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724, with an additional test case added at rG0ca81b90d19d.

Differential Revision: https://reviews.llvm.org/D56387
2021-01-21 13:01:34 +00:00
Adhemerval Zanella ff41ae8b36 MC: AArch64: Add support for gotpage_lo15
It is not used by LLVM itself, but it will be used in lld tests
to implement R_AARCH64_LD64_GOTPAGE_LO15 support.
2021-01-21 08:29:49 -03:00
Simon Pilgrim 86021d98d3 [X86] Avoid a std::string copy by replacing auto with const auto&. NFC.
Fixes msvc analyzer warning.
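
The general pattern behind the warning, as a generic sketch (not the
actual X86 code):

  // sketch: 'auto' copies each pair (including its std::string key);
  // 'const auto &' binds by reference and avoids the copy
  #include <map>
  #include <string>

  void visit(const std::map<std::string, int> &m) {
    for (const auto &entry : m)  // was: for (auto entry : m)
      (void)entry.second;
  }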
2021-01-21 11:04:07 +00:00
Luo, Yuanke 64132f541e Revert "[X86][AMX] Fix tile config register spill issue."
This reverts commit 20013d02f3.
2021-01-21 18:11:43 +08:00
Luo, Yuanke 20013d02f3 [X86][AMX] Fix tile config register spill issue.
The previous code built a model in which the tile config register is a
user of each AMX instruction. That is a problem for spilling the tile
config register: across function calls, an ldtilecfg instruction may be
inserted before each AMX instruction that uses the tile config register,
which clobbers all tile data registers.
To fix this issue, we remove the tile config register from the model. We
analyze the regmask of each call instruction and insert ldtilecfg if there
is any tile data register live across the call. Inserting an sttilecfg
before the call is unnecessary, because the tile config doesn't change
and we can just reload it.
Besides this, we also need to check tile config register interference.
Since we don't model the config register, we should check interference
from the ldtilecfg to each tile data register def.
             ldtilecfg
             /       \
            BB1      BB2
            /         \
           call       BB3
           /           \
       %1=tileload   %2=tilezero
We can start from each tile def instruction and walk backward to the
ldtilecfg. If there is any call instruction along the way that does not
preserve the tile data register, we should insert ldtilecfg after the
call instruction.

Differential Revision: https://reviews.llvm.org/D94155
2021-01-21 16:01:50 +08:00
madhur13490 dd8ae42674 [IndirectFunctions] Skip propagating attributes to address taken functions
In the case of indirect calls or address-taken functions, skip
propagating any attributes to them; we just propagate features to
such functions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94585
2021-01-21 07:04:28 +00:00
Kazu Hirata 8f5da41c4d [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-20 21:35:52 -08:00
Max Kazantsev d6bb96e677 [X86] Add experimental option to separately tune alignment of innermost loops
We already have an experimental option to tune loop alignment. Its impact
is very wide (and there is a suspicion that it's not always profitable). We want
something narrower to play with. This patch adds a similar option that
overrides the preferred alignment for innermost loops. This is for experimental
purposes; the default values do not change the existing behavior.

Differential Revision: https://reviews.llvm.org/D94895
Reviewed By: pengfei
2021-01-21 11:15:16 +07:00