llvm-project

Author	SHA1	Message	Date
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Stanislav Mekhanoshin	c74f2e5b27	[InstCombine] Use SpecificBinaryOp_match in two more places Differential Revision: https://reviews.llvm.org/D114038	2021-11-17 01:16:06 -08:00
Roman Lebedev	2037ec725f	[X86][Costmodel] `ext v64i1 to v32i16` can appear after legalization, cost is same as for `ext v32i1 to v32i16` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113914	2021-11-17 12:02:50 +03:00
Roman Lebedev	23b194bf18	[X86][Costmodel] `trunc v32i16 to v64i1` can appear after legalization, cost is same as for `trunc v32i16 to v32i1` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113913	2021-11-17 12:02:50 +03:00
Eric Tang	f7eb061a5f	[SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors This change makes WidenVecRes_SELECT work for scalable vectors. This patch is split from [D110319](https://reviews.llvm.org/D110319) Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110388	2021-11-17 08:55:11 +00:00
Aaron Puchert	b20da5117f	Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC) Instead of popping them and then immediately throwing them away, we can just filter out globals and items in different scopes before adding them to WorkList. Shouldn't change anything but keep the queue smaller. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D113864	2021-11-17 00:01:20 +01:00
Aaron Puchert	86b3100cde	[DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC) It was being used occasionally already, and using it on the constructor and getDbgEntityID has obvious type safety benefits. Also use llvm_unreachable in the switch as usual, but since only these two values are used in constructor calls I think it's still NFC. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D113862	2021-11-17 00:01:20 +01:00
Benjamin Kramer	8b8e8704ce	[PowerPC] Fix a nullptr dereference LiMI1/LiMI2 can be null, so don't call a method on them before checking. Found by ubsan.	2021-11-16 23:52:42 +01:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form, which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency. The ARM loop hardware interaction in the test diff is opaque to me, but the comments in the review from others with knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
Arthur Eubanks	c95a9f46c9	[Loads] Handle addrspacecast constant expressions when determining dereferenceability Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114015	2021-11-16 11:17:57 -08:00
Victor Huang	ae27ca9a67	[PowerPC] PPC backend optimization on conditional trap instructions This patch adds a PPC back end optimization that analyzes the arguments of a conditional trap instruction and does one of the following: 1. Delete it if it never traps 2. Replace it if it always traps 3. Otherwise keep it Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D111434	2021-11-16 13:11:57 -06:00
Hongtao Yu	042cefd2b5	[CSSPGO] Fix a hash code truncating issue in ContextTrieNode. std::hash returns a 64bit hash code while previously we were using only lower 32 bits which caused hash collision for large workloads. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D113688	2021-11-16 11:01:52 -08:00
River Riddle	4c484f11d3	[llvm] Add a SFINAE template parameter to DenseMapInfo This allows for using SFINAE partial specialization for DenseMapInfo. In MLIR, this is particularly useful as it will allow for defining partial specializations that support all Attribute, Op, and Type classes without needing to specialize DenseMapInfo for each individual class. Differential Revision: https://reviews.llvm.org/D113641	2021-11-16 18:54:14 +00:00
Mircea Trofin	c6b9b702a0	[NFC][Regalloc] Factor out eviction decision from eviction attempt This splits tryEvict into a const tryFindEvictionCandidate, which attempts to find a candidate, and the actual eviction (should the former be successful) The newly introduced tryFindEvictionCandidate will move subsequently into the RegAllocEvictionAdvisor. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D113941	2021-11-16 10:50:23 -08:00
Duncan P. N. Exon Smith	fd6018072a	DebugInfo: Make DWARFExpression::iterator a const iterator `3d1d8c767b` made DWARFExpression::iterator's Operation member `mutable`. After a few prep commits, the iterator can instead be made a `const` iterator since no caller can change the Operation. Differential Revision: https://reviews.llvm.org/D113958	2021-11-16 10:25:10 -08:00
Duncan P. N. Exon Smith	a0f1f17131	DebugInfo: Stop modifying Operation::Error inside of verify() The only caller of Operation::verify() is DWARFExpression::verify(), which iterates past the (ephemeral) Operation immediately after. - Stop setting Operation::Error because the mutation will never be observed. - Change verify() to a static function to be sure all callers are updated. Differential Revision: https://reviews.llvm.org/D113957	2021-11-16 10:21:04 -08:00
Kazu Hirata	ee0133dc6d	[llvm] Use range-for loops (NFC)	2021-11-16 09:01:56 -08:00
Philip Reames	ed6b69a38f	Add a hasPoisonGeneratingFlags proxy wrapper to Instruction [NFC] This just cuts down on casts to Operator.	2021-11-16 08:48:16 -08:00
David Sherwood	4607459022	[AArch64] Fix TypeSize->uint64_t implicit conversion in AArch64ISelLowering::hasAndNot For now I've just changed the code to only return true from AArch64ISelLowering::hasAndNot if the vector is fixed-length. Once we have the right patterns or DAG combines to use bic/bif we can also enable this for SVE. Test added here: CodeGen/AArch64/vselect-constants.ll Differential Revision: https://reviews.llvm.org/D113994	2021-11-16 16:25:16 +00:00
Jon Chesterfield	30b29db7c7	[amdgpu] Don't crash on empty global ctor/dtor Global ctor/dtor can be an empty array, which is a Constant not a ConstantArray. The cast<ConstantArray> therefore asserts / crashes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113800	2021-11-16 14:36:08 +00:00
Sanjay Patel	8fce94f916	[InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2 If C is a high-bit mask: (trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?) If C is low-bit mask: (trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?) If C is not-of-power-of-2 (one clear bit): (trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?) This extends the fold added with: `acabad9ff6` (https://alive2.llvm.org/ce/z/aFr7qV) Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold. Here are Alive2 generalizations for these folds: https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch) https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch) https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask) https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit) Differential Revision: https://reviews.llvm.org/D112634	2021-11-16 09:27:30 -05:00
Alexey Bataev	900cc1a226	[SLP]Improve cost of the gather nodes. No need to count the final shuffle cost for the constants, gathering of the constants is just a constant vector + extra inserts, if required. Differential Revision: https://reviews.llvm.org/D113770	2021-11-16 06:25:07 -08:00
Alexey Bataev	cdf8a53c1d	[SLP]Fix windows build, NFC. Need to put `IndexIdx` var to the list of captures.	2021-11-16 06:09:51 -08:00
Alexey Bataev	aa9bbb64be	[SLP]Adjust GEP indices types when trying to build entries. Need to adjust the types of GEPs indices when building the tree entries/operands. Otherwise some of the nodes might differ and vectorizer is unable to correctly find them and count their cost. Differential Revision: https://reviews.llvm.org/D113792	2021-11-16 05:44:33 -08:00
Sander.DeSmalen@arm.com	305816ff1e	[IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast. rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed an issue with IndVarSimplification for a loop where an i32 phi node is no longer replaced by a widened (i64) phi node, because the SCEVs of a sign-extend no longer folded the same way. I'm unsure how to properly explain this because it's all rather complicated, but in short: SCEVs don't fold as nicely as they used to and this caused a difference. While investigating this, I found that IndVarSimplify can actually optimise the case in the way we want to if it chooses the widened IV to be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough, there is some level of nondeterminism in the way the algorithm works: it just picks the sign of the 'first' zext/sext user, where the order of the users-iterator is not guaranteed to be the same on each invocation of the pass (e.g. shown by first running loop-rotate, which puts the users in a different order). While I think the fix is valid in the sense that consistently picking _any_ order is better than having a nondeterministic order, I can use a bit of advice from people more familiar with this area of the code-base. For example, I'm not sure if this fix is hiding another issue where the IndVarSimplify pass could actually draw the same conclusions (i.e. that it only needs an i64 phi node) if it does a bit more work, regardless of whether it chooses the induction variable to be signed or unsigned. I'm also not sure if choosing signed is better than unsigned, or whether that just happens to be beneficial only in this individual case. Any feedback would be much appreciated! Reviewed By: reames Differential Revision: https://reviews.llvm.org/D112573	2021-11-16 12:41:04 +00:00
Florian Hahn	b7aec4f08e	[SCEV] Support rewriting ZExt expressions with loop guard info. So far, applying loop guard information has been restricted to SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to SCEV failing to determine tight upper bounds for the backedge taken count. This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support re-writing ZExt expressions. This is a first step towards fixing PR40961 and PR52464. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D113577	2021-11-16 11:16:07 +00:00
Matt Devereau	f526c600c0	[AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store and LD1/ST1 to load/store when a ptrue all predicate pattern operand is present. This allows existing IR optimizations such as dead-load removal to occur. Differential Revision: https://reviews.llvm.org/D113489	2021-11-16 11:10:23 +00:00
Frederik Gossen	3f3d4e8a15	Fix unused variable warning in LoadStoreOpt.cpp with (void)	2021-11-16 12:03:59 +01:00
Frederik Gossen	2bceb7c8da	Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp" This reverts commit `40a609aebe`.	2021-11-16 12:00:17 +01:00
Frederik Gossen	ecfe7a3404	Revert "Fix unused variable warning." This reverts commit `a062e2a8ca`.	2021-11-16 11:59:34 +01:00
Frederik Gossen	9a6817b7ed	Revert "Fix another unused variable error." This reverts commit `5b84ae7c48`.	2021-11-16 11:58:02 +01:00
Adrian Kuegel	5b84ae7c48	Fix another unused variable error.	2021-11-16 11:32:44 +01:00
Adrian Kuegel	a062e2a8ca	Fix unused variable warning.	2021-11-16 11:17:33 +01:00
Frederik Gossen	40a609aebe	Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp	2021-11-16 11:05:18 +01:00
Amara Emerson	dcd8728d83	Remove unnecessary <any> include.	2021-11-16 00:50:30 -08:00
jacquesguan	6405e8b584	[RISCV] Refactor some rvv instructions' definition with foreach. Simplify rvv instructions that use eew in their mnemonic and encoding with foreach. And fix a scheduling bug. Differential Revision: https://reviews.llvm.org/D113453	2021-11-16 15:20:45 +08:00
Serguei Katkov	0ecb12a27f	[STATEPOINT] Force implicit-def for lr register. STATEPOINT instruction behavior is similar to that of a call instruction. On aarch64, the BL instruction implicitly defines the lr register, so the STATEPOINT instruction should do the same. However, STATEPOINT is a general pseudo instruction and I could not find a way to override the list of implicit defs for a specific target. So this patch post-processes the inserted STATEPOINT instruction by adding an implicit dead def for lr. Reviewers: reames, loicottet, ostannard Reviewed By: reames Subscribers: danilaml, hiraditya, kristof.beyls, llvm-commits, yrouban Differential Revision: https://reviews.llvm.org/D111114	2021-11-16 12:52:00 +07:00
Kazu Hirata	7f00806a6a	[llvm] Use make_early_inc_range (NFC)	2021-11-15 21:28:46 -08:00
Amara Emerson	dc84770d55	[GlobalISel] Add a store-merging optimization pass and enable for AArch64. This is a first attempt at a constant value consecutive store merging pass, a counterpart to the DAGCombiner's store merging optimization. The high level goals of this pass: * Have a simple and efficient algorithm. As close to linear time as we can get. Thus, prioritizing scalability of the algorithm over merging every corner case we can find. The DAGCombiner's store merging code has been the source of compile time and complexity issues in the past and I wanted to avoid that. * Don't introduce any new data structures for ordering memory operations. In MIR, we don't have the concept of chains like we do in the DAG, and the instruction order is stricter than enforcing ordering with graph edges. Although I considered adding something similar, I couldn't justify the overhead. The pass is currently split into 3 main parts. The main store merging code focuses on identifying candidate stores and managing the candidate group that's under consideration for merging. Analyzing addressing of stores is a potentially complex part and for now there's just a basic implementation to identify easy cases. Finally, the other main bit of complexity is the alias analysis, which tries to follow the same logic as the DAG's AA. Currently this implementation only supports merging of constant stores. Stores of arbitrary variables are technically possible with a very small change, but the DAG chooses not to do this. Doing so here makes most code worse since there's extra overhead in merging values into wider registers. On AArch64 -Os, this optimization results in very minor savings on CTMark. Differential Revision: https://reviews.llvm.org/D109131	2021-11-15 21:10:39 -08:00
Craig Topper	391b0ba603	[RISCV] Override TargetLowering::hasAndNot for Zbb. Differential Revision: https://reviews.llvm.org/D113937	2021-11-15 18:44:07 -08:00
Fabian Wolff	b484fa8289	[X86] Fix crash with inline asm using wrong register name Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113834	2021-11-16 10:38:12 +08:00
Matt Arsenault	659887b405	AMDGPU: Mark prolog/epilog SCC defs as dead A future change will add SCC liveness checks. Since we are still relying on forward register scavenging, add dead flags to avoid spuriously detecting SCC as live.	2021-11-15 21:35:06 -05:00
Duncan P. N. Exon Smith	79df41011b	DebugInfo: const-qualify accessors of DWARFExpression::Operation Add `const` to DWARFExpression::Operation's accessors and make Operation::extract() private, since it's only used by the friend class DWARFExpression::iterator.	2021-11-15 17:30:10 -08:00
Craig Topper	233def40f7	[DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs. It's possible that the mask is already a NOT. At least if InstCombine hasn't canonicalized the input. In that case we will form an ANDN with X instead of with Y. So we don't need to worry about Y being a constant. We might need to check that X isn't a constant instead, but we don't have a test case for that yet. This fixes a size regression found when trying to enable this combine for RISCV in D113937. Differential Revision: https://reviews.llvm.org/D113948	2021-11-15 17:15:51 -08:00
Mehrnoosh Heidarpour	62c51a72f9	[InstSimplify] Fold A|B | (A^B) --> A|B This patch adds the following fold opportunity: A|B | (A^B) --> A|B that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479 https://alive2.llvm.org/ce/z/33-My- Test cases with base results are added in D113860 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D113861	2021-11-15 18:55:04 -05:00
Ben Shi	4c3d916c4b	[RISCV] Optimize immediate materialisation with SHADD Use LUI+SHADD+ADDI to compose specific immediates. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D113568	2021-11-15 23:34:28 +00:00
Stanislav Mekhanoshin	833cdb0a07	Revert "[InstSimplify] Fold A|B | (A^B) --> A|B" This reverts commit `193c40e966`.	2021-11-15 14:56:20 -08:00
Arthur Eubanks	19867de9e7	[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved since we've already handled function analysis invalidation. So far this only touches the inliner, argpromotion, function-attrs, and updateCGAndAnalysisManager(), since they are the most used. This is part of an effort to investigate running the function simplification pipeline less on functions we visit multiple times in the inliner pipeline. However, this causes major memory regressions especially on larger IR. To counteract this, turn on the option to eagerly invalidate function analyses. This invalidates analyses on functions immediately after they're processed in a module or scc to function adaptor for specific parts of the pipeline. Within an SCC, if a pass only modifies one function, other functions in the SCC do not have their analyses invalidated, so in later function passes in the SCC pass manager the analyses may still be cached. It is only after the function passes that the eager invalidation takes effect. For the default pipelines this makes sense because the inliner pipeline runs the function simplification pipeline after all other SCC passes (except CoroSplit which doesn't request any analyses). Overall this has mostly positive effects on compile time and positive effects on memory usage. https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss D113196 shows that we slightly regressed compile times in exchange for some memory improvements when turning on eager invalidation. D100917 shows that we slightly improved compile times in exchange for major memory regressions in some cases when invalidating less in SCC passes. Turning these on at the same time keeps the memory improvements while keeping compile times neutral/slightly positive. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113304	2021-11-15 14:44:53 -08:00
Steven Wu	fcd07f8107	[JITLink] Fix splitBlock if there are symbols that span across the boundary Fix `splitBlock` so that it can handle the case when the block being split has symbols that span across the split boundary. This is an error case in general, but for EHFrame splitting on macho platforms there is an anonymous symbol that marks the entire block. The current implementation leaves a symbol that is out of bounds of the underlying block. Fix the problem by dropping such symbols when the block is split. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D113912	2021-11-15 13:55:21 -08:00
Stanislav Mekhanoshin	193c40e966	[InstSimplify] Fold A|B | (A^B) --> A|B This patch adds the following fold opportunity: A|B | (A^B) --> A|B that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479 https://alive2.llvm.org/ce/z/33-My- Test cases with base results are added in D113860 (authored by MehrHeidar, committed by rampitec). Differential Revision: https://reviews.llvm.org/D113861	2021-11-15 13:49:20 -08:00
Jonas Paulsson	1c3ef9ef4a	[SystemZ] Support symbolic displacements. This patch adds support for symbolic displacements, e.g. like 'lg %r0, sym(%r1)', which is done using relocations. This is needed to compile the kernel without disabling the integrated assembler. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D113341	2021-11-15 16:46:31 -05:00
Mircea Trofin	19e6b730ce	[NFC][Regalloc] Factor types that would be used by the eviction advisor This is in preparation for pulling the eviction decision-making into an analysis pass, which would then allow swapping that decision-making process. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html Differential Revision: https://reviews.llvm.org/D113929	2021-11-15 13:15:14 -08:00
Fangrui Song	fee52fe0ad	[X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC	2021-11-15 13:10:47 -08:00
Philip Reames	8f95e915cd	[unroll-runtime] Relax two profitability limitations on multi-exit unrolling This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change. If anyone sees actually interesting code differences out of this, please let me know. I'm not expecting this to have much impact at all. Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block. We can only hit this with a poorly documented debug flag. Case 2 - Why should we treat epilog cases differently from prolog cases? Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled? Sorry for the lack of tests here. These are both exceedingly narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags. Note that the legality cases for each are already exercised with force flags.	2021-11-15 13:00:14 -08:00
Snehasish Kumar	bee8e203c6	[InstrProf][NFC] Fix a few typos in code comments.	2021-11-15 12:55:25 -08:00
Nico Weber	b4e50e5228	[asm] Make EmitMSInlineAsmStr and EmitGCCInlineAsmStr more alike https://reviews.llvm.org/D71677 copied a bunch of code from EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small (likely unintentional) changes. This makes these pieces look the same. No behavior change. (Why are these functions two copies? No great reason as far as I can tell. https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the split; we might want to undo them at some point. But PR23933 suggests that a bigger change is planned for this file in the future, so keeping this incremental for now.) Differential Revision: https://reviews.llvm.org/D113924	2021-11-15 15:43:01 -05:00
Nico Weber	0be836b7dd	[asm] Convert AsmPrinter::PrintSpecial() to StringRef No behavior change. Differential Revision: https://reviews.llvm.org/D113911	2021-11-15 15:38:27 -05:00
Nico Weber	833393e021	[asm] Correctly handle special names in variants There's really no reason why anyone should use these special names in a variant. I noticed this while reading the code: all other writes to OS are guarded by this conditional, and the behavior with the check seems more correct, so let's add the check. Differential Revision: https://reviews.llvm.org/D113909	2021-11-15 15:37:09 -05:00
Lei Huang	f50c6c1718	[PowerPC] Fix 32bit vector insert instructions for ISA3.1 The platform independent ISD::INSERT_VECTOR_ELT takes an element index, but vins* instructions take a byte index. Update 32bit td patterns for vector insert to handle the element index accordingly. Since vector insert with a non-constant index is supported in ISA3.1, there is no need to use the platform specific ISD node, PPCISD::VECINSERT. Update td pattern to directly use ISD::INSERT_VECTOR_ELT instead. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D113802	2021-11-15 14:36:39 -06:00
Philip Reames	423da61835	[runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC] All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.	2021-11-15 11:39:07 -08:00
Stanislav Mekhanoshin	e785f4ab6a	[PatternMatch] Add m_BinOp/m_c_BinOp with specific opcode Differential Revision: https://reviews.llvm.org/D113508	2021-11-15 11:24:27 -08:00
Philip Reames	e99902a872	[runtime-unroll] Restructure if-clause to improve readability [NFC]	2021-11-15 11:13:27 -08:00
Alexey Bataev	224e46d355	[SLP][DOT][NFCI]Output all scalars for the splats, not only the first one.	2021-11-15 10:54:26 -08:00
Simon Pilgrim	441de2536b	[X86] Add generic splitVectorOp helper. NFC Update splitVectorIntUnary/splitVectorIntBinary to use this internally, after some operand type sanity checks. Avoids code duplication and makes it easier to split other vector instruction forms in the future.	2021-11-15 17:59:23 +00:00
Craig Topper	f59307bfdc	[RISCV] Teach needVSETVLIPHI to handle mask register instructions. This handles the case where the mask register instruction input comes from a Phi of vsetvlis. If the VLMAX is the same as the VLMAX required by the mask register instruction, we can avoid a vsetvli. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113204	2021-11-15 09:57:28 -08:00
Mehrnoosh Heidarpour	7daa95c8fa	[InstCombine] Fold (A^B)|~A-->~(A&B) https://alive2.llvm.org/ce/z/2v6rhF Fixes: https://llvm.org/PR52478 Differential Revision: https://reviews.llvm.org/D113783	2021-11-15 12:29:37 -05:00
Simon Pilgrim	fc7c1cebbc	[X86] LowerFunnelShift - pull out repeated EltSizeInBits variable. NFC.	2021-11-15 17:11:44 +00:00
Sanjay Patel	3d01507c2d	[x86] fold vector (X > -1) & Y to shift+andn (2nd try) The first try at this patch ( `bf5748a1af` ) was reverted ( `5be64d4164` ) because it could crash. The cause of that problem was failing to account for the optional peek-through-bitcast in the enclosing function. This version of the patch adds a clause to avoid the fold in case of bitcasts because it is unlikely to be profitable in that scenario. A test case based on https://llvm.org/PR52504 was added to make sure we don't have that problem again. Original commit message: and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...). We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds. Differential Revision: https://reviews.llvm.org/D113603	2021-11-15 11:09:32 -05:00
Roman Lebedev	5c7255fe3a	[X86][Costmodel] `getReplicationShuffleCost()`: promote 8 bit-wide elements to 32 bit when no AVX512VBMI Currently `X86TTIImpl::getInterleavedMemoryOpCostAVX512()` asks about i8 elt type, so this change does affect vectorization. In the end, it will ask about i1. We should also try to promote to i16 if we have AVX512BW, i'll do that in a follow-up. All costs here look good, i've added the missing truncation costs in preparatory patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113853	2021-11-15 19:04:02 +03:00
Roman Lebedev	a468c39c90	[X86][Costmodel] `trunc v32i16 to v64i8` can appear after legalization, cost is same as for `trunc v32i16 to v32i8` Some of the costs get larger here, but i suppose that makes sense since we'd previously query scalarization costs that may not be really representative of the reality. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113852	2021-11-15 19:04:02 +03:00
Roman Lebedev	9e57d9b09d	[X86][Costmodel] `trunc v8i64 to v16i8/v32i8/v64i8` can appear after legalization, cost is same as for `trunc v8i64 to v8i8` While this one is trivial and identical to the previous patch, there is a weird cost change in a follow-up patch that i'm not sure about. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113851	2021-11-15 19:04:02 +03:00
Roman Lebedev	0116c708c6	[X86][Costmodel] `trunc v16i32 to v32i8/v64i8` can appear after legalization, cost is same as for `trunc v16i32 to v16i8` While this one is trivial and identical to the previous patch, there is a weird cost change in a follow-up patch that i'm not sure about. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113850	2021-11-15 19:04:02 +03:00
Simon Pilgrim	ea9e6aa423	[X86] getAVX512Node() - find constant broadcasts to encourage load-folding If an operand is a bitcasted or widened constant, try to more aggressively create broadcastable constants for folding, which in particular helps non-VLX modes. I've refactored getAVX512Node so that VLX targets can make better use of this as well. NOTE: In the future, I think we should consider removing the broadcast of constant data from DAG entirely and move this to either X86InstrInfo::foldMemoryOperand or a new pass - AVX1/2 targets have similar problems with missed (whole vector) folds that need to be improved as well. Differential Revision: https://reviews.llvm.org/D113845	2021-11-15 15:52:03 +00:00
Alexey Bataev	036207d5f2	[SLP]Improve splat detection. A bunch of scalars can be treated as a splat not only if all elements are the same but also if some of them are undefvalues. Differential Revision: https://reviews.llvm.org/D113774	2021-11-15 07:50:34 -08:00
Alexey Bataev	b85152f8b1	[SLP][NFC]Use `isa_and_nonnull` and fix comment, NFC.	2021-11-15 06:49:33 -08:00
ksyx	72b5138d37	Revert "[GVN][NFC] Remove redundant check" This reverts commit `c35e8185d8`. mstorsjo reported in the revision thread that one VNCoercion assertion is violated, seemingly in relation to this commit. As per "If a test case that demonstrates a problem is reported in the commit thread, please revert and investigate offline", this commit is reverted.	2021-11-15 09:14:13 -05:00
Alexey Bataev	6fb5bed7d1	[SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics. If the vector intrinsic has scalar argument, we currently still create a tree entry for this argument. This entry is not used, just consumes resources and increases the cost of the tree. Differential Revision: https://reviews.llvm.org/D113806	2021-11-15 06:11:19 -08:00
Florian Hahn	112c1c346a	[IVDescriptor] Make sure the sign is included for negative extension. At the moment, computeRecurrenceType does not include any sign bits in the maximum bit width. If the value can be negative, this means the sign bit will be missing and the sext won't properly extend the value. If the value can be negative, increment the bitwidth by one to make sure there is at least one sign bit in the result value. Note that the increment is also needed if the value is known to be negative, as a sign bit needs to be preserved for the sext to work. Note that this at the moment prevents vectorization, because the analysis computes i1 as type for the recurrence when looking through the AND in lookThroughAnd. Fixes PR51794, PR52485. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113056	2021-11-15 13:12:57 +00:00
Hans Wennborg	5be64d4164	Revert "[x86] fold vector (X > -1) & Y to shift+andn" This caused assertion failures: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:9446: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode *, llvm::SDNode *): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed. See comment on the code review. (Had to update some expectations in test/CodeGen/X86/vselect-zero.ll manually due to other changes having landed after the reverted one.) > and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y > > This avoids the -1 constant vector in favor of an arithmetic shift > instruction if it exists (the ISA is still not complete after all > these years...). > > We catch this pattern late in combining by matching PCMPGT, so it > should not interfere with more general folds. > > Differential Revision: https://reviews.llvm.org/D113603 This reverts commit `bf5748a1af`.	2021-11-15 12:35:49 +01:00
Simon Pilgrim	7bac1985f4	[DAG] SimplifyVBinOp - add SDLoc() argument Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats	2021-11-15 10:43:56 +00:00
Simon Pilgrim	8658d20724	[DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC.	2021-11-15 10:43:55 +00:00
Jay Foad	4119da2f7c	[MachineVerifier] Live interval for a subreg must have subranges MachineVerifier verified the subranges of a live interval if they existed, but did not complain if they did not exist. This patch changes the verifier to complain if there are no subranges in the live interval for a subreg operand (so long as MachineRegisterInfo says we should be tracking subreg liveness for that register). This matches the conditions for LiveIntervalCalc to create subranges in the first place. Differential Revision: https://reviews.llvm.org/D112556	2021-11-15 10:13:35 +00:00
Dmitry Preobrazhensky	91f4650ebb	[AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap* Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes. Differential Revision: https://reviews.llvm.org/D113746	2021-11-15 12:51:12 +03:00
David Green	4c3bfdc7f1	[ARM] Fix GatherScatter AddLikeOr condition	2021-11-15 09:44:41 +00:00
Peter Waller	599ea3e73f	[AArch64][SVE] Break false dependencies for inactive lanes of FP unary operations Follow up to D105889, covering instructions using sve_fp_2op_p_zd_HSD: frintn, frintp, frintm, frintz, frinta, frintx, frinti, frecpx and fsqrt. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D113485	2021-11-15 09:15:21 +00:00
Simon Moll	7cf887b950	[VE] Fix SDNode user loop after `efa896e5f7` Rewriting SDNode user loops broke VEISelLowering (commit `efa896e5f7`). This fixes it.	2021-11-15 09:53:09 +01:00
Sander de Smalen	f835fe8ef7	[LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason. The interface is a convenience function to ask if a block requires predication when widening, but it's important that there are two separate concepts to consider: (A) The block was predicated in the original loop. (B) The block was unpredicated in the original loop, but requires predication because of tail folding. In the case of (B) we know that at least one lane of the vector will be executed, which means we can implement a load from a uniform address with a scalar load + splat (D112552). In the case of predication because of (A), we cannot do this, because the scalar load itself requires predication. The name 'blockNeedsPredication' does not make the distinction between (A) and (B), hence the reason to rename it. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113392	2021-11-15 08:04:20 +00:00
Kyungwoo Lee	6747d44bda	[DebugInfo] Fix end_sequence of debug_line in LTO Object In a LTO build, the `end_sequence` in debug_line table for each compile unit (CU) points to the end of the text section which merged all CUs. The `end_sequence` needs to point to the end of each CU's range. This bug often causes invalid `debug_line` table in the final `.dSYM` binary for MachO after running `dsymutil` which tries to compensate for an out-of-range address of `end_sequence`. The fix is to sync the line table termination with the range operations that are already maintained in DwarfDebug. When CU or section changes, or nodebug functions appear or module is finished, the prior pending line table is terminated using the last range label. In the MC path where no range is tracked, the old logic is conservatively used to end the line table using the section end symbol. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D108261	2021-11-14 20:19:47 -08:00
Kazu Hirata	feb40a3a47	[llvm] Use range-based for loops with instructions (NFC)	2021-11-14 19:40:48 -08:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Kazu Hirata	a84a401f7e	[AMDGPU] Remove selectStoreIntrinsic (NFC) The last use was removed on Jan 13, 2020 in commit `533d650e94`.	2021-11-14 19:40:44 -08:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Chen Zheng	eec9ca622c	[PowerPC] guard update form prepare with non-const increment with option Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113471	2021-11-15 02:16:46 +00:00
Lang Hames	69be352a19	Reapply "[ORC] Initial MachO debugging support (via GDB JIT debug.." with fixes. This reapplies `e1933a0488` (which was reverted in `f55ba3525e` due to bot failures, e.g. https://lab.llvm.org/buildbot/#/builders/117/builds/2768). The bot failures were due to a missing symbol error: We use the input object's mangling to decide how to mangle the debug-info registration function name. This caused lookup of the registration function to fail when the input object mangling didn't match the host mangling. Disabling the test on non-Darwin platforms is the easiest short-term solution. I have filed https://llvm.org/PR52503 with a proposed longer term solution.	2021-11-14 14:44:07 -08:00
Simon Pilgrim	c3a772fdf5	[X86] Add getPack helper This helper provides a more complete approach for lowering to X86ISD::PACKSS/PACKUS nodes - testing for existing suitable sign/zero extension before recreating it. It also optionally packs the upper half instead of the lower half.	2021-11-14 21:27:15 +00:00
Koakuma	3e0f3041cc	[SPARC] Zero-extend the operands when doing UMULO on 64-bit integers On SPARC, S/UMULO operation on 64-bit integers works by extending the value to 128-bit, then doing a multiplication and checking the upper half of the result. This makes UMULO work correctly by putting a zero in the upper half rather than doing a sign extension. Reviewed By: LemonBoy Differential Revision: https://reviews.llvm.org/D110555	2021-11-14 19:59:52 +01:00
Kazu Hirata	7379736774	[llvm] Use range-based for loops with User::operands (NFC)	2021-11-14 09:32:38 -08:00
Kazu Hirata	098e935174	[llvm] Use range-based for loops with CallBase::args (NFC)	2021-11-14 09:32:36 -08:00
Roman Lebedev	4dd2f0446c	[X86][Costmodel] `getReplicationShuffleCost()`: promote 16 bit-wide elements to 32 bit when no AVX512BW The basic idea is simple, if we don't have native shuffle for this element type, then we must have native shuffle for wider element type, so promote, replicate, demote. I believe, asking `getCastInstrCost(Instruction::Trunc` is correct semantically, case in point `trunc <32 x i32> to <32 x i8>` aka 2 * ZMM will naively result in 2 * XMM, that then will be packed into 1 * YMM, and it should count the cost of said packing, not just the truncations. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113609	2021-11-14 20:01:38 +03:00
Roman Lebedev	e876698a5d	[NFC][TTI] `getReplicationShuffleCost()`: s/Replicated/Dst/ 'Replicated' is a mouthful and somewhat ambiguous, while 'destination' is pretty self-explanatory.	2021-11-14 20:01:38 +03:00
Roman Lebedev	b283961012	[X86][Costmodel] `trunc v8i64 to v16i16/v32i16` can appear after legalization, cost is same as for `trunc v8i64 to v8i16` Same as D113842, but for i64 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113843	2021-11-14 18:41:38 +03:00
Roman Lebedev	a5f2fdca99	[X86][Costmodel] `trunc v16i32 to v32i16` can appear after legalization, cost is same as for `trunc v16i32 to v16i16` This was noticed in D113609, hopefully it unblocks that patch. There are likely other similar problems. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113842	2021-11-14 18:41:37 +03:00
Mircea Trofin	0662a3612c	[NFC][InlineFunction] Renamed some vars to conform to coding style	2021-11-14 07:26:44 -08:00
Sanjay Patel	254c5246e9	[DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit This was noted as a follow-up to D113212 / D113426: `4fc1fc4005` `7e30404c3b` `11522cfcad` https://alive2.llvm.org/ce/z/e4o96b The canonicalization rules for these IR patterns are complicated, and we were not matching the expected forms in 2 out of the 3 cases. We can make codegen more robust by matching the swapped forms (and that will also work if these patterns are created late).	2021-11-14 09:35:26 -05:00
Simon Pilgrim	f4143ffed7	[X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets Similar to what we've done for other ops, this patch widens VPTERNLOG to a 512-bit op for non-VLX targets. Fixes regressions in D113192 Differential Revision: https://reviews.llvm.org/D113827	2021-11-14 13:40:53 +00:00
David Green	355ee18c5d	[TypePromotion] Extend TypePromotion::isSafeWrap This modifies the preconditions of TypePromotion's isSafeWrap method, to allow it to work from all constants from the ICmp. Using the code: %a = add %x, C1 %c = icmp ult %a, C2 According to Alive, we can prove that is equivalent to icmp ult (add zext(%x), sext(C1)), zext(C2) given C1 <=s 0 and C1 >s C2. https://alive2.llvm.org/ce/z/CECYZB Which is similar to what is already present. We can also prove icmp ult (add zext(%x), sext(C1)), sext(C2) given C1 <=s 0 and C1 <=s C2. https://alive2.llvm.org/ce/z/KKgyeL The PrepareWrappingAdds method was removed, and the constants are now altered to sext or zext directly as required by the above methods. Differential Revision: https://reviews.llvm.org/D113678	2021-11-14 11:18:31 +00:00
Kristina Bessonova	5b4bfd8c24	[DwarfCompileUnit] getOrCreateCommonBlock(): check for existing entity first. NFCI For global variables and common blocks there is no way to create entities through getOrCreateContextDIE(), so no need to obtain the context first. Differential Revision: https://reviews.llvm.org/D113651	2021-11-14 10:58:24 +02:00
Kristina Bessonova	90c5ab54a9	[DwarfCompileUnit] getOrCreateGlobalVariableDIE(): remove outdated comment. NFC	2021-11-14 10:56:54 +02:00
Lang Hames	f55ba3525e	Revert "[ORC] Initial MachO debugging support (via GDB JIT debug..." This reverts commit `e1933a0488` until I can look into bot failures.	2021-11-14 00:14:39 -08:00
Kazu Hirata	7505b7045f	[llvm] Use GetElementPtrInst::indices (NFC)	2021-11-13 21:43:28 -08:00
Lang Hames	e1933a0488	[ORC] Initial MachO debugging support (via GDB JIT debug registration interface) This commit adds a new plugin, GDBJITDebugInfoRegistrationPlugin, that checks for objects containing debug info and registers any debug info found via the GDB JIT registration API. To enable this registration without redundantly representing non-debug sections this plugin synthesizes a new embedded object within a section of the LinkGraph. An allocation action is used to make the registration call. Currently MachO only. ELF users can still use the DebugObjectManagerPlugin. The two are likely to be merged in the near future.	2021-11-13 13:21:01 -08:00
ksyx	c35e8185d8	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. Part of revision D100179 Reviewed By: nikic	2021-11-13 15:59:43 -05:00
Keith Smiley	86e2af8043	reland: [VFS] Use original path when falling back to external FS This reverts commit `f0cf544d6f`. Just a small change to fix: ``` /home/buildbot/as-builder-4/llvm-clang-x86_64-expensive-checks-ubuntu/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp: In static member function ‘static llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> > llvm::vfs::File::getWithPath(llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >, const llvm::Twine&)’: /home/buildbot/as-builder-4/llvm-clang-x86_64-expensive-checks-ubuntu/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp:2084:10: error: could not convert ‘F’ from ‘std::unique_ptr<llvm::vfs::File>’ to ‘llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >’ return F; ^ ``` Differential Revision: https://reviews.llvm.org/D113832	2021-11-13 12:14:34 -08:00
Keith Smiley	f0cf544d6f	Revert "[VFS] Use original path when falling back to external FS" ``` /work/omp-vega20-0/openmp-offload-amdgpu-runtime/llvm.src/llvm/lib/Support/VirtualFileSystem.cpp: In static member function 'static llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> > llvm::vfs::File::getWithPath(llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >, const llvm::Twine&)': /work/omp-vega20-0/openmp-offload-amdgpu-runtime/llvm.src/llvm/lib/Support/VirtualFileSystem.cpp:2084:10: error: could not convert 'F' from 'std::unique_ptr<llvm::vfs::File>' to 'llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >' return F; ^ ``` This reverts commit `c972175649`.	2021-11-13 10:11:51 -08:00
Keith Smiley	c972175649	[VFS] Use original path when falling back to external FS This is a follow up to `0be9ca7c0f` to make paths in the case of falling back to the external file system use the original format, preserving relative paths, and allow the external filesystem to canonicalize them if needed. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D109128	2021-11-13 09:34:44 -08:00
Kazu Hirata	609ccbb240	[PowerPC] Use SDNode::uses (NFC)	2021-11-13 08:34:22 -08:00
Simon Pilgrim	a310cbae02	[X86] Add getAVX512Node helper. NFC. For AVX512 targets without VLX, we have to widen 128/256-bit vectors to 512-bits to use some specific AVX512 instructions (or some other instructions with predicates etc.). I've pulled out the widening code from LowerFunnelShift into the helper function, so we can convert some other widening patterns in the future.	2021-11-13 13:59:42 +00:00
Florian Hahn	8ed8d37088	[SCEV] Update SCEVLoopGuardRewriter to hold reference to map. (NFC) SCEVLoopGuardRewriter doesn't need to copy the rewrite map. It can just hold a const reference instead, to avoid an unnecessary copy.	2021-11-13 09:39:14 +00:00
Craig Topper	82bc6a094e	[X86] Promote f16 STRICT_FROUND to f32 and call libc. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113817	2021-11-12 21:37:03 -08:00
Lang Hames	2272ec1c63	[JITLink][MachO] Fix "find-symbol-by-address" logic. Only search within the requested section, and allow one-past-the-end addresses. This is needed to support section-end-address references to sections with no symbols in them.	2021-11-12 21:28:32 -08:00
Kazu Hirata	efa896e5f7	[Target] Use SDNode::uses (NFC)	2021-11-12 21:23:04 -08:00
Duncan P. N. Exon Smith	79c5479822	Support: Pass wrapped Error's error code through FileError Change FileError to pass through the error code from the Error it wraps. This allows APIs that return ECError to transition to FileError without changing returned std::error_code. This was extracted from https://reviews.llvm.org/D109345. Differential Revision: https://reviews.llvm.org/D113225	2021-11-12 21:19:09 -08:00
Phoebe Wang	e49fcfc7cd	[X86][ABI] Change the alignment of f80 in 32-bit calling convention to meet with different data layout Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113739	2021-11-13 10:00:34 +08:00
Ben Langmuir	2a739f2789	[ORC][ORC-RT] Register type metadata from __swift5_types MachO section Similar to how the other swift sections are registered by the ORC runtime's macho platform, add the __swift5_types section, which contains type metadata. Add a simple test that demonstrates that the swift runtime recognized the registered types. rdar://85358530 Differential Revision: https://reviews.llvm.org/D113811	2021-11-12 16:39:59 -08:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B, dropping any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Craig Topper	02bed66cd5	[RISCV] Improve codegen for i32 udiv/urem by constant on RV64. The division by constant optimization often produces constants that are uimm32, but not simm32. These constants require 3 or 4 instructions to materialize without Zba. Since these instructions are often used by a multiply with a LHS that needs to be zero extended with an AND, we can switch the MUL to a MULHU by shifting both inputs left by 32. Once we shift the constant left, the upper 32 bits no longer need to be 0 so constant materialization is free to use LUI+ADDIW. This reduces the constant materialization from 4 instructions to 3 in some cases while also reducing the zero extend of the LHS from 2 shifts to 1. Differential Revision: https://reviews.llvm.org/D113805	2021-11-12 14:49:10 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. It's readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Lang Hames	9d5e647428	[JITLink] Fix think-o in handwritten CWrapperFunctionResult -> Error converter. We need to skip the length field when generating error strings. No test case: This hand-hacked deserializer should be removed in the near future once JITLink can use generic ORC APIs (including SPS and WrapperFunction).	2021-11-12 10:36:17 -08:00
Florian Hahn	03cfea68c6	[SCEV] Update SCEVLoopGuardRewriter to take SCEV -> SCEV map (NFC). Split off refactoring from D113577 to reduce the diff. NFC as the new interface will only be used in D113577.	2021-11-12 18:16:03 +00:00
Simon Pilgrim	6bb71738e2	[X86] convertShiftLeftToScale - improve vXi8 constant handling Add support for v32i8/v64i8 converting shift-by-constant to multiply-by-constant. This helps us avoid the generic vXi8 shift lowering, and a lot of VPBLENDVB ops which can be particularly slow. We also needed to reorder a few shift lowering patterns to prevent regressions, particularly for XOP+AVX2 (Excavator) targets (which can split to fast v16i8 shifts) and AVX512-BWI targets (which prefers to extend to fast v32i16 shifts).	2021-11-12 16:48:10 +00:00
Joel E. Denny	c9dfe322ee	[OpenMP] Fix main thread barrier for Pascal and amdgpu Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602	2021-11-12 11:18:45 -05:00
Jay Foad	a70bbb5f7a	[AMDGPU] Simplify 64-bit division/remainder expansion The old expansion open-coded a 64-bit addition in a strange way, by adding the high parts without carry-in from the low part, and then adding the carry back in later on. Fixing this saves a couple of instructions and makes the code much easier to understand. Differential Revision: https://reviews.llvm.org/D113679	2021-11-12 15:48:41 +00:00
Kazu Hirata	99d5cbbd7e	[CodeGen] Use SDNode::uses (NFC)	2021-11-12 07:33:29 -08:00
Kerry McLaughlin	7647822156	[AArch64][SVE] Remove i1 type from isElementTypeLegalForScalableVector `collectElementTypesForWidening` collects the types of load, store and reduction Phis in a loop. These types are later checked using `isElementTypeLegalForScalableVector` to prevent vectorisation of loops with instruction types that are unsupported. This patch removes i1 from the list of types supported for scalable vectors. This fixes an assert ("Cannot yet scalarize uniform stores") in `setCostBasedWideningDecision` when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113680	2021-11-12 14:24:38 +00:00
Alexey Bataev	352c46e707	[SLP]Improve vectorization of split loads. Need to fix the cost estimation for split loads, since we look at the subregs already, no need to permute them, need just to estimate subregister insert, if it is smaller than the real register. Also, using split loads, it might be profitable already to vectorize smaller trees with gathering of the loads. Differential Revision: https://reviews.llvm.org/D107188	2021-11-12 06:13:22 -08:00
Simon Pilgrim	59087dce3b	[X86] combineX86ShufflesConstants - constant fold from target shuffles unless optsize = true Currently we only constant fold target shuffles if any of the sources has one use, or it would remove a variable shuffle mask - the aim being to avoid constant pool bloat. This patch proposes we should constant fold by default and only limit this if optsize is enabled - I've added a basic test for this in vector-mul.ll (the pmuludq case is by far the most common), I can add other specific test cases if people need them. This should permit further constant folding, break some instruction dependencies and help reduce shuffle port pressure. Differential Revision: https://reviews.llvm.org/D113748	2021-11-12 14:02:43 +00:00
Sanjay Patel	bf5748a1af	[x86] fold vector (X > -1) & Y to shift+andn and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...). We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds. Differential Revision: https://reviews.llvm.org/D113603	2021-11-12 08:17:46 -05:00
Florian Hahn	819bca9b90	[SCEV] Use APIntOps::umin to select best max BC count (NFC). Suggested in D102267, but I missed this in the committed version.	2021-11-12 12:20:01 +00:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Simon Moll	751aa6c280	[VE][NFCi] Remove unused tablegen parameters TableGen has started warning about unused template parameters in the isel patterns. Remove those. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D113675	2021-11-12 08:19:50 +01:00
Markus Lavin	4e94e25c90	Fix minor deficiency in machine-sink. Register uses that are MRI->isConstantPhysReg() should not inhibit sinking transformation. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D111531	2021-11-12 08:01:13 +01:00
Kazu Hirata	2ca45adf24	[CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC)	2021-11-11 22:28:55 -08:00
Serge Pavlov	3057e850b8	[X86] Preserve FPSW when popping x87 stack When compiler converts x87 operations to stack model, it may insert instructions that pop top stack element. To do it the compiler inserts instruction FSTP right after the instruction that calculates value on the stack. It can break the code that uses FPSW set by the last instruction. For example, an instruction FXAM is usually followed by FNSTSW, but FSTP is inserted after FXAM. As FSTP leaves condition code in FPSW undefined, the compiler produces incorrect code. With this change FSTP is inserted after the FPSW consumer if the last instruction sets FPSW. Differential Revision: https://reviews.llvm.org/D113335	2021-11-12 12:00:09 +07:00
Luís Ferreira	665b4138d9	[DebugInfo] run clang-format on some unformatted files This trivial patch runs clang-format on some unformatted files before doing logic changes and prevent hard to review diffs. Differential Revision: https://reviews.llvm.org/D113572	2021-11-11 18:59:41 -08:00
Phoebe Wang	74b979abcd	[X86][FP16] Avoid to generate VZEXT_MOVL with i16 This fixes the crash due to lacking VZEXT_MOVL support with i16. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D113661	2021-11-12 09:32:29 +08:00
Nikita Popov	986416251b	[InstCombine] Drop redundant fold for and/or of icmp eq/ne (NFCI) This handles a special case of foldAndOrOfICmpsUsingRanges() with two equality predicates.	2021-11-11 20:25:40 +01:00
Min-Yih Hsu	99152a4164	[M68k][NFC] Rename 'GlSel' -> 'GISel' AArch64 as well as other targets use the abbrev "GISel" so we'd better be consistent with them. NFC.	2021-11-11 11:01:09 -08:00
Simon Pilgrim	94a901a50a	[X86] Move LowerFunnelShift below LowerShift. NFC. Makes it easier to reuse the various vector shift helpers defined above LowerShift	2021-11-11 18:45:51 +00:00
Simon Pilgrim	010b09b0c5	[DAG] reassociateOpsCommutative - test getNode result directly. NFC Matches the clean code style we use directly above	2021-11-11 18:45:50 +00:00
Mircea Trofin	f64eee1625	[NFC][InlineAdvisor] Inform advisor when the module is invalidated This avoids unnecessary re-calculation of module-wide features in the MLInlineAdvisor. In cases where function passes don't invalidate functions (and, thus, don't invalidate the module), but we re-process a CGSCC, we currently refreshed module features unnecessarily. The overhead of fetching cached results (albeit they weren't themselves invalidated) was noticeable in certain modules' compilations. We don't want to just invalidate the advisor object, though, via the analysis manager, because we'd then need to re-create expensive state (like the model evaluator in the ML 'development' mode). Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D113644	2021-11-11 10:23:49 -08:00

152643 Commits