llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	c407769f5e	[InstCombine] (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a))) Transform ``` (~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a))) ``` And swapped case: ``` (a | ~(b & c)) & ~(a & (b ^ c)) --> ~(a | b) | (a ^ b ^ c) ``` ``` ---------------------------------------- define i3 @src(i3 %a, i3 %b, i3 %c) { %0: %or1 = or i3 %b, %c %not1 = xor i3 %or1, 7 %and1 = and i3 %a, %not1 %xor1 = xor i3 %b, %c %or2 = or i3 %xor1, %a %not2 = xor i3 %or2, 7 %or3 = or i3 %and1, %not2 ret i3 %or3 } => define i3 @tgt(i3 %a, i3 %b, i3 %c) { %0: %obc = or i3 %b, %c %xbc = xor i3 %b, %c %o = or i3 %a, %xbc %and = and i3 %obc, %o %r = xor i3 %and, 7 ret i3 %r } Transformation seems to be correct! ``` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %b, %c %not1 = xor i4 %and1, 15 %or1 = or i4 %not1, %a %xor1 = xor i4 %b, %c %and2 = and i4 %xor1, %a %not2 = xor i4 %and2, 15 %and3 = and i4 %or1, %not2 ret i4 %and3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor1 = xor i4 %b, %c %xor2 = xor i4 %xor1, %a %or1 = or i4 %a, %b %not1 = xor i4 %or1, 15 %or2 = or i4 %xor2, %not1 ret i4 %or2 } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D112955	2021-11-22 10:49:21 -08:00
Diego Caballero	4348cd42c3	[LV] Drop integer poison-generating flags from instructions that need predication This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact` and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop is linearized, these flags may lead to generating a poison value that is effectively used as the base address of the widen load/store. The fix drops all the integer poison-generating flags from instructions that contribute to the address computation of a widen load/store whose original instruction was in a basic block that needed predication and is not predicated after vectorization. Reviewed By: fhahn, spatel, nlopes Differential Revision: https://reviews.llvm.org/D111846	2021-11-22 10:57:29 +00:00
Roland McGrath	b72b56016a	NFC: clang-format lib/Transforms/Instrumentation/InstrProfiling.cpp Differential Revision: https://reviews.llvm.org/D114343	2021-11-21 18:16:02 -08:00
Nikita Popov	aeba28bc62	[DSE] Drop hasAnalyzableMemoryWrite() (NFCI) The functionality of hasAnalyzableMemoryWrite() is effectively subsumed by getLocForWriteEx(), which will return None if the instruction is not analyzable. The implementations don't match exactly (e.g. getLocForWriteEx() does not limit non-calls to stores), but in conjunction with the isRemovable() check, it ends up being the same.	2021-11-20 23:20:12 +01:00
Florian Hahn	cf8efbd30e	[VPlan] Wrap vector loop blocks in region. A first step towards modeling preheader and exit blocks in VPlan as well. Keeping the vector loop in a region allows for changing the VF as we traverse region boundaries. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113182	2021-11-20 17:59:48 +00:00
Sanjay Patel	337948ac6e	[InstCombine] add folds for binop with sexted bool and constant operands This is a generalization/extension of the existing and/or folds noted with TODO comments. Those have a one-use constraint that is not necessary. Potential follow-ups are noted by the TODO comments in the new function. We can also call this function from other binop visit* functions, but we need to add tests first. This solves: https://llvm.org/PR52543 https://alive2.llvm.org/ce/z/NWuCR5	2021-11-20 12:33:00 -05:00
Kazu Hirata	d1abf481da	[llvm] Use range-based for loops (NFC)	2021-11-19 21:12:13 -08:00
ksyx	97b9e8438e	[GVN][NFC] Remove redundant check The if-check above the deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Since LoadOffset + LoadSize > LoadOffset when LoadSize > 0, StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0. A type with nonpositive size would be meaningless, so the check can be removed. The values are converted to signed types to avoid unsigned operations on negative offsets. Part of revision D100179 Reapplies commit `c35e8185d8` with a fix for the problem reported by mstorsjo	2021-11-19 20:24:36 -05:00
Ellis Hoag	de11de308b	[InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114268	2021-11-19 15:45:14 -08:00
Fabian Wolff	7eec832def	[DSE] Improve handling of `strncpy` in Dead Store Elimination Fixes PR#52062 and one of the remaining cases of PR#47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D114035	2021-11-19 17:46:29 +00:00
Florian Hahn	76effb001d	[LV] Remove obsolete comment about creating a dummy block (NFC) No dummy pre-entry block is created since `a6c4969f5f`. The comment is stale now and can be removed. Mentioned by @Ayal in D113182.	2021-11-19 17:17:04 +00:00
Alexey Bataev	d1fdf867b1	[SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC. Added TreeEntry::getVectorFactor to get the final vectorization factor to simplify the code. Differential Revision: https://reviews.llvm.org/D114190	2021-11-19 06:32:19 -08:00
Senran Zhang	0425ea4621	[NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType There are still another 2 uses of PointerType::getElementType in Evaluator when evaluating BitCast's on pointers. BitCast's on pointers should be removed when opaque ptr is ready, so I just keep them as is. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D114131	2021-11-19 10:32:55 +08:00
Nikita Popov	46c26991ae	[DSE] Remove getLocForWrite() (NFCI) This implements nearly the same logic as getLocForWriteEx(), and is only used in one place. In that context, we should also know that getLocForWriteEx() returns a non-None result. As such, consolidate everything to use one function.	2021-11-18 21:19:18 +01:00
Nikita Popov	f1295563f1	[DSE] Move removePartiallyOverlappedStores() into DSEState (NFC) So it can use getLocForWriteEx().	2021-11-18 21:19:18 +01:00
Arnold Schwaighofer	7d11c5dac2	Coro: Remove coro_end and coro_suspend_retcon in private unprocessed functions We might emit functions that are private and never called. The coro split pass only processes functions that might be called. Remove intrinsics that we can't generate code for. rdar://84619859 Differential Revision: https://reviews.llvm.org/D114021	2021-11-18 07:48:24 -08:00
Stanislav Mekhanoshin	6d3db28088	[InstCombine] Generalize complex OR patterns to AND For every pattern with only NOT, OR, and AND operations there is always a symmetrical pattern with AND and OR swapped. This adds 2 transformations: https://reviews.llvm.org/D113526 ``` (~(a & b) | c) & (~(a & c) | b) --> ~((b ^ c) & a) (~(a & b) | c) & ~(a & c) --> ~((b | c) & a) ``` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %b, %a %not1 = xor i4 %and1, 15 %and2 = and i4 %a, %c %not2 = xor i4 %and2, 15 %or = or i4 %not2, %b %r = and i4 %or, %not1 ret i4 %r } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %or = or i4 %b, %c %and = and i4 %or, %a %r = xor i4 %and, 15 ret i4 %r } Transformation seems to be correct! ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %a, %b %not1 = xor i4 %and1, 15 %or1 = or i4 %not1, %c %and2 = and i4 %a, %c %not2 = xor i4 %and2, 15 %or2 = or i4 %not2, %b %and3 = and i4 %or1, %or2 ret i4 %and3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor = xor i4 %b, %c %and = and i4 %xor, %a %not = xor i4 %and, 15 ret i4 %not } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D113526	2021-11-17 10:47:36 -08:00
Arthur Eubanks	e3e25b5112	[NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor In a CGSCC pass manager, we may visit the same function multiple times due to SCC mutations. In the inliner pipeline, this results in running the function simplification pipeline on a function multiple times even if it hasn't been changed since the last function simplification pipeline run. We use a newly introduced analysis to keep track of whether or not a function has changed since the last time the function simplification pipeline has run on it. If we see this analysis available for a function in a CGSCCToFunctionPassAdaptor, we skip running the function passes on the function. The analysis is queried at the end of the function passes so that it's available after the first time the function simplification pipeline runs on a function. This is a per-adaptor option so it doesn't apply to every adaptor. The goal of this is to improve compile times. However, currently we can't turn this on by default at least for the higher optimization levels since the function simplification pipeline is not robust enough to be idempotent in many cases, resulting in performance regressions if we stop running the function simplification pipeline on a function multiple times. We may be able to turn this on for -O1 in the near future, but turning this on for higher optimization levels would require more investment in the function simplification pipeline. Heavily inspired by D98103. Example compile time improvements with flag turned on: https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113947	2021-11-17 09:06:46 -08:00
Dmitry Vyukov	a7c57c4ec8	tsan: don't consider debug calls as calls Tsan pass does 2 optimizations based on presence of calls: 1. Don't emit function entry/exit callbacks if there are no calls and no memory accesses. 2. Combine read/write of the same variable if there are no intervening calls. However, all debug info is represented as CallInst as well and thus effectively disables these optimizations. Don't consider debug info calls as calls. Reviewed By: glider, melver Differential Revision: https://reviews.llvm.org/D114079	2021-11-17 14:42:16 +01:00
David Sherwood	670dd40244	[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because at the moment no matter how we promote the i3 type we never end up with a legal vector. This means that getTypeConversion returns TypeScalarizeScalableVector as the LegalizeKind, and then getTypeLegalizationCost returns an invalid cost. This then causes BasicTTImpl::getNumberOfParts to dereference an invalid cost, which triggers an assert. This patch changes getNumberOfParts to return 0 for such cases, since the definition of getNumberOfParts in TargetTransformInfo.h states that we can use a return value of 0 to represent an unknown answer. Currently, LoopVectorize.cpp is the only place where we need to check for 0 as a return value, because all other instances will not currently ask for the number of parts for <vscale x 1 x iX> types. In addition, I have changed the target-independent interface for getNumberOfParts to return 1 and assume there is a single register that can fit the type. The loop vectoriser has lots of tests that are target-independent and they relied upon the 0 value to mean the answer is known and that we are not scalarising the vector. I have added tests here that show we correctly return an invalid cost for VF=vscale x 1 when the loop contains unusual types such as i7: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113772	2021-11-17 12:07:09 +00:00
Stanislav Mekhanoshin	c74f2e5b27	[InstCombine] Use SpecificBinaryOp_match in two more places Differential Revision: https://reviews.llvm.org/D114038	2021-11-17 01:16:06 -08:00
Hongtao Yu	042cefd2b5	[CSSPGO] Fix a hash code truncating issue in ContextTrieNode. std::hash returns a 64bit hash code while previously we were using only lower 32 bits which caused hash collision for large workloads. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D113688	2021-11-16 11:01:52 -08:00
Sanjay Patel	8fce94f916	[InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2 If C is a high-bit mask: (trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?) If C is low-bit mask: (trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?) If C is not-of-power-of-2 (one clear bit): (trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?) This extends the fold added with: `acabad9ff6` (https://alive2.llvm.org/ce/z/aFr7qV) Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold. Here are Alive2 generalizations for these folds: https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch) https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch) https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask) https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit) Differential Revision: https://reviews.llvm.org/D112634	2021-11-16 09:27:30 -05:00
Alexey Bataev	900cc1a226	[SLP]Improve cost of the gather nodes. No need to count the final shuffle cost for the constants, gathering of the constants is just a constant vector + extra inserts, if required. Differential Revision: https://reviews.llvm.org/D113770	2021-11-16 06:25:07 -08:00
Alexey Bataev	cdf8a53c1d	[SLP]Fix windows build, NFC. Need to put `IndexIdx` var to the list of captures.	2021-11-16 06:09:51 -08:00
Alexey Bataev	aa9bbb64be	[SLP]Adjust GEP indices types when trying to build entries. Need to adjust the types of GEPs indices when building the tree entries/operands. Otherwise some of the nodes might differ and vectorizer is unable to correctly find them and count their cost. Differential Revision: https://reviews.llvm.org/D113792	2021-11-16 05:44:33 -08:00
Sander.DeSmalen@arm.com	305816ff1e	[IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast. rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed an issue with IndVarSimplification for a loop where a i32 phi node is no longer replaced by a widened (i64) phi node, because the SCEVs of a sign-extend no longer folded the same way. I'm unsure how to properly explain this because it's all rather complicated, but in short: SCEVs don't fold as nicely as they used to and this caused a difference. While investigating this, I found that IndVarSimplify can actually optimise the case in the way we want to if it chooses the widened IV to be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough, there is some level of indeterminism in the way the algorithm works, it just picks the sign of the 'first' zext/sext user, where the order of the users-iterator is not guaranteed to be the same on each invocation of the pass (e.g. shown by first running loop-rotate, which puts the users in a different order). While I think the fix is valid in the sense that consistently picking _any_ order is better than having a nondeterministic order, I could use a bit of advice from people more familiar in this area of the code-base. For example, I'm not sure if this fix is hiding another issue where the IndVarSimplify pass could actually draw the same conclusions (i.e. that it only needs an i64 phi node) if it does a bit more work, regardless of whether it chooses the induction variable to be signed or unsigned. I'm also not sure if choosing signed is better than unsigned, or whether that just happens to be beneficial only in this individual case. Any feedback would be much appreciated! Reviewed By: reames Differential Revision: https://reviews.llvm.org/D112573	2021-11-16 12:41:04 +00:00
Arthur Eubanks	19867de9e7	[NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved since we've already handled function analysis invalidation. So far this only touches the inliner, argpromotion, function-attrs, and updateCGAndAnalysisManager(), since they are the most used. This is part of an effort to investigate running the function simplification pipeline less on functions we visit multiple times in the inliner pipeline. However, this causes major memory regressions especially on larger IR. To counteract this, turn on the option to eagerly invalidate function analyses. This invalidates analyses on functions immediately after they're processed in a module or scc to function adaptor for specific parts of the pipeline. Within an SCC, if a pass only modifies one function, other functions in the SCC do not have their analyses invalidated, so in later function passes in the SCC pass manager the analyses may still be cached. It is only after the function passes that the eager invalidation takes effect. For the default pipelines this makes sense because the inliner pipeline runs the function simplification pipeline after all other SCC passes (except CoroSplit which doesn't request any analyses). Overall this has mostly positive effects on compile time and positive effects on memory usage. https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss D113196 shows that we slightly regressed compile times in exchange for some memory improvements when turning on eager invalidation. D100917 shows that we slightly improved compile times in exchange for major memory regressions in some cases when invalidating less in SCC passes. Turning these on at the same time keeps the memory improvements while keeping compile times neutral/slightly positive. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113304	2021-11-15 14:44:53 -08:00
Philip Reames	8f95e915cd	[unroll-runtime] Relax two profitability limitations on multi-exit unrolling This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change. If anyone sees actually interesting code differences out of this, please let me know. I'm not expecting this to have much impact at all. Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block. We can only hit this with a poorly documented debug flag. Case 2 - Why should we treat epilog cases differently from prolog cases? Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled? Sorry for the lack of tests here. These are both exceedingly narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags. Note that the legality cases for each are already exercised with force flags.	2021-11-15 13:00:14 -08:00
Philip Reames	423da61835	[runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC] All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.	2021-11-15 11:39:07 -08:00
Stanislav Mekhanoshin	e785f4ab6a	[PatternMatch] Add m_BinOp/m_c_BinOp with specific opcode Differential Revision: https://reviews.llvm.org/D113508	2021-11-15 11:24:27 -08:00
Philip Reames	e99902a872	[runtime-unroll] Restructure if-clause to improve readability [NFC]	2021-11-15 11:13:27 -08:00
Alexey Bataev	224e46d355	[SLP][DOT][NFCI]Output all scalars for the splats, not only the first one.	2021-11-15 10:54:26 -08:00
Mehrnoosh Heidarpour	7daa95c8fa	[InstCombine] Fold (A^B)|~A-->~(A&B) https://alive2.llvm.org/ce/z/2v6rhF Fixes: https://llvm.org/PR52478 Differential Revision: https://reviews.llvm.org/D113783	2021-11-15 12:29:37 -05:00
Alexey Bataev	036207d5f2	[SLP]Improve splat detection. A bunch of scalars can be treated as a splat not only if all elements are the same but also if some of them are undefvalues. Differential Revision: https://reviews.llvm.org/D113774	2021-11-15 07:50:34 -08:00
Alexey Bataev	b85152f8b1	[SLP][NFC]Use `isa_and_nonnull` and fix comment, NFC.	2021-11-15 06:49:33 -08:00
ksyx	72b5138d37	Revert "[GVN][NFC] Remove redundant check" This reverts commit `c35e8185d8`. mstorsjo reported in the revision thread that a VNCoercion assertion is violated, seemingly in relation to this commit. As per "If a test case that demonstrates a problem is reported in the commit thread, please revert and investigate offline", this commit is reverted.	2021-11-15 09:14:13 -05:00
Alexey Bataev	6fb5bed7d1	[SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics. If the vector intrinsic has scalar argument, we currently still create a tree entry for this argument. This entry is not used, just consumes resources and increases the cost of the tree. Differential Revision: https://reviews.llvm.org/D113806	2021-11-15 06:11:19 -08:00
Sander de Smalen	f835fe8ef7	[LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason. The interface is a convenience function to ask if a block requires predication when widening, but it's important that there are two separate concepts to consider: (A) The block was predicated in the original loop. (B) The block was unpredicated in the original loop, but requires predication because of tail folding. In the case of (B) we know that at least one lane of the vector will be executed, which means we can implement a load from a uniform address with a scalar load + splat (D112552). In the case of predication because of (A), we cannot do this, because the scalar load itself requires predication. The name 'blockNeedsPredication' does not make the distinction between (A) and (B), hence the reason to rename it. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113392	2021-11-15 08:04:20 +00:00
Kazu Hirata	feb40a3a47	[llvm] Use range-based for loops with instructions (NFC)	2021-11-14 19:40:48 -08:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Kazu Hirata	7379736774	[llvm] Use range-based for loops with User::operands (NFC)	2021-11-14 09:32:38 -08:00
Kazu Hirata	098e935174	[llvm] Use range-based for loops with CallBase::args (NFC)	2021-11-14 09:32:36 -08:00
Mircea Trofin	0662a3612c	[NFC][InlineFunction] Renamed some vars to conform to coding style	2021-11-14 07:26:44 -08:00
Kazu Hirata	7505b7045f	[llvm] Use GetElementPtrInst::indices (NFC)	2021-11-13 21:43:28 -08:00
ksyx	c35e8185d8	[GVN][NFC] Remove redundant check The if-check above the deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Since LoadOffset + LoadSize > LoadOffset when LoadSize > 0, StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0. A type with nonpositive size would be meaningless, so the check can be removed. Part of revision D100179 Reviewed By: nikic	2021-11-13 15:59:43 -05:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B, which drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. It's readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Joel E. Denny	c9dfe322ee	[OpenMP] Fix main thread barrier for Pascal and amdgpu Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602	2021-11-12 11:18:45 -05:00
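
Several InstCombine entries in the log above (c407769f5e, 6d3db28088, 7daa95c8fa, 8fce94f916) state bitwise or compare identities. The sketch below is not part of any of those commits and is no substitute for the Alive2 proofs they link to. It is only a brute-force check, at a hypothetical 4-bit width, that the formulas as written in the commit titles agree on every input. For the `trunc` folds it assumes that the constant is zero-extended to the wide type and that `~C` is taken at the narrow width; the names `inv`, `FOLDS`, `counterexamples` and `trunc_folds_hold` are made up for illustration.

```python
from itertools import product

WIDTH = 4                  # narrow width, small enough to enumerate every input
MASK = (1 << WIDTH) - 1    # all-ones value at WIDTH bits (15, as in the i4 IR)
WIDE = 8                   # assumed source width for the trunc folds


def inv(x):
    """Bitwise NOT at WIDTH bits, i.e. the `xor ..., 15` of the i4 IR."""
    return ~x & MASK


# (label, source expression, target expression), copied from the commit titles.
FOLDS = [
    ("c407769f5e",
     lambda a, b, c: (inv(a | b) & c) | inv(c | (a ^ b)),
     lambda a, b, c: inv((a | b) & (c | (b ^ a)))),
    ("c407769f5e swapped",
     lambda a, b, c: (a | inv(b & c)) & inv(a & (b ^ c)),
     lambda a, b, c: inv(a | b) | (a ^ b ^ c)),
    ("6d3db28088 xor form",
     lambda a, b, c: (inv(a & b) | c) & (inv(a & c) | b),
     lambda a, b, c: inv((b ^ c) & a)),
    ("6d3db28088 or form",
     lambda a, b, c: (inv(a & b) | c) & inv(a & c),
     lambda a, b, c: inv((b | c) & a)),
    ("7daa95c8fa",
     lambda a, b, c: (a ^ b) | inv(a),
     lambda a, b, c: inv(a & b)),
]


def counterexamples(src, tgt):
    """Return every (a, b, c) on which the two expressions disagree."""
    return [(a, b, c)
            for a, b, c in product(range(MASK + 1), repeat=3)
            if src(a, b, c) != tgt(a, b, c)]


def trunc_folds_hold():
    """Check the three icmp-of-trunc folds from 8fce94f916 on all inputs."""
    for x in range(1 << WIDE):
        t = x & MASK                       # trunc X to WIDTH bits
        for k in range(WIDTH + 1):
            high = MASK & ~((1 << k) - 1)  # high-bit mask: ones from bit k up
            low = (1 << k) - 1             # low-bit mask: ones below bit k
            if (t < high) != ((x & high) != high):
                return False
            if (t > low) != ((x & inv(low)) != 0):
                return False
        for k in range(WIDTH):
            c = MASK ^ (1 << k)            # all ones except one clear bit
            if (t > c) != ((x & (c + 1)) == c + 1):
                return False
    return True


if __name__ == "__main__":
    for label, src, tgt in FOLDS:
        print(label, "holds" if not counterexamples(src, tgt) else "FAILS")
    print("8fce94f916 trunc folds",
          "hold" if trunc_folds_hold() else "FAIL")
```

Because AND, OR, XOR and NOT act on each bit independently, agreement at one width carries over to other widths for the first five entries. The compare folds depend on the mask shape, so the check enumerates every mask of each kind at the chosen width.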


28968 Commits