This patch adds support for removing stores that write the same value
as earlier memsets.
It uses isOverwrite to check that a memset completely overwrites a later
store. The candidate store must store the same bytewise value as the
byte stored by the memset.
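A minimal sketch of the kind of redundant store this targets (hypothetical IR; the pointer, size, offset, and byte value are chosen only for illustration):
```
call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 16, i1 false)
%q = getelementptr inbounds i8, i8* %p, i64 4
store i8 0, i8* %q   ; removable: the earlier memset already wrote 0 at this location
```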
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D112321
Adds the following switches:
1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:
   1. Original: defers to original advisor
   2. AlwaysInline: inline all sites not in replay
   3. NeverInline: inline no sites not in replay
2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:
   1. Line
   2. LineColumn
   3. LineDiscriminator
   4. LineColumnDiscriminator
Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.
All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking how replay is applied. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang
An alternative configuration could be to use Function scope, AlwaysInline fallback, and Line format with negative remarks, which more closely matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.
Updated various tests to cover the new switches and negative remarks.
Testing:
ninja check-all
Reviewed By: wenlei, mtrofin
Differential Revision: https://reviews.llvm.org/D112040
Replay in sample profiling needs to be queried for candidates that may not have counts or may be below the threshold. If replay is in effect for a function, make sure these are captured and also imported during ThinLTO.
Testing:
ninja check-all
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D112033
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.
A follow-up patch will remove createReplacementInstr.
Differential Revision: https://reviews.llvm.org/D112791
The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation,
`xor (trunc (ashr X, BW-1)), C`) takes a value, produces all zeros or all
ones, and uses that to optionally invert a constant depending on whether the
original input was positive or negative. This is the same as checking whether
the value is positive, and selecting between the constant and ~constant.
https://alive2.llvm.org/ce/z/NJ85qY
This is a fairly general version of a fold that helps pull saturating
arithmetic into a canonical form.
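For illustration, with a hypothetical i32 input and the constant 42:
```
; before
%s = ashr i32 %x, 31
%r = xor i32 %s, 42
; after: select between the constant and its bitwise-not based on the sign of %x
%nonneg = icmp sgt i32 %x, -1
%r = select i1 %nonneg, i32 42, i32 -43   ; -43 == ~42
```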
Differential Revision: https://reviews.llvm.org/D109151
Fixes the non-deterministic order of XOR instructions created after
5a7a458306. The order of call argument evaluation is not
defined, so create one Value before the call.
Move the section collecting `AlwaysPreserved` up before any
`maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`)
are not preserved.
Reviewed By: MaskRay, tejohnson
Differential Revision: https://reviews.llvm.org/D112684
This patch updates recipe creation to ensure all
VPWidenIntOrFpInductionRecipes are in the header block. At the moment,
new induction recipes can be created in different blocks when trying to
optimize casts and induction variables.
Having all induction recipes in the header makes it easier to
analyze/transform them in VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D111300
With D112160 and D112164, on a Chrome Mac build this reduces the total
size of CGProfile sections by 78% (around 25% eliminated entirely) and
total size of object files by 0.14%.
Differential Revision: https://reviews.llvm.org/D112655
This extends the canonicalizeClampLike function to allow cases where the
input is truncated, but still matching on the types of the ICmps. For
example
%t = trunc i32 %X to i8
%a = add i32 %X, 128
%cmp = icmp ult i32 %a, 256
%c = icmp sgt i32 %X, -1
%f = select i1 %c, i8 High, i8 Low
%r = select i1 %cmp, i8 %t, i8 %f
becomes
%c1 = icmp slt i32 %X, -128
%c2 = icmp sge i32 %X, 128
%s1 = select i1 %c1, i32 sext(Low), i32 %X
%s2 = select i1 %c2, i32 sext(High), i32 %s1
%t = trunc i32 %s2 to i8
https://alive2.llvm.org/ce/z/vPzfxH
We limit the transform to constant High and Low values, where we know
the sext are free.
Differential Revision: https://reviews.llvm.org/D108049
This is a follow-up to https://reviews.llvm.org/D90328.
This change eliminates writes to variables where the value being written is already stored in the variable.
It does this by looping through all memory definitions in the current state and getting the defining access for each of them.
When the defining access is a write instruction identical to the original instruction, the redundant write is removed.
For example:
int x;
int foo();
void g();
void h();
void f() {
  x = 1;
  if (foo()) {
    x = 1;
    g();
  } else {
    h();
  }
}
The second `x = 1` will be eliminated since it stores the value that `x` already holds. The pass produces:
void f() {
  x = 1;
  if (foo()) {
    g();
  } else {
    h();
  }
}
Differential Revision: https://reviews.llvm.org/D111727
Gathered loads/extractelements/extractvalue instructions should be
checked to see whether they can represent a vector reordering node too, and their
order should be taken into account for better graph reordering analysis.
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.
Differential Revision: https://reviews.llvm.org/D112454
The motivating test is reduced from:
https://llvm.org/PR52261
Note that the more general problem of folding any binop into a multi-use
select of constants is still there. We need to ease the restriction in
InstCombinerImpl::FoldOpIntoSelect() to catch those. But these examples
never reach that code because Negator exclusively handles negation
patterns within visitSub().
Differential Revision: https://reviews.llvm.org/D112657
Even if we look for `nocapture` we need to bail on escaping pointers.
The crucial thing is that we might not look at a big enough scope when
we derive the memory behavior. Thus, it might be `nocapture` in a larger
context while it is "captured" in a smaller context.
When we strip and accumulate constant offsets we need to pick the right
address space such that the offset APInt has the right bit width.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D112544
A constant complaint we get is that the __typeid__ symbols in the CFI
jump tables cause confusing stack traces in applications. Emit the more
readable cfi_jt aliases regardless of function export (LTO vs Thin LTO).
Reviewed By: pcc, tejohnson
Differential Revision: https://reviews.llvm.org/D107934
It's a no-op; no overflow ever happens: https://alive2.llvm.org/ce/z/Zw89rZ
While generally I don't like such hacks,
we have a very good reason to do this: here we are expanding
a run-time correctness check for the vectorization,
and said `umul_with_overflow` will not be optimized out
before we query the cost of the checks we've generated.
Which means, the cost of run-time checks would be artificially inflated,
and after https://reviews.llvm.org/D109368 that will affect
the minimal trip count for which these checks are even evaluated.
And if they aren't even evaluated, then the vectorized code
certainly won't be run.
We could consider doing this in IRBuilder, but then we'd need to
also teach `CreateExtractValue()` to look into chain of `insertvalue`'s,
and I'm not sure there's precedent for that.
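A rough sketch of the idea, assuming the no-op case is a multiply by 1 (hypothetical operands; the real values come from the expanded runtime check):
```
; before: the expander would emit the intrinsic even though it can never overflow
%res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 1)
; after: build the equivalent { result, false } aggregate directly
%tmp = insertvalue { i64, i1 } undef, i64 %x, 0
%res = insertvalue { i64, i1 } %tmp, i1 false, 1
```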
Refs. https://reviews.llvm.org/D109368#3089809
Gathered loads/extractelements/extractvalue instructions should be
checked to see whether they can represent a vector reordering node too, and their
order should be taken into account for better graph reordering analysis.
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.
Differential Revision: https://reviews.llvm.org/D112454
This patch changes the definition of getStepVector from:
Value *getStepVector(Value *Val, int StartIdx, Value *Step, ...
to
Value *getStepVector(Value *Val, Value *StartIdx, Value *Step, ...
because:
1. it seems inconsistent to pass some values as Value* and some as
integer, and
2. future work will require the StartIdx to be an expression made up
of runtime calculations of the VF.
In widenIntOrFpInduction I've changed the code to pass in the
value returned from getRuntimeVF, but the presence of the assert:
assert(!VF.isScalable() && "scalable vectors not yet supported.");
means that currently this code path is only exercised for fixed-width
VFs and so the patch is still NFC.
Differential revision: https://reviews.llvm.org/D111882
Gathered loads/extractelements/extractvalue instructions should be
checked to see whether they can represent a vector reordering node too, and their
order should be taken into account for better graph reordering analysis.
Also, if the gather node has reused scalars, they must be reordered
instead of the scalars themselves.
Differential Revision: https://reviews.llvm.org/D112454
Need to emit select(cmp) instructions for poison-safe forms of select
ops. Currently Alive2 reports that `Target is more poisonous than source`
for the operations we generate for such instructions.
https://alive2.llvm.org/ce/z/FiNiAA
Differential Revision: https://reviews.llvm.org/D112562
I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive: it deletes all other plans except the
one containing the requested <VF,UF> pair. The code is currently
written to assume that all <VF,UF> pairs will live in the same VPlan.
This is overly restrictive, since scalable VFs live in different plans
to fixed-width VFs. When we add support for vectorising epilogue loops
when the main loop uses scalable vectors, the VPlan for the main loop
will be different from the one for the epilogue.
Instead I have added a new function called
LoopVectorizationPlanner::getBestPlanFor
that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to
LoopVectorizationPlanner::executePlan
which now takes an additional VPlanPtr argument.
Differential revision: https://reviews.llvm.org/D111125
Use RdxDesc->getOpcode instead of getUnderlyingInstr()->getOpcode.
Move the code which finds Kind and IsOrdered to be outside the for loop
since neither of these change with the vector part.
Differential Revision: https://reviews.llvm.org/D112547
The final reduction nodes should not be reordered; the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, since the reduction cost may compensate for the small tree cost.
Part of D111574
Differential Revision: https://reviews.llvm.org/D112467
The function simplifyOnce only calls simplifyOnceImpl and does nothing else.
Having this separate helper makes no sense. Removing it.
Patch by Dmitry Bakunevich!
Differential Revision: https://reviews.llvm.org/D112517
Reviewed By: mkazantsev
When peeling a loop, we assume that the latch has a `br` terminator and that
all loop exits are either terminated with an `unreachable` or have a terminating
deoptimize call. So when we peel off the 1st iteration, we change the IDom of
all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now,
but if we add logic to support loops with exits that are followed by a block
with an `unreachable` or a terminating deoptimize call, changing the exit's idom
wouldn't be enough and DT would be broken.
For example, let `Exit1` and `Exit2` be loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So neither
of the exits dominates this unreachable block. If we change the IDoms of the
exits to some peeled loop block, we don't update the dominators of the unreachable
block. Currently we just don't get to the peeling logic, saying that we can't peel
such loops.
Previously we stored exits' IDoms in a map before peeling a loop and then, after
peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated
by the loop.
So when we add logic to support peeling loops with exits which branch, for example,
to an unreachable-terminated block, we would update the IDoms not only for exits,
but for their successors.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev, nikic
Always insert values into ExprValueMap, and instead skip using them
in SCEVExpander if poison-generating flags have been lost. This
ensures that all values that are in ValueExprMap are also in
ExprValueMap, so we can use the latter to invalidate the former.
This change is probably not entirely NFC for the case where
originally the SCEV had no nowrap flags but they were inferred
later, in which case that would now allow reusing the existing
value for expansion.
Differential Revision: https://reviews.llvm.org/D112389
The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. the exit test). Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit. Restrict ourselves to the case where we have a single use.
Need to change the order of the reduction/binops args pair vectorization
attempts. We need to try to find the reduction first and postpone
vectorization of binops args. This may help to find more reduction
patterns and vectorize them.
Part of D111574.
Differential Revision: https://reviews.llvm.org/D112224
We observe a hang within iterativelySimplifyCFG due to infinite
loop execution. Currently, there is no limit to this loop, so
in case of a bug it just runs forever. This patch adds an assert
that will break it after 1000 iterations if it hasn't converged.
Currently strip.invariant/launder.invariant are handled by
constructing constant expressions with the intrinsics skipped.
This takes an alternative approach of accumulating the offset
using stripAndAccumulateConstantOffsets(), with a flag to look
through invariant.group intrinsics.
Differential Revision: https://reviews.llvm.org/D112382
At the moment a dummy entry block is created at the beginning of VPlan
construction. This dummy block is later removed again.
This means it is not easy to identify the VPlan header block in a
general fashion, because during recipe creation it is the single
successor of the entry block, while later it is the entry block.
To make getting the header easier, just skip creating the dummy block.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D111299
Fixes a crash observed by oss-fuzz in 39934. The issue at hand is that the code expects a pattern match on m_Mul to imply the operand is a mul instruction; however, mul constant expressions are also valid here.
As this API is now internally offset-based, we can accept a starting
offset and remove the need to create a temporary bitcast+gep
sequence to perform an offset load. The API now mirrors the
ConstantFoldLoadFromConst() API.
The logic in this patch is that if we find a comparison which would be unsigned except for when the loop is infinite, and we can prove that an infinite loop must be ill defined, we can still make the predicate unsigned.
The eventual goal (combined with a follow on patch) is to use the fact the loop exits to remove the zext (see tests) entirely.
A couple of points worth noting:
* We lose the ability to prove the loop unreachable by committing to the must exit interpretation. If instead, we later proved that rhs was definitely outside the range required for finiteness, we could have killed the loop entirely. (We don't currently implement this transform, but could, in theory, do so.)
* simplifyAndExtend has a very limited list of users it walks. In particular, in the examples it stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts). D112170 explores fixing that, but - at least so far - appears to be too expensive compile time wise.
Differential Revision: https://reviews.llvm.org/D111836
Make use of the getGEPIndicesForOffset() helper for creating GEPs.
This handles arrays as well, uses correct GEP index types and
reduces code duplication.
Differential Revision: https://reviews.llvm.org/D112263
When I was playing with Coroutines, I found that it is possible to generate
the following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 %index ; %index is not zero
lifetime.marker.start(%sub.element)
; ... use of %sub.element ...
lifetime.marker.end(%sub.element)
store %struct to xxx ; %struct is escaping!
<suspend points>
```
Then the AllocaUseVisitor would collect the lifetime marker for
sub.element and treat it as a lifetime marker of the alloca! So it
concludes that the alloca could be put on the stack instead of the frame
based on the lifetime markers alone.
The root cause for the bug is that AllocaUseVisitor collects wrong
lifetime markers.
This patch fixes this.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D112216
Transformations may strip the attribute from the argument, e.g. for unused
arguments, which will result in a shadow offset mismatch between caller and
callee.
Stripping noundef for used arguments can be a problem, as TLS is not going
to be set by the caller. However, this is not the goal of the patch, and I am
not aware whether that's even possible.
Differential Revision: https://reviews.llvm.org/D112197
As discussed in D112016, our current requirement of speculatability
for ephemeral values is overly strict: what we really care about is that
the instruction will be DCEd once the assume is dropped. For that
it is sufficient that the instruction is side-effect free and not
a terminator.
In particular, this allows non-dereferenceable loads to be ephemeral
values.
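A small hypothetical example of the case this enables, where the load and the compare feed only the assume:
```
%v = load i32, i32* %p          ; not known to be dereferenceable
%c = icmp sgt i32 %v, 0
call void @llvm.assume(i1 %c)   ; dropping the assume makes %c and %v trivially dead
```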
Differential Revision: https://reviews.llvm.org/D112179
shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W)
This is motivated by an example in D111800
(although that patch avoids the problem for that particular example).
The pattern is shown in reduced form with:
https://llvm.org/PR52178
https://alive2.llvm.org/ce/z/d8zB4D
There is no difference on the PhaseOrdering test from D111800
because the aarch64 cost model says that the shuffle cost is 3 while
the fadd cost is 2.
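A sketch of the fold, with hypothetical element count, opcode, and masks:
```
; before
%b1 = fadd <4 x float> %x, %y
%b2 = fadd <4 x float> %x, %w
%r  = shufflevector <4 x float> %b1, <4 x float> %b2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; after
%sx = shufflevector <4 x float> %x, <4 x float> %x, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
%sy = shufflevector <4 x float> %y, <4 x float> %w, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
%r  = fadd <4 x float> %sx, %sy
```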
Differential Revision: https://reviews.llvm.org/D111901
Vectorization of PHIs and stores is very similar; it might be beneficial to
try to revectorize stores (like PHIs) if the total number of stores with
the same/alternate opcode is less than the vector size but the number of
stores with the same type is larger than the vector size.
Differential Revision: https://reviews.llvm.org/D109831
In order to explore different variants of reassociation, the current implementation uses a "swap in a loop" approach. Unfortunately, the implementation is more complicated than it could be. This is an attempt to streamline the code. The new approach is to extract the core functionality into a helper function and call it explicitly as many times as required.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D112128
At the moment, rewriteLoopExitValue forgets the current phi node in the
loop that collects phis to rewrite. A few lines after the value is
forgotten, SCEV is used again to analyze incoming values and
potentially expand SCEV expression. This means that another SCEV is
created for PN, before the IR is actually updated in the next loop.
This leads to accessing invalid cached expression in combination with
D71539.
PN should only be changed once the actual incoming exit value is set in
the next loop. Moving invalidation there should ensure that PN is
invalidated in all relevant cases.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D111495
bitcast (inselt (bitcast X), Y, 0) --> or (and X, MaskC), (zext Y)
https://alive2.llvm.org/ce/z/Ux-662
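A hypothetical little-endian example (i32 and <4 x i8> chosen only for illustration):
```
; before
%v = bitcast i32 %x to <4 x i8>
%i = insertelement <4 x i8> %v, i8 %y, i32 0
%r = bitcast <4 x i8> %i to i32
; after: lane 0 is the low byte on little-endian targets
%m = and i32 %x, -256         ; clear the low 8 bits
%z = zext i8 %y to i32
%r = or i32 %m, %z
```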
Similar to D111082 / db231ebdb0 :
We want to avoid relatively opaque vector ops on types that are
likely supported by the backend as scalar integers. The bitwise
logic ops are more likely to allow further combining.
We probably want to generalize this to allow a shift too, but
that would oppose instcombine's general rule of not creating
extra instructions, so that's left as a potential follow-up.
Alternatively, we could do that transform in VectorCombine
with the help of the TTI cost model.
This is part of solving:
https://llvm.org/PR52057
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html
The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.
This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.
I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108872
To guarantee convergence of the algorithm, each optimization step should decrease the number of instructions when the IR is modified. This property does not hold in this test case. The problem is that the SCEV Expander may do "unexpected" reassociation, which results in the creation of new min/max chains and the introduction of extra instructions. As a result, on each step we indefinitely optimize back and forth.
The solution is to prevent the SCEV Expander from performing uncontrolled reassociations by means of "Unknown" expressions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D112060
This is trivial. It was left out of the original review only because we had multiple copies of the same code in review at the same time, and keeping them in sync was easiest if the structure was kept in sync.
This patch duplicates a bit of logic we apply to comparisons encountered during the IV users walk to conditions which feed exit conditions. Why? simplifyAndExtend has a very limited list of users it walks. In particular, in the examples it stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts).
Note that this can be trivially extended to multiple exiting blocks. I'm leaving that to a future patch (solely to cut down on the number of versions of the same code in review at once.)
Differential Revision: https://reviews.llvm.org/D111896
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking for incomplete state with upstream changes in BPI.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.
In this patch, we rely on profile metadata to get almost similar benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile time explosion we tried to fix with D110438 and
also avoids fragile bugs because BPI can be lossy in loop passes
(D111448).
Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.
Differential Revision: https://reviews.llvm.org/D111998
Need to follow the order of the reused scalars from the
ReuseShuffleIndices mask rather than rely on the natural order.
Differential Revision: https://reviews.llvm.org/D111898
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.
This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.
Testing:
ninja check-all
Reviewed By: wenlei, mtrofin
Differential Revision: https://reviews.llvm.org/D110658
This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.
The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is no reason to track FirstInst and return
it.
Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.
Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.
Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.
Differential Revision: https://reviews.llvm.org/D110479
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )
Differential Revision: https://reviews.llvm.org/D111891
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.
For example, let `Exit1` and `Exit2` be loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.
With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).
This patch was a part of D110922.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
This removes an over-specified fold. The more general transform
was added with:
727e642e97
There's a difference on an existing test that shows a potentially
unnecessary use limit on an icmp fold.
That fold is in InstCombinerImpl::foldICmpSubConstant(), and IIRC
there was some back-and-forth on it and similar folds because they
could cause analysis/passes (SCEV, LSR?) to miss optimizations.
Differential Revision: https://reviews.llvm.org/D111410
(iN X s>> (N-1)) & Y --> (X < 0) ? Y : 0
https://alive2.llvm.org/ce/z/qeYhdz
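Spelled out as IR, with an arbitrary i32 width:
```
; before
%s = ashr i32 %x, 31
%r = and i32 %s, %y
; after
%isneg = icmp slt i32 %x, 0
%r     = select i1 %isneg, i32 %y, i32 0
```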
I was looking at a missing abs() transform and found my way to this
generalization of an existing fold that was added with D67799.
As discussed in that review, we want to make sure codegen handles
this difference well, and for all of the targets/types that I
spot-checked, it looks good.
I am leaving the existing fold in place in this commit because
it covers a potentially missing icmp fold, but I plan to remove
that as a follow-up commit as suggested during review.
Differential Revision: https://reviews.llvm.org/D111410
This patch adds a pass option to only run transforms that scalarize
vector operations and do not create new vector instructions.
When running VectorCombine early in the pipeline introducing new vector
operations can have negative effects, like blocking loop or SLP
vectorization. To avoid regressions, restrict the early VectorCombine
run (when using -enable-matrix) to only perform scalarization and not
introduce new vector operations.
This is done as option to the pass directly, which is then set when
adding the pass to the pipeline. This is done for the new pass manager
only.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D111800
Fixes: https://bugs.llvm.org/show_bug.cgi?id=51841
This patch places an arbitrary limit on the size of DIExpressions that
we will produce via salvaging, for performance reasons. This helps to
fix a performance issue observed in the bug above, in which debug values
would be salvaged hundreds of times, producing expressions with over
1000 elements and causing the compiler to hang. Limiting the size of
debug values that we will produce to 128 largely fixes this issue.
Reviewed By: dblaikie, jmorse
Differential Revision: https://reviews.llvm.org/D110332
Add lshr (sext i1 X to iN), C --> select (X, -1 >> C, 0) case. This expands
the C == N-1 case to arbitrary C.
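For example, with a hypothetical width and shift amount:
```
; before
%s = sext i1 %x to i32
%r = lshr i32 %s, 7
; after: -1 logically shifted right by 7 is 0x01ffffff
%r = select i1 %x, i32 33554431, i32 0
```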
Fixes PR52078.
Reviewed By: spatel, RKSimon, lebedev.ri
Differential Revision: https://reviews.llvm.org/D111330
Need to check that either Idx is UndefMaskElem and value is UndefValue
or Idx is valid and value is the same as the scalar value in the node.
Differential Revision: https://reviews.llvm.org/D111802
Rather than checking for loop nest preheaders upfront in IVUsers,
move this requirement into isSafeToExpand() from SCEVExpander.
Historically, LSR did not check whether SCEVs are safe to expand
and fully relied on IVUsers to validate this. Later, support for
non-expandable SCEVs was added via rigid formulas.
Checking this in isSafeToExpand() makes it more obvious what
exactly this check is guarding against, and avoids the awkward
loop nest scan.
This is a followup to https://reviews.llvm.org/D111493#3055286.
Differential Revision: https://reviews.llvm.org/D111681
The initial MemoryAccess *Current assignment is never used, and all other uses are initialized/used within the worklist loop (and not across multiple iterations) - so move the variable internal to the loop.
Fixes scan-build unused assignment warning.
If the parameter had been annotated as nonnull because of the null
check, we want to remove the attribute, since it may no longer apply and
could result in miscompiles if left. Similarly, we also want to remove
undef-implying attributes, since they may not apply anymore either.
Fixes PR52110.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111515
This extends the foldOpIntoPhi code used when visiting a freeze user of a phi to allow any non-undef/poison operand as opposed to only non-undef/poison constants. This lets us hoist a freeze in the increment of an IV into the preheader in many cases.
Differential Revision: https://reviews.llvm.org/D111744
Even if there are no interesting functions, the SCCP solver would still run
before bailing. Now bail earlier and avoid running the solver for nothing.
Differential Revision: https://reviews.llvm.org/D111645
This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.
This prevents regressions seen with a planned follow-on to D111410.
If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them.
This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags.
The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will in turn (in a future patch), let us fold the freeze through the phi into the loop preheader. Motivated by eliminating need for CanonicalizeFreezeInLoops for the clearly profitable cases from onephi.ll test case in the test directory.
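A small sketch of the equivalence being used, with a hypothetical nsw add as the flag-carrying instruction:
```
; before: %a is only poison because of its nsw flag; %fr is its sole use
%a  = add nsw i32 %x, 1
%fr = freeze i32 %a
; after: freeze the operand and drop the flag instead
%fx = freeze i32 %x
%fr = add i32 %fx, 1
```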
Differential Revision: https://reviews.llvm.org/D111675
This patch fixes another crash revealed by PR51614:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse (which is currently not supported).
Differential Revision: https://reviews.llvm.org/D108900
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor analysis would still be cached, but its Result would be cleared.
We need to clear both.
This addresses PR52118
Differential Revision: https://reviews.llvm.org/D111586
This may not be obvious, but Alive2 agrees:
https://alive2.llvm.org/ce/z/Ld9qNT
If the mul has "nsw", then -1 * INT_MIN is poison, so the
negate can also have "nsw" because 0 - INT_MIN is poison.
If the mul has "nuw", then that means the "OtherOp" can only
be 0 or 1 (anything else multiplied by 0xfff... would wrap).
So the replacement negate must be "nsw" because it is either
"0-0" or "0-1".
This is another regression noticed with a planned follow-up
to D111410.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.
Unlike debug intrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by avoiding pseudo probe from clobbering memory SSA
- Unblocked induction variable simplification
- Allow empty loop deletion by treating the probe intrinsic as isDroppable
- Some refactoring.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110847
collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there is another user of the instruction
other than the Phi node which needs to be widened.
This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.
Reviewed By: david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D111294
This is a follow up of D110529 that disallowed constexprs. That change
introduced a regression as this also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want to
support.
Differential Revision: https://reviews.llvm.org/D111567
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that
1. is feeding an exit condition,
2. dominates the latch,
3. is not already known to be dereferenceable,
4. and has a loop invariant address.
If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.
This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as pointer.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D108114
LoopSimplifyCFG does not need MSSA, but should preserve it if it's available.
This is a legacy PM change, aimed to denoise the test changes in D109958.
Differential Revision: https://reviews.llvm.org/D111578
There may be some other patterns like this or a generalization,
but this is an example that I noticed would definitely regress
with a planned follow-up to D111410.
https://alive2.llvm.org/ce/z/GVpQDb
At the moment, a VPValue is created for the backedge-taken count, which
is used by some recipes. To make it easier to identify the operands of
recipes using the backedge-taken count, print it at the beginning of the
VPlan if it is used.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D111298
As a brief reminder, an "exit count" is the number of times the backedge executes before some event. It can be zero if we exit before the backedge is reached. A "trip count" is the number of times the loop header is entered if we branch into the loop. In general, TC = BTC + 1 and thus a zero trip count is ill defined
There is a corner case which we don't handle well. Let's assume i8 for our examples to keep things simple. If BTC = 255, then the correct trip count is 256. However, 256 is not representable in i8.
In theory, code which needs to reason about trip counts is responsible for checking for this cornercase, and either bailing out, or handling it correctly. Historically, we don't have a great track record about actually doing so.
When reviewing D109676, I found myself asking a basic question. Was there any good reason to preserve the current wrap-to-zero behavior when converting from backedge taken counts to trip counts? After reviewing existing code, I could not find a single case which appears to correctly and precisely handle the overflow case.
This patch changes the default behavior to extend instead of wrap. That is, if the result might be 256, we return a value of i9 type to ensure we interpret the count correctly. I did leave the legacy behavior as an option since a) loop-flatten stops triggering if I extend due to weirdly specific pattern matching I didn't understand and b) we could reasonably use the mode if we'd externally established a lack of overflow.
I want to emphasize that this change is *not* NFC. There are two call sites (one in ScalarEvolution.cpp, one in LoopCacheAnalysis.cpp) which are switched to the extend semantics. The former appears imprecise (but correct) for a constant 255 BTC. The latter appears incorrect, though I don't have a test case.
Differential Revision: https://reviews.llvm.org/D110587
https://bugs.llvm.org/show_bug.cgi?id=27506
https://bugs.llvm.org/show_bug.cgi?id=31652
https://bugs.llvm.org/show_bug.cgi?id=51043
Problems with SimpleLoopUnswitch cause the bug reports above.
```
while (...) {
if (C) { A }
else { B }
}
Into:
C' = freeze(C)
if (C') {
while (...) { A }
} else {
while (...) { B }
}
```
This problem can be solved by adding a freeze on the hoisted branch condition (the transform above), and was solved by D29015.
However, D29015 was later reverted due to a performance regression (2b5a897651).
It is not the first time that an added freeze has caused a performance regression.
SimplifyCFG also had a problem with UB caused by branching on undef, which was solved by adding a freeze to the branch condition (D104569).
A performance regression occurred with D104569, and patches such as D105344 and D105392 were written to minimize it.
This patch corrects SimpleLoopUnswitch in the same way D104569 handled SimplifyCFG, while minimizing the performance loss with patches like D105344 and D105392. (This patch was rebased with the author's permission.)
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D106041
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:
int r = a;
for (int i = 0; i < n; i++) {
if (src[i] > 3) {
r = b;
}
src[i] += 2;
}
We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.
The IR generated by clang typically looks like this:
%phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
...
%pred = icmp ugt i32 %val, 3
%phi.update = select i1 %pred, i32 %b, i32 %phi
We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.
Tests have been added here:
Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
Transforms/LoopVectorize/select-cmp-predicated.ll
Transforms/LoopVectorize/select-cmp.ll
Differential Revision: https://reviews.llvm.org/D108136
We were using the type of the loop back edge count to represent the
store size. This failed for small loop counts (e.g. in the added test,
the loop count was an i2).
Use the index type instead.
Fixes PR52104.
Differential Revision: https://reviews.llvm.org/D111401
The transformation from malloc+memset to calloc is always correct and in many situations
it brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases when producing calloc causes performance drops [3].
As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios.
If the optimizer is able to prove that after the malloc call it's _very_ likely to reach the memset branch, then after
emitting calloc we shouldn't observe any performance hit. Therefore finding the "null pointer check" pattern
before the memset basic block sounds like a good justification for performing the transformation.
That method was also already suggested by GCC folks [4]. The main reason for the change is that, for now,
to be safe, we check for a post-dominance relation, which is a way too conservative approach, making the transformation
"almost" disabled in practice. This patch tends to enable the transformation again, but with extra care.
[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022
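Roughly, the shape of IR being looked for (a hypothetical, simplified sketch of the control flow):
```
; before: malloc guarded by a null check, with memset on the non-null path
%p = call i8* @malloc(i64 %size)
%isnull = icmp eq i8* %p, null
br i1 %isnull, label %fail, label %init
init:
  call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 %size, i1 false)
  br label %cont
; after: the malloc+memset pair becomes a single calloc
%p = call i8* @calloc(i64 1, i64 %size)
```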
Differential Revision: https://reviews.llvm.org/D110021
The test diffs show that we have better analysis/folds for 'add'
(although we should at least have the simplifications
independently, so we don't have the one-use restriction).
This is related to solving regressions that would appear in
transforms related to D111410, and that is part of a series
of enhancements that may eventually help solve PR34047.
https://alive2.llvm.org/ce/z/3tB9KG
define i1 @src(i8 %x, i8 %C, i8 %C2) {
%sub = sub nuw i8 %C2, %x
%r = icmp slt i8 %sub, %C
ret i1 %r
}
define i1 @tgt(i8 %x, i8 %C, i8 %C2) {
%Cnot = xor i8 %C, -1
%C2not = xor i8 %C2, -1
%add = add nuw i8 %x, %C2not
%r = icmp sgt i8 %add, %Cnot
ret i1 %r
}
There were 2 related but over-specified folds for:
C1 - X == C
One allowed multi-use but was limited to equal constants.
The other allowed different constants but disallowed multi-use.
This combines the 2 folds into a more general match.
The test diffs show the multi-use cases that were falling
through the cracks.
https://alive2.llvm.org/ce/z/4_hEt2
define i1 @src(i8 %x, i8 %subC, i8 %C) {
%s = sub i8 %subC, %x
%r = icmp eq i8 %s, %C
ret i1 %r
}
define i1 @tgt(i8 %x, i8 %subC, i8 %C) {
%newC = sub i8 %subC, %C
%isneg = icmp eq i8 %x, %newC
ret i1 %isneg
}
If a loop is flattened, the inner loop is removed and the LPM
should be informed of this fact, so it can invalidate associated
analyses. To support this, we relax an assertion in LPMUpdater to
allow invalidating non-top-level loops when running in LoopNestMode,
as the pass does not know how exactly it will get scheduled.
Differential Revision: https://reviews.llvm.org/D111350
This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places. The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.
Removed obsolete DT verification that should not be there because the
strategy of DT updates has changed.
Differential Revision: https://reviews.llvm.org/D110922
Added support for peeling loops with "deoptimizing" exits -
exits where the block itself, or any of its children (or any of their
children, etc.), either has a @llvm.experimental.deoptimize call
prior to the terminating return instruction of this basic block
or is terminated with unreachable. All blocks in the
sequence must have a single successor, except possibly the last
one.
Previously we only checked the exit block for being deoptimizing.
Now we check if the last reachable block from the exit is deoptimizing.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D110922
Reviewed By: mkazantsev
LoopFlatten does preserve loop analyses (DT, LI and SCEV), but
currently doesn't mark them as preserved in the NewPM (they are
marked as preserved in the LegacyPM). I think this doesn't really
have an effect in the end because the loop pass adaptor will just
assume they're preserved anyway, but let's be explicit about this
for the sake of clarity.
Differential Revision: https://reviews.llvm.org/D111328
This patch fixes problems reported in PR51981.
When rotating a loop it isn't enough to just forget SCEV for that
loop nest. When rotating we might clone some instructions from the
old header into the preheader, and insert new PHI nodes to merge
values together. There could be users of the original value that are
updated to use the PHI result. And those users were not necessarily
depending on a PHI node earlier, so they weren't cleaned up when just
forgetting all SCEVs for the loop nest. So we need to explicitly
forget those values to avoid invalid cached SCEV expressions.
Reviewed By: fhahn, mkazantsev
Differential Revision: https://reviews.llvm.org/D110813
SCEV-based salvaging will use excessive resources if it encounters
very long SCEV expressions. This patch places a limit on the length of
SCEV expression that salvaging will attempt to translate.
Reviewed by: Orlando
Differential Revision: https://reviews.llvm.org/D110558
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.
This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.
Differential Revision: https://reviews.llvm.org/D111024
We already handle more complicated cases like:
extelt (bitcast (inselt poison, X, 0)) --> trunc (lshr X)
But we missed this simpler pattern:
https://alive2.llvm.org/ce/z/D55h64 / https://alive2.llvm.org/ce/z/GKzzRq
This is part of solving:
https://llvm.org/PR52057
I made the transform depend on legal/desirable int type to avoid creating
a shift of an illegal type (for example i128). I'm not sure if that
restriction is actually necessary, but we can change that as a follow-up
if the backend can deal with integer ops on too-wide illegal types.
The pile of AVX512 test changes are all neutral AFAICT - the x86 backend
seems to know how to turn that into the expected "kmov" instructions.
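As one hypothetical little-endian instance of this kind of fold (index 0, so no shift is needed):
```
; before
%v = bitcast i32 %x to <4 x i8>
%e = extractelement <4 x i8> %v, i32 0
; after: lane 0 is the low byte on little-endian targets
%e = trunc i32 %x to i8
```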
Differential Revision: https://reviews.llvm.org/D111082
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
This also removes the need to disable the mandatory inlining phase in
tests.
In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.
Differential Revision: https://reviews.llvm.org/D110891
This removes repeated calls to m_Not, so it is hopefully a little
more efficient.
Also, we may need to enhance some of these blocks to allow
logical and/or (select of bools).
Some initially gathered nodes missed the check for the reused scalars,
which leads to high gather cost. Such nodes still can be represented as
m gathers + shuffle instead of n gathers, where m < n.
Differential Revision: https://reviews.llvm.org/D111153
The current way of detecting hostcalls, by looking for the "ockl_hostcall_internal()" function in the module, does not seem reliable enough. LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", causing the MetadataStreamer pass to fail to detect hostcalls, so it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag: hostcall, which can be used to detect whether GPU functions use host calls for printf.
Differential revision: https://reviews.llvm.org/D110337
In SCCPSolver::markArgInFuncSpecialization, the ValueState map may be
reallocated *after* the initial ValueLatticeElement reference is grabbed, but
*before* its use in copy initialization. This causes a use-after-free. To fix
this, this commit changes the behavior to create the new ValueLatticeElement
before assigning the old one to it.
Patch by: https://github.com/duck-37/
Differential Revision: https://reviews.llvm.org/D111112
When we check if a load is loop invariant by finding a dominating
invariant.start call, we strip bitcasts until we get to an i8* Value,
and look for an invariant.start use of the i8* Value.
We may accidentally end up at an i8 global and look at a global's uses,
which we shouldn't do in a loop pass. Although we could make this
logic work with globals, that's not currently intended.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111098
This splits out the logic from shouldChangeType() that
currently allows 8/16/32-bit transforms even if those
types are not listed as legal in the data layout.
This could be useful as a predicate for vector
insert/extract transforms.
Note that this leaves the subsequent checks in
shouldChangeType() unchanged. We may want to merge
the checks for i1 and/or "ToLegal" into "isDesirable",
but that may alter existing transforms.
In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer
need the IsSizeTTy lambda function and the SizeTTy object. Instead
we just follow the regular structure of checking for integer types
given an expected number of bits.
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110749
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b
Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C
Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.
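A sketch of the masked case, with hypothetical widths, shift amount, and mask chosen so the narrowing is valid:
```
; before
%s = lshr i64 %x, 3
%t = trunc i64 %s to i16
%r = and i16 %t, 255
; after: shift the narrow value instead
%tx = trunc i64 %x to i16
%s2 = lshr i16 %tx, 3
%r  = and i16 %s2, 255
```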
Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF
Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb
Differential Revision: https://reviews.llvm.org/D110170
This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior.
Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.
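A rough sketch of the pattern in question, using a simplified helper rather than the LSV code itself: every lane of the placeholder is overwritten before the vector is used, so poison vs. undef is unobservable.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

static Value *buildStoreVector(IRBuilder<> &Builder, FixedVectorType *VecTy,
                               ArrayRef<Value *> Scalars) {
  Value *Vec = PoisonValue::get(VecTy); // previously UndefValue::get(VecTy)
  for (unsigned I = 0, E = Scalars.size(); I != E; ++I)
    Vec = Builder.CreateInsertElement(Vec, Scalars[I], Builder.getInt32(I));
  return Vec; // every element was written, so the placeholder never leaks out
}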
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111005
Added an additional check for constants after simplification of the
"select _, true, false" pattern. We need to prevent attempts to unswitch constant
conditions for two reasons:
a) Doing that doesn't make any sense; in the best case it will just burn
some compile time.
b) SimpleLoopUnswitch isn't designed to unswitch constant conditions
(due to (a)), so attempting that can cause miscompiles. The attached
testcase is an example of such a miscompile.
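A minimal sketch of the shape of the added guard, with illustrative names (the real check and its placement in SimpleLoopUnswitch differ):

#include "llvm/IR/Constants.h"
#include "llvm/IR/Value.h"
using namespace llvm;

static bool isUnswitchableCondition(Value *Cond) {
  // After "select %c, true, false" is simplified down to %c, the condition
  // may itself have become a constant; unswitching it is pointless and the
  // rest of the pass does not expect it.
  return !isa<Constant>(Cond);
}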
Also added an assertion to make sure we aren't trying to replace
constants, which will help us prevent such bugs in the future. The assertion
from D110751 is another layer of protection against such cases.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D110752
This is no-externally-visible-functional-difference-intended.
That is, the test diffs show identical instructions other than
name changes (those are included specifically to verify the logic).
The existing transforms created extra instructions and relied
on subsequent folds to get to the final result, but that could
conflict with other transforms like the proposed D110170 (and
caused that patch to be reverted twice so far because of infinite
combine loops).
We currently expose the fact that we rely on unsigned wrapping to iterate
through all indexes, which can be confusing. Keeping that as an
implementation detail behind an iterator is clearer and requires less
code.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110885
D104809 changed `buildTree_rec` to check for extract element instructions
with scalable types. However, if the extract is extended or truncated,
these changes do not apply and we assert later on in isShuffle(), which
attempts to cast the type of the extract to FixedVectorType.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D110640
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:
int r = a;
for (int i = 0; i < n; i++) {
if (src[i] > 3) {
r = b;
}
src[i] += 2;
}
We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.
The IR generated by clang typically looks like this:
%phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
...
%pred = icmp ugt i32 %val, 3
%phi.update = select i1 %pred, i32 %b, i32 %phi
We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant;
however, we can support any kind of compare, integer or float.
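As a plain C++ model of just the reduction in the loop above (the store to src[i] is omitted, and a vectorization factor of 4 is an arbitrary choice; this is not the generated code), per-lane flags plus a final OR-reduction and a single select reproduce the scalar result:

#include <array>

int reduceSelectCmp(const int *src, int n, int a, int b) {
  std::array<bool, 4> lane{}; // per-lane "predicate ever held" flags
  int i = 0;
  for (; i + 4 <= n; i += 4)
    for (int l = 0; l < 4; ++l)
      lane[l] = lane[l] || (src[i + l] > 3);
  bool any = lane[0] || lane[1] || lane[2] || lane[3]; // OR-reduce the lanes
  for (; i < n; ++i) // scalar remainder
    any = any || (src[i] > 3);
  return any ? b : a; // select the loop-invariant 'b' if any lane picked it
}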
Tests have been added here:
Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
Transforms/LoopVectorize/select-cmp-predicated.ll
Transforms/LoopVectorize/select-cmp.ll
Differential Revision: https://reviews.llvm.org/D108136
In situations where the coroutine function is not split, we can just
replace the async.resume with null.
rdar://82591919
Differential Revision: https://reviews.llvm.org/D110191
This is NFCI because the pattern with 2 left-shifts should get
folded independently by smaller folds.
The motivation is to refine this block to avoid infinite loops
seen with D110170.
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
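A rough sketch of the ordering this implies, with simplified names and assuming the dead-instruction worklist is in program order (not the exact BDCE code):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;

static void removeDeadInstructions(SmallVectorImpl<Instruction *> &Dead) {
  // Salvage in reverse program order: a later instruction's debug uses are
  // rewritten in terms of its operands, and those earlier operands are
  // salvaged afterwards, so entire chains are recovered.
  for (Instruction *I : reverse(Dead))
    salvageDebugInfo(*I);
  // Only now break the use chains and delete the instructions.
  for (Instruction *I : Dead)
    I->dropAllReferences();
  for (Instruction *I : Dead)
    I->eraseFromParent();
}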
This reapplies the previous patch with a fix for a use-after-free.
Differential Revision: https://reviews.llvm.org/D110568
After rG452714f8f8037ff37f9358317651d1652e231db2, the Function `F` retrieved in LoopPredication is not used.
Remove this unused variable to stop some buildbots (ASAN, clang-ppc) from failing.
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.
This is achieved through BasicBlockCallbackVH in BPI, whose callback calls
eraseBlock to update BPI's data structures whenever a basic
block is deleted.
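A simplified sketch of the callback-handle mechanism (the names and details of BPI's real BasicBlockCallbackVH differ):

#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/ValueHandle.h"
using namespace llvm;

// One handle is registered per block; when a loop pass deletes the block,
// deleted() fires and BPI drops its cached data, so the preserved (lossy)
// result never holds stale BasicBlock pointers.
class BlockEraseHandle final : public CallbackVH {
  BranchProbabilityInfo &BPI;
  void deleted() override {
    BPI.eraseBlock(cast<BasicBlock>(getValPtr()));
    setValPtr(nullptr);
  }
public:
  BlockEraseHandle(BasicBlock *BB, BranchProbabilityInfo &BPI)
      : CallbackVH(BB), BPI(BPI) {}
};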
This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM. This caused a massive compile-time
increase in our downstream LPM invocation which contained loop predication.
See the updated test, which runs a loop pipeline containing loop
predication with -debug-pass turned ON.
Reviewed By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
Summary:
The RTL functions added in https://reviews.llvm.org/D110429 were
mistakenly left out from the list of safe runtime calls in AAKernelInfo.
This patch adds them in.
It can happen that after widening of the IV, flattening may not be possible,
e.g. when it is deemed unprofitable. We were not properly checking this, which
resulted in flattening being applied when it should not be, leading to
incorrect results (miscompiles).
This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)
Differential Revision: https://reviews.llvm.org/D110712
The test is from https://llvm.org/PR51351.
There are 2 related logic bugs from over-generalizing "lshr" to "any shr",
but I'm not sure how to expose the difference for "MaskC" because instsimplify
already folds ashr of -1.
I'll extend instsimplify to catch the MaskD pattern as a follow-up, but this
patch should be enough to avoid the miscompile.
This fixes an issue exposed by D71539, where IndVarSimplify tries
to access an invalid cached SCEV expression after making changes to the
underlying PHI instruction earlier.
When changing the incoming value of a PHI, forget the cached SCEV for
the PHI.
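A minimal sketch of the shape of the fix, with illustrative names (the in-tree change lives inside IndVarSimplify's rewriting logic):

#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static void replaceIncomingValue(PHINode *PN, unsigned Idx, Value *NewVal,
                                 ScalarEvolution &SE) {
  PN->setIncomingValue(Idx, NewVal);
  // The cached SCEV for the PHI may describe the old incoming value; drop it
  // so later queries recompute it instead of using a stale expression.
  SE.forgetValue(PN);
}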
We generate symbols like `profc`/`profd` for each function, and put them into csects.
When there are weak functions, we generate weak symbols for those functions as well;
with ELF (and some other formats), the linker (binder) will discard them and keep only one copy of the weak symbols.
However, on AIX the current binder can NOT discard the weak symbols if we put all of them into the same csect,
because the binder can NOT discard a subset of a csect.
This creates a unique challenge for using those symbols to calculate some relative offsets.
This patch changes the linkage of the `profc`/`profd` symbols to private, so that the profc/profd for each weak function are *local* to their objects and are all kept in the csect, so we won't have that problem. Although only one of the counters will actually be used, all the pointers in each profd remain correct.
The downside is that we won't be able to discard the duplicated counters and profile data,
but those cannot be discarded even if we keep the weak linkage,
due to the binder limitation of not being able to discard a subset of the csect either.
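The linkage change itself is small; roughly, and with illustrative names rather than the exact InstrProf lowering code:

#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
using namespace llvm;

static void makeProfileSymbolsLocal(GlobalVariable &CounterGV,
                                    GlobalVariable &DataGV) {
  // Keep each weak function's __profc_*/__profd_* local to its own object
  // file so the AIX binder never needs to discard part of a csect.
  CounterGV.setLinkage(GlobalValue::PrivateLinkage);
  DataGV.setLinkage(GlobalValue::PrivateLinkage);
}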
Reviewed By: Whitney, MaskRay
Differential Revision: https://reviews.llvm.org/D110422
This is NFCI (no-functional-change-intended), but there
are benign diffs possible with commutable ops as seen in
the test diffs.
The transforms were repeated for the commutative opcodes,
but that should not be necessary if we canonicalize the
patterns that we're matching. If both operands of the
binop match, that should get folded eventually.
The transform that starts with a mask op seems to
over-constrain the use checks, so that could be a
potential enhancement.
This reverts commit f6954bf804.
This breaks the test-suite O3 build:
/home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc
While deleting: i64 %
Use still stuck around after Def is destroyed: %12620 = mul i64 %12619, <badref>
clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110568
This patch improves the effectiveness of ADCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110462
This patch enables debug info salvaging for truncating/extending ptr
int conversions. The testcase uncovered a bug in adce, which is
addressed separately.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D110461
This commit is the InstCombine follow-up to the previous constant-folding
change that enables noticeable optimizations for CHERI-enabled targets.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D110247
Try to improve vectorization of PHI nodes by first trying to vectorize
similar instructions at the widest possible vector size, then aggregating
them with PHIs of compatible type and trying to vectorize again; only if
this fails, try smaller vector factors for the compatible PHI nodes. This
restores the performance of several benchmarks after the tuning of the
fp/int conversion instruction costs.
Differential Revision: https://reviews.llvm.org/D108740
In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV
widening was introduced. If the transformation is *not* applied after widening
of the IV, the narrow phi node was incorrectly modified anyway, which should only
happen when flattening actually takes place. This can be seen in the added test
widen-iv2.ll, where the phi incorrectly had 1 incoming value but should keep its
original 2 incoming values; this is now restored.
Differential Revision: https://reviews.llvm.org/D110234