llvm-project

Commit Graph

Author	SHA1	Message	Date
Bardia Mahjour	ebfe4de2c0	[DDG] Fix duplicate edge removal during pi-block formation When creating pi-blocks we try to avoid creating duplicate edges between outside nodes and the pi-block when an edge is of the same kind and direction as another one that has already been created. We do this by keeping track of the edges in an enumerated array called EdgeAlreadyCreated. The problem is that this array is declared local to the loop that iterates over the nodes in the pi-block, so the information gets lost every time a new inside-node is iterated over. The fix is to move the declaration to the outer loop. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D94094	2021-01-07 10:31:11 -05:00
Simon Pilgrim	fa6d897799	[Analysis] MemoryDepChecker::couldPreventStoreLoadForward - remove dead store. NFCI. As we're breaking from the loop when clamping MaxVF, clang static analyzer was warning that the VF iterator was being updated and never used.	2021-01-07 14:21:54 +00:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Juneyoung Lee	3a60a1f165	[InstSimplify] Fold insertelement vec, poison, idx into vec This is a simple patch that adds folding from `insertelement vec, poison, idx` into `vec`. Alive2 proof: https://alive2.llvm.org/ce/z/2y2vbC Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93994	2021-01-07 10:10:14 +09:00
Alina Sbirlea	63aeaf754a	[DominatorTree] Add support for mixed pre/post CFG views. Add support for mixed pre/post CFG views. Update usages of the MemorySSAUpdater to use the new DT API by requesting the DT updates to be done by the MSSAUpdater. Differential Revision: https://reviews.llvm.org/D93371	2021-01-06 14:53:09 -08:00
Nikita Popov	f6f6f6375d	[BasicAA] Fix BatchAA results for phi-phi assumptions Change the way NoAlias assumptions in BasicAA are handled. Instead of handling this inside the phi-phi code, always initially insert a NoAlias result into the map and keep track whether it is used. If it is used, then we require that we also get back NoAlias from the recursive queries. Otherwise, the entry is changed to MayAlias. Additionally, keep track of all location pairs we inserted that may still be based on assumptions higher up. If it turns out one of those assumptions is incorrect, we flush them from the cache. The compile-time impact for the new implementation is significantly higher than the previous iteration of this patch: https://llvm-compile-time-tracker.com/compare.php?from=c0bb9859de6991cc233e2dedb978dd118da8c382&to=c07112373279143e37568b5bcd293daf81a35973&stat=instructions However, it should avoid the exponential runtime cases we run into if we don't cache assumption-based results entirely. This also produces better results in some cases, because NoAlias assumptions can now start at any root, rather than just phi-phi pairs. This is not just relevant for analysis quality, but also for BatchAA consistency: Otherwise, results would once again depend on query order, though at least they wouldn't be wrong. This ended up both more complicated and more expensive than I hoped, but I wasn't able to come up with another solution that satisfies all the constraints. Differential Revision: https://reviews.llvm.org/D91936	2021-01-06 22:15:30 +01:00
Nikita Popov	221c3b174b	[InstSimplify] Canonicalize non-demanded shuffle op to poison (NFCI) I don't believe this has an observable effect, because the only thing we care about here is replacing the operand with a constant so following folds can apply. This change is just to make the representation follow canonical unary shuffle form.	2021-01-06 21:22:27 +01:00
Nikita Popov	d042f2db5b	[InstSimplify] Fold call null/undef to poison Calling null or undef results in immediate undefined behavior. Return poison instead of undef in this case, similar to what we do for immediate UB due to division by zero.	2021-01-06 21:09:30 +01:00
Arthur Eubanks	54c01057b6	Fix non-assert builds after D93828	2021-01-06 11:42:03 -08:00
Nikita Popov	a6df39236f	[InstSimplify] Fold out-of-bounds shift to poison Make InstSimplify return poison rather than undef for out-of-bounds shifts, as specified by LangRef: > If op2 is (statically or dynamically) equal to or larger than the > number of bits in op1, this instruction returns a poison value. Differential Revision: https://reviews.llvm.org/D93998	2021-01-06 20:41:37 +01:00
Arthur Eubanks	7fea561eb1	[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining Previously when trying to support CoroSplit's function splitting, we added in a hack that simply added the new function's node into the original function's SCC (https://reviews.llvm.org/D87798). This is incorrect since it might be in its own SCC. Now, more similar to the previous design, we have callers explicitly notify the LazyCallGraph that a function has been split out from another one. In order to properly support CoroSplit, there are two ways functions can be split out. One is the normal expected "outlining" of one function into a new one. The new function may only contain references to other functions that the original did. The original function must reference the new function. The new function may reference the original function, which can result in the new function being in the same SCC as the original function. The weird case is when the original function indirectly references the new function, but the new function directly calls the original function, resulting in the new SCC being a parent of the original function's SCC. This form of function splitting works with CoroSplit's Switch ABI. The second way of splitting is more specific to CoroSplit. CoroSplit's Retcon and Async ABIs split the original function into multiple functions that all reference each other and are referenced by the original function. In order to keep the LazyCallGraph in a valid state, all new functions must be processed together, else some nodes won't be populated. To keep things simple, this only supports the case where all new edges are ref edges, and every new function references every other new function. There can be a reference back from any new function to the original function, putting all functions in the same RefSCC. This also adds asserts that all nodes in a (Ref)SCC can reach all other nodes to prevent future incorrect hacks. The original hacks in https://reviews.llvm.org/D87798 are no longer necessary since all new functions should have been registered before calling updateCGAndAnalysisManagerForPass. This fixes all coroutine tests when opt's -enable-new-pm is true by default. This also fixes PR48190, which was likely due to the previous hack breaking SCC invariants. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93828	2021-01-06 11:19:15 -08:00
Kazu Hirata	cd088ba7e6	[llvm] Use llvm::lower_bound and llvm::upper_bound (NFC)	2021-01-05 21:15:59 -08:00
Juneyoung Lee	29f8628d1f	[Constant] Add containsPoisonElement This patch - Adds containsPoisonElement that checks existence of poison in constant vector elements, - Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved. Thanks! Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94053	2021-01-06 12:10:33 +09:00
Kazu Hirata	65cd3cbb3f	[Inliner] Compute the full cost for the cost benefit analysis This patch teaches the inliner to compute the full cost for a call site where the newly introduced cost benefit analysis is enabled. Note that the cost benefit analysis requires the full cost to be computed. However, without this patch or the -inline-cost-full option, the early termination logic would kick in when the cost exceeds the threshold, so we don't get to perform the cost benefit analysis. For this reason, we would need to specify four clang options: -mllvm -inline-cost-full -mllvm -inline-enable-cost-benefit-analysis. This patch eliminates the need to specify -inline-cost-full. Differential Revision: https://reviews.llvm.org/D93658	2021-01-05 12:48:49 -08:00
Whitney Tsang	314ccc0013	[LoopNest] Remove unused include. Differential Revision: https://reviews.llvm.org/D93665	2021-01-05 20:05:31 +00:00
Whitney Tsang	c005518936	[LoopNest] Allow empty basic blocks without loops Allow loop nests with empty basic blocks without loops in different levels as perfect. Reviewers: Meinersbur Differential Revision: https://reviews.llvm.org/D93665	2021-01-05 15:09:38 +00:00
Xun Li	3e2b42489f	Remove RefSCC::handleTrivialEdgeInsertion This function no longer does anything useful. It probably did something originally, but later changes removed that behavior without cleaning up this function. The checks are already done in the callers as well. Differential Revision: https://reviews.llvm.org/D94055	2021-01-04 20:21:01 -08:00
Juneyoung Lee	f665a8c5b8	[InstSimplify] gep with poison operand is poison This is a tiny update to fold gep poison into poison. :) Alive2 proofs: https://alive2.llvm.org/ce/z/7Nwdri https://alive2.llvm.org/ce/z/sDP4sC	2021-01-05 11:07:49 +09:00
Sanjay Patel	36263a7ccc	[LoopUtils] remove redundant opcode parameter; NFC While here, rename the inaccurate getRecurrenceBinOp() because that was also used to get CmpInst opcodes. The recurrence/reduction kind should always refer to the expected opcode for a reduction. SLP appears to be the only direct caller of createSimpleTargetReduction(), and that calling code ideally should not be carrying around both an opcode and a reduction kind. This should allow us to generalize reduction matching to use intrinsics instead of only binops.	2021-01-04 17:05:28 -05:00
Juneyoung Lee	abbef2fd46	[ValueTracking] isGuaranteedNotToBePoison should return true on undef This is a one-line fix to isGuaranteedNotToBePoison to return true if undef is given.	2021-01-05 06:50:02 +09:00
Whitney Tsang	de6d43f16c	Revert "[LoopNest] Allow empty basic blocks without loops" This reverts commit `9a17bff4f7`.	2021-01-04 20:42:21 +00:00
Whitney Tsang	9a17bff4f7	[LoopNest] Allow empty basic blocks without loops Allow loop nests with empty basic blocks without loops in different levels as perfect. Reviewers: Meinersbur Differential Revision: https://reviews.llvm.org/D93665	2021-01-04 19:59:50 +00:00
Kazu Hirata	848e8f938f	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-04 11:42:44 -08:00
Caroline Concatto	060cfd9795	[AArch64][SVE]Add cost model for masked gather and scatter for scalable vector. A new TTI interface, 'Optional<unsigned> getMaxVScale', has been added that returns the maximum vscale for a given target. When known, getMaxVScale is used to compute the cost of masked gather/scatter for scalable vectors. Depends on D92094 Differential Revision: https://reviews.llvm.org/D93030	2021-01-04 13:59:58 +00:00
Nikita Popov	3715c99be9	[InstSimplify] Fold nnan/ninf violation to poison As the comment already indicates, performing an operation with nnan/ninf flags on a nan/inf or undef results in poison. Now that we have a proper poison value, we no longer need to relax it to undef.	2021-01-03 22:05:40 +01:00
Nikita Popov	766cf7f32e	[InstSimplify] Fold division by zero to poison Div/rem by zero is immediate undefined behavior and anything goes. Currently we fold it to undef, this patch changes it to fold to poison instead, which is slightly stronger. Differential Revision: https://reviews.llvm.org/D93995	2021-01-03 20:52:45 +01:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
Nikita Popov	f094d65bea	[InstSimplify] Fix addo/subo with undef (PR43188) We can't fold the first result to undef, because not all values may be reachable under the constraint that no overflow occurred. Use the same folds we do for saturated math instead. Proofs: uaddo: https://alive2.llvm.org/ce/z/zf55N_ saddo: https://alive2.llvm.org/ce/z/a_xPgS usubo: https://alive2.llvm.org/ce/z/DmRqwt ssubo: https://alive2.llvm.org/ce/z/8ag7U-	2021-01-03 18:51:49 +01:00
Nikita Popov	c6ad00d709	[InstSimplify] Return poison for out of bounds extractelement This is the same change as D93990, but for extractelement rather than insertelement. > If idx exceeds the length of val for a fixed-length vector, the > result is a poison value. For a scalable vector, if the value of > idx exceeds the runtime length of the vector, the result is a > poison value.	2021-01-03 18:15:58 +01:00
Juneyoung Lee	2139958b53	[InstSimplify] Return poison if insertelement touches out of bounds This is a simple patch that updates InstSimplify to return poison if the index is/can be out-of-bounds Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93990	2021-01-04 00:43:02 +09:00
Gil Rapaport	d9c0b128e3	[SCEV] Simplify trunc to zero based on known bits Let getTruncateExpr() short-circuit to zero when the value being truncated is known to have at least as many trailing zeros as the target type. Differential Revision: https://reviews.llvm.org/D93973	2021-01-03 13:57:12 +02:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seems like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Nikita Popov	14e540febc	[LVI] Handle unions of conditions LVI previously handled "if (L && R)" conditions, but not "if (L || R)" conditions. The latter case can still produce useful information if L and R both constrain the same variable. This adds support for handling the "if (L || R)" case as well. The only difference is that we take the union instead of the intersection of the lattice values.	2021-01-01 16:46:21 +01:00
Andrew Litteken	0d21e66014	[IRSim] Letting call instructions be legal for similarity identification. Here we let non-intrinsic calls be considered legal and valid for similarity only if the call is not indirect, and has a name. For two calls to be considered similar, they must have the same name, the same function types, and the same set of parameters, including tail calls and calling conventions. Tests are found in unittests/Analysis/IRSimilarityIdentifierTest.cpp. Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87312	2020-12-31 20:52:45 -06:00
Sanjay Patel	eaab71106b	[Analysis] reduce code for matching min/max; NFC This might also make it easier to adapt if we want to match min/max intrinsics rather than cmp+sel idioms. The 'const' part is to potentially avoid confusion in calling code. There's some surprising and possibly wrong behavior related to matching min/max reductions differently than other reductions.	2020-12-31 17:19:37 -05:00
Andrew Litteken	d974ac0224	[IRSim] Letting gep instructions be legal for similarity identification. GetElementPtr instructions require the extra check that all operands after the first must only be constants and be exactly the same to be considered similar. Tests are found in unittests/Analysis/IRSimilarityIdentifierTest.cpp.	2020-12-31 14:41:14 -06:00
Juneyoung Lee	509fa8e02e	[SCEV] recognize logical and/or pattern This patch makes SCEV recognize 'select A, B, false' and 'select A, true, B'. This is a performance improvement that will be helpful after unsound select -> and/or transformation is removed, as discussed in D93065. SCEV's answers for the select form should be a bit more conservative than the equivalent `and A, B` / `or A, B`. Take this example: https://alive2.llvm.org/ce/z/NsP9ue . To check whether it is valid for SCEV's computeExitLimit to return min(n, m) as ExactNotTaken value, I put llvm.assume at tgt. It fails because the exit limit becomes poison if n is zero and m is poison. This is problematic if e.g. the exit value of i is replaced with min(n, m). If either n or m is constant, we can revive the analysis again. I added relevant tests and put alive2 links there. If and is used instead, this is okay: https://alive2.llvm.org/ce/z/K9rbJk . Hence the existing analysis is sound. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93882	2021-01-01 04:37:57 +09:00
Kazu Hirata	b557c32ae9	[MemorySSA, BPF] Use isa instead of dyn_cast (NFC)	2020-12-31 09:39:13 -08:00
Kazu Hirata	a87c7003ac	[Analysis] Remove unused code recursivelySimplifyInstruction (NFC) The last use of the function, located in RemovePredecessorAndSimplify, was removed on Dec 25, 2020 in commit `46bea9b297`. The last use of RemovePredecessorAndSimplify was removed on Sep 29, 2010 in commit `99c985c37d`.	2020-12-30 17:45:40 -08:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instructions. So AMX intrinsics only operate on type x86_amx. This helps separate AMX intrinsics from LLVM IR instructions (+-*/). Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Kazu Hirata	f76e83bfbb	[Analysis] Use llvm::append_range (NFC)	2020-12-29 19:23:21 -08:00
Florian Hahn	b980bed34b	[MSSAUpdater] Skip renaming when inserting def in unreachable block. This fixes an updater crash when moving memory defs between unreachable blocks. Fixes PR48616.	2020-12-29 18:22:12 +00:00
Kazu Hirata	2883cd98f3	[CFGPrinter] Use succ_empty (NFC)	2020-12-28 19:55:20 -08:00
Juneyoung Lee	0f2c180163	[ValueTracking] Implement impliesPoison This PR adds impliesPoison(ValAssumedPoison, V), which returns true if V is poison under the assumption that ValAssumedPoison is poison. For example, impliesPoison('icmp X, 10', 'icmp X, Y') returns true because 'icmp X, Y' is poison if 'icmp X, 10' is poison. impliesPoison can be used for sound optimization of select, as discussed in D77868. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D78152	2020-12-29 06:50:38 +09:00
Sanjay Patel	236c4524a7	[InstSimplify] remove ctpop of 1 (low) bit https://llvm.org/PR48608 As noted in the test comment, we could handle a more general case in instcombine and remove this, but I don't have evidence that we need to do that. https://alive2.llvm.org/ce/z/MRW9gD	2020-12-28 16:06:20 -05:00
Nikita Popov	dcd21572f9	[ValueTracking] Fix isKnownNonEqual() with constexpr mul Confusingly, BinaryOperator is not an Operator, OverflowingBinaryOperator is... We were implicitly assuming that the multiply is an Instruction here. This fixes the assertion failure reported in https://reviews.llvm.org/D92726#2472827.	2020-12-28 18:32:57 +01:00
Juneyoung Lee	860199dfbe	[ValueTracking] Use m_LogicalAnd/Or to look into conditions This patch updates isImpliedCondition/isKnownNonZero to look into select form of and/or as well. See llvm.org/pr48353 and D93065 for more context Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93845	2020-12-28 08:32:45 +09:00
Nikita Popov	0af42d3dc7	[PatternMatch][LVI] Handle select-form and/or in LVI Following the discussion in D93065, this adds m_LogicalAnd() and m_LogicalOr() matchers, that match A && B and A || B logical operations, either as bitwise operations or select expressions. As an example usage, LVI is adapted to use these matchers for its condition reasoning. The plan here is to switch other parts of LLVM that reason about and/or of conditions to also support the select forms, and then merge D93065 (or a variant thereof) to disable the poison-unsafe select to and/or transform. Differential Revision: https://reviews.llvm.org/D93827	2020-12-27 17:39:02 +01:00
Nikita Popov	b218407512	[ValueTracking] Handle more non-trivial conditions in isKnownNonZero() In `35676a4f9a` I've added handling for non-trivial dominating conditions that imply non-zero on the true branch. This adds the same support for the false branch. The changes in pr45360.ll change block ordering and naming, but don't change the control flow. The urem is still correctly guarded by a non-zero check.	2020-12-26 15:48:04 +01:00
Nikita Popov	c795dd1926	[BasicAA] Pass AC/DT to isKnownNonEqual() This allows us to handle assumes etc in the recursive isKnownNonZero() checks.	2020-12-25 18:29:20 +01:00
Nikita Popov	35676a4f9a	[InstCombine] Generalize icmp handling in isKnownNonZero() The dominating condition handling in isKnownNonZero() currently only takes into account conditions of the form "x != 0" or "x == 0". However, there are plenty of other conditions that imply non-zero, a common one being "x s> 0". Peculiarly, the handling for assumes was already dealing with more general non-zero-ness conditions, so this just reuses the same logic for the dominating condition case.	2020-12-25 16:49:23 +01:00
Nikita Popov	a3614a31c4	[BasicAA] Pass context instruction to isKnownNonZero() This allows us to handle additional cases like assumes.	2020-12-25 12:58:19 +01:00
Nikita Popov	b96a6ea0a9	[BasicAA] Make sure context instruction is symmetric D71264 started using a context instruction in a computeKnownBits() call. However, if aliasing between two GEPs is checked, then the choice of context instruction will be different for alias(GEP1, GEP2) and alias(GEP2, GEP1), which is not supposed to happen. Resolve this by remembering which GEP a certain VarIndex belongs to, and use that as the context instruction. This makes the choice of context instruction predictable and symmetric. It should be noted that this choice of context instruction is non-optimal (just like the previous choice): The AA query result is only valid at points that are reachable from both instructions. Using either one of them is conservatively correct, but a larger context may also be valid to use. Differential Revision: https://reviews.llvm.org/D93183	2020-12-25 11:35:46 +01:00
Kazu Hirata	200b15af45	[Analysis] Remove spliceFunction (NFC) The function was introduced without a user on Jan 3, 2011 in commit `0f87ca7733`. We still don't have a user yet.	2020-12-23 21:57:25 -08:00
Andrew Litteken	48ad8194a5	[IRSim] Adding support for isomorphic predicates Some predicates can be considered the same as long as the operands are flipped. For example, a > b gives the same result as b < a. This maps instructions in a greater than form to their appropriate less than form, swapping the operands in the IRInstructionData only, allowing for more flexible matching. Tests: llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp Reviewers: jroelofs, paquette Recommit of commit `0503926602` Differential Revision: https://reviews.llvm.org/D87310	2020-12-23 19:42:35 -06:00
Andrew Litteken	45a4f34bd1	Revert "[IRSim] Adding support for isomorphic predicates" Reverting due to unit test errors between commits. This reverts commit `0503926602`.	2020-12-23 15:14:19 -06:00
Andrew Litteken	0503926602	[IRSim] Adding support for isomorphic predicates Some predicates can be considered the same as long as the operands are flipped. For example, a > b gives the same result as b < a. This maps instructions in a greater than form to their appropriate less than form, swapping the operands in the IRInstructionData only, allowing for more flexible matching. Tests: llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87310	2020-12-23 15:02:00 -06:00
Andrew Litteken	cce473e0c5	[IRSim] Adding commutativity matching to structure checking Certain instructions, such as adds and multiplies, can have the operands flipped and still be considered the same. When we are analyzing structure, this gives slightly more flexibility to create a mapping from one region to another. For each operand, we can record both operands of the corresponding instruction as candidates rather than just the exact positional match. We then try to eliminate items from the set until there is only one valid mapping between the regions of code. We do this for adds, multiplies, and equality checking. However, this is not done for floating point instructions, since the order can still matter in some cases. Tests: llvm/test/Transforms/IROutliner/outlining-commutative-fp.ll llvm/test/Transforms/IROutliner/outlining-commutative.ll llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87311	2020-12-23 15:02:00 -06:00
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. The current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing an "unreachable" instruction, a call marked with the 'cold' attribute, or the 'unwind' handler of an 'invoke' instruction. The issue is that heuristics are applied one by one until the first match, which essentially ignores the relative hotness/coldness of other paths. The new approach unifies processing of "cold" paths by assigning a predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down the IR similarly to the existing approach. Finally, we set up edge probabilities based on the estimated block weights. One important difference is how we propagate weight up. The existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless, at the very least because it always gives a 50/50 distribution, which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting a more accurate probability. The new algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, these are blocks that execute exactly when the block with the "known" weight does. In addition, the new approach processes loops in a uniform way as well. Essentially, loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Kazu Hirata	e6fde1ae7d	[MemorySSA] Use is_contained (NFC)	2020-12-22 19:58:54 -08:00
Siddhesh Poyarekar	6fcb039956	Fold comparison of __builtin_object_size expression with -1 for non-const size When __builtin_dynamic_object_size returns a non-constant expression, it cannot be -1 since that is an invalid return value for object size. However, since passes running after the substitution don't know this, they are unable to optimize away the comparison, and hence the comparison and branch stay in there. This change generates an appropriate call to llvm.assume to help the optimizer fold the test. glibc is considering adopting __builtin_dynamic_object_size for additional protection[1] and this change will help reduce branching overhead in fortified implementations of all of the functions that don't have the __builtin___*_chk type builtins, e.g. __ppoll_chk. Also remove the test limit-max-iterations.ll because it was deemed unnecessary during review. [1] https://sourceware.org/pipermail/libc-alpha/2020-November/120191.html Differential Revision: https://reviews.llvm.org/D93015	2020-12-22 10:56:31 +01:00
Nikita Popov	82bd64fff6	[AA] byval argument is identified function local byval arguments should mostly get the same treatment as noalias arguments in alias analysis. This was not the case for the isIdentifiedFunctionLocal() function. Marking byval arguments as identified function local means that they cannot alias with other arguments, which I believe is correct. Differential Revision: https://reviews.llvm.org/D93602	2020-12-21 20:18:23 +01:00
Sanjay Patel	38ca7face6	[InstSimplify] reduce logic with inverted add/sub ops https://llvm.org/PR48559 This could be part of a larger ValueTracking API, but I don't see that currently. https://rise4fun.com/Alive/gR0 Name: and Pre: C1 == ~C2 %sub = add i8 %x, C1 %sub1 = sub i8 C2, %x %r = and i8 %sub, %sub1 => %r = 0 Name: or Pre: C1 == ~C2 %sub = add i8 %x, C1 %sub1 = sub i8 C2, %x %r = or i8 %sub, %sub1 => %r = -1 Name: xor Pre: C1 == ~C2 %sub = add i8 %x, C1 %sub1 = sub i8 C2, %x %r = xor i8 %sub, %sub1 => %r = -1	2020-12-21 08:51:43 -05:00
Kazu Hirata	3285ee143b	[Analysis, IR, CodeGen] Use llvm::erase_if (NFC)	2020-12-20 09:19:35 -08:00
Nikita Popov	6fa1230594	[MemLoc] Fix debug print for LocationSize	2020-12-20 17:52:48 +01:00
Kazu Hirata	a6516a820d	[Analysis] Remove dead function getInstTypePair (NFC) The last use of getInstTypePair with two parameters was removed on Jan 9, 2015 in commit `33d7f9de33`. It seems to have been unused since then.	2020-12-19 10:57:35 -08:00
Kazu Hirata	805d59593f	[Analysis, CodeGen, IR] Use contains (NFC)	2020-12-18 19:08:17 -08:00
Roman Lebedev	e9289dc25f	[InstSimplify] Don't miscompile `X == 0 ? abs(X) : -abs(X) --> -abs(X)` xform The transform wasn't checking that the LHS of the comparison is the `X` in question... This is the miscompile that was holding up D87188. Thanks to Dave Green for producing an actionable reproducer!	2020-12-18 21:18:13 +03:00
Florian Hahn	a74941da71	Revert "[BasicAA] Handle two unknown sizes for GEPs" Temporarily revert commit `8b1c4e310c`. After `8b1c4e310c` the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame` dramatically increases with -O3 & LTO, causing issues for builders with that configuration. I filed PR48553 with a smallish reproducer that shows a 10-100x compile time increase.	2020-12-18 17:59:12 +00:00
Cullen Rhodes	7c8796f9db	[TTI] Add supportsScalableVectors target hook This is split off from D91718 and adds a new target hook supportsScalableVectors that can be queried to check if scalable vectors are supported by the backend. For AArch64 this returns true if SVE is enabled. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93060	2020-12-18 10:37:01 +00:00
Kazu Hirata	9895c7012d	[InlineCost] Implement cost-benefit-based inliner This patch adds an alternative cost metric for the inliner to take into account both the cost (i.e. size) and cycle count savings. Without this patch, we decide to inline a given call site if the size of inlining the call site is below the threshold that is computed according to the hotness of the call site. This patch adds a new cost metric, turned off by default, to take over the handling of hot call sites. Specifically, with the new cost metric, we decide to inline a given call site if the ratio of cycle savings to size exceeds a threshold. The cycle savings are computed from call site costs, parameter propagation, folded conditional branches, etc, all weighted by their respective profile counts. The size is primarily the callee size, but we subtract call site costs and the size of basic blocks that are never executed. The new cost metric implicitly takes advantage of the machine function splitter recently introduced by Snehasish Kumar, which dramatically reduces the cost of duplicating (e.g. inlining) cold basic blocks by placing cold basic blocks of hot functions in the .text.split section. We evaluated the new cost metric on clang bootstrap and SPECInt 2017. For clang bootstrap, we observe 0.69% runtime improvement. For SPECInt we report the change in IntRate for the C/C++ benchmarks. All benchmarks apart from perlbench and omnetpp improve or stay flat, on average by 0.21% with the max for mcf at 1.96%. Benchmark (% change): 500.perlbench_r -0.45, 502.gcc_r 0.13, 505.mcf_r 1.96, 520.omnetpp_r -0.28, 523.xalancbmk_r 0.49, 525.x264_r 0.00, 531.deepsjeng_r 0.00, 541.leela_r 0.35, 557.xz_r 0.21. Differential Revision: https://reviews.llvm.org/D92780	2020-12-18 00:37:24 -08:00
Kazu Hirata	ed6a135246	[IVDescriptors] Remove getConsecutiveDirection (NFC) The last use of the function was removed on Sep 18, 2016 in commit `5f8cc0c346`. The function was later moved to llvm/lib/Analysis/IVDescriptors.cpp on Sep 12, 2018 in commit `7e98d69847`.	2020-12-17 20:19:15 -08:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Bardia Mahjour	6eff12788e	[DDG] Data Dependence Graph - DOT printer - recommit This is being recommitted to try and address the MSVC complaint. This patch implements a DDG printer pass that generates a graph in the DOT description language, providing a more visually appealing representation of the DDG. Similar to the CFG DOT printer, this functionality is provided under an option called -dot-ddg and can be generated in a less verbose mode under -dot-ddg-only option. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D90159	2020-12-16 12:37:36 -05:00
Whitney Tsang	fa3693ad0b	[LoopNest] Handle loop-nest passes in LoopPassManager Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this patch (and other following patches) is to create facilities that allow implementing loop nest passes that run on top-level loop nests for the New Pass Manager. This patch extends the functionality of LoopPassManager to handle loop-nest passes by specializing the definition of LoopPassManager that accepts both kinds of passes in addPass. Only loop passes are executed if L is not a top-level one, and both kinds of passes are executed if L is top-level. Currently, loop nest passes should have the following run method: PreservedAnalyses run(LoopNest &, LoopAnalysisManager &, LoopStandardAnalysisResults &, LPMUpdater &); Reviewed By: Whitney, ychen Differential Revision: https://reviews.llvm.org/D87045	2020-12-16 17:07:14 +00:00
Max Kazantsev	8b330f1f69	[SCEV] Add missing type check into getRangeForAffineNoSelfWrappingAR We widen the type without checking whether that is needed. Bail if the max iteration count is wider than AR's type.	2020-12-15 14:50:32 +07:00
Kazu Hirata	ddc5a5920e	[Analysis] Use llvm::erase_value (NFC)	2020-12-14 22:40:13 -08:00
Bardia Mahjour	a29ecca781	Revert "[DDG] Data Dependence Graph - DOT printer" This reverts commit `fd4a10732c`, to investigate the failure on Windows: http://lab.llvm.org:8011/#/builders/127/builds/3274	2020-12-14 16:54:20 -05:00
Bardia Mahjour	fd4a10732c	[DDG] Data Dependence Graph - DOT printer This patch implements a DDG printer pass that generates a graph in the DOT description language, providing a more visually appealing representation of the DDG. Similar to the CFG DOT printer, this functionality is provided under an option called -dot-ddg and can be generated in a less verbose mode under -dot-ddg-only option. Differential Revision: https://reviews.llvm.org/D90159	2020-12-14 16:41:14 -05:00
Philip Reames	f5fe8493e5	[LAA] Relax restrictions on early exits in loop structure This is a preparation patch for supporting multiple exits in the loop vectorizer, and by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression. Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so. Differential Revision: https://reviews.llvm.org/D92066	2020-12-14 12:44:01 -08:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 added a proper check to limit PHI vectorization to the maximum vector register size. Unfortunately, that resulted in at least a couple of regressions on SystemZ and x86. This change reverts the PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moving the check to tryToVectorizeList() allows vectorization to restart if the initial chunk fails. However, this function is more general and handles not only PHIs but everything that SLP handles. If the vectorization factor were limited to the maximum vector register size, it would limit much more vectorization than before, leading to further regressions. Therefore a new TTI callback getMaximumVF() is added, defaulting to 0 to preserve the current behavior and limit nothing. Targets can then decide what is better for them. The callback gets ElementSize, just like the similar getMinimumVF() function, and the main opcode of the chain. The latter is to avoid regressions at least on AMDGPU. There we can have loads and stores up to 128 bits wide and <2 x 16> bit vector math on some subtargets, while the rest shall not be vectorized. I.e. we need to differentiate based on the element size and the operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Nikita Popov	22dba707b0	[AC] Handle (X+C1)<C2 assumes (PR48408) InstCombine canonicalizes X>C && X<C' style comparisons into (X+C1)<C2. This type of expression is recognized by some analyses like LVI, but currently not when used inside assumptions, because AssumptionCache does not track affected values for it.	2020-12-13 21:00:32 +01:00
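The `X>C && X<C'` to `(X+C1)<C2` rewrite mentioned above relies on an unsigned wrap-around trick. The sketch below checks it for 8-bit unsigned values. The commit message does not give C1 and C2. The formulas here (C1 = -(Lo + 1), C2 = Hi - (Lo + 1), modulo 2^8) are one choice that works under the stated assumption Lo < Hi. They are not copied from InstCombine's source, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The two-compare form: Lo < X && X < Hi (unsigned).
bool inOpenRange(uint8_t X, uint8_t Lo, uint8_t Hi) {
  return X > Lo && X < Hi;
}

// The single unsigned compare (X + C1) <u C2, with arithmetic modulo 2^8.
// Shifting by C1 maps Lo + 1 to 0; values outside (Lo, Hi) wrap to >= C2.
// Assumes Lo < Hi.
bool inOpenRangeFolded(uint8_t X, uint8_t Lo, uint8_t Hi) {
  uint8_t C1 = static_cast<uint8_t>(-(Lo + 1));
  uint8_t C2 = static_cast<uint8_t>(Hi - (Lo + 1));
  return static_cast<uint8_t>(X + C1) < C2;
}
```

With Lo = 5 and Hi = 10, C1 is 250 (that is, -6) and C2 is 4. So 6..9 map to 0..3, and every other value wraps to 4 or more.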
Nikita Popov	bb939ebfd7	[BasicAA] Handle known non-zero variable index BasicAA currently handles cases like Scale*V0 + (-Scale)*V1 where V0 != V1, but does not handle the simpler case of Scale*V with V != 0. Add it based on an isKnownNonZero() call. I'm not passing a context instruction for now, because the existing approach of always using GEP1 for context could result in symmetry issues. Differential Revision: https://reviews.llvm.org/D93162	2020-12-13 13:20:05 +01:00
Kazu Hirata	9293b251b5	[Analysis/Interval] Remove isLoop (NFC) The last use of isLoop was removed on Apr 29, 2002 in commit `09bbb5c015` as part of an effort to remove "old induction varaible cannonicalization pass built on top of interval analysis".	2020-12-12 10:09:35 -08:00
Nikita Popov	d716eab197	[BasicAA] Make non-equal index handling simpler to extend (NFC)	2020-12-12 15:00:47 +01:00
Kazu Hirata	eb44682d67	[Analysis] Use is_contained (NFC)	2020-12-11 21:19:31 -08:00
Mircea Trofin	f76b7f22f0	[MLGO] Fix build break as result of new InstructionCost (D91174)	2020-12-11 20:28:39 -08:00
Nikita Popov	8b1c4e310c	[BasicAA] Handle two unknown sizes for GEPs If we have two unknown sizes and one GEP operand and one non-GEP operand, then we currently simply return MayAlias. The comment says we can't do anything useful ... but we can! We can still check that the underlying objects are different (and do so for the GEP-GEP case). To reduce the compile-time impact, this patch a) checks this early, before doing the relatively expensive GEP decomposition that will not be used and b) doesn't do the check if the other operand is a phi or select. In that case, the phi/select will already recurse, so this would just do two slightly different recursive walks that arrive at the same roots. Compile-time is still a bit of a mixed bag: https://llvm-compile-time-tracker.com/compare.php?from=624af932a808b363a888139beca49f57313d9a3b&to=845356e14adbe651a553ed11318ddb5e79a24bcd&stat=instructions On average this is a small improvement, but sqlite with ThinLTO has a 0.5% regression (lencod has a 1% improvement). The BasicAA test case checks this by using two memsets with unknown size. However, the more interesting case where this is useful is the LoopVectorize test case, as analysis of accesses in loops tends to always use unknown sizes. Differential Revision: https://reviews.llvm.org/D92401	2020-12-11 18:45:53 +01:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
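The core idea in the InstructionCost commit above is a cost that carries a state (Normal or Invalid) alongside its value. A minimal sketch of that idea follows. `SketchCost` and its members are hypothetical; this is not LLVM's InstructionCost interface. Two behaviors here are design assumptions of the sketch, not details taken from the commit message. First, invalid is "sticky" through arithmetic. Second, an invalid cost compares greater than any valid one, so it reads as "prohibitively expensive".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-state cost type; not LLVM's InstructionCost.
class SketchCost {
public:
  enum class State { Valid, Invalid };

  // Implicit construction from an integer yields a valid cost.
  SketchCost(int64_t V = 0) : Value(V), CostState(State::Valid) {}

  // An invalid cost, e.g. for an unknown or prohibitively expensive operation.
  static SketchCost getInvalid() {
    SketchCost C;
    C.CostState = State::Invalid;
    return C;
  }

  bool isValid() const { return CostState == State::Valid; }

  int64_t getValue() const {
    assert(isValid() && "querying the value of an invalid cost");
    return Value;
  }

  // Invalid is sticky: adding anything to an invalid cost stays invalid.
  SketchCost operator+(const SketchCost &RHS) const {
    if (!isValid() || !RHS.isValid())
      return getInvalid();
    return SketchCost(Value + RHS.Value);
  }

  // Valid orders before Invalid, so an invalid cost compares greater than
  // every valid cost; valid costs compare by value.
  bool operator<(const SketchCost &RHS) const {
    if (CostState != RHS.CostState)
      return CostState < RHS.CostState;
    return Value < RHS.Value;
  }

private:
  int64_t Value;
  State CostState;
};
```

Because the state travels with the value, a caller summing per-instruction costs needs only one `isValid()` check at the end. It no longer has to compare against a magic large number.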
Arthur Eubanks	c80e193587	[NFC] Inline maxDevirtIterationsReached() This was separated in the past because the cl::opt was in the .cpp file but DevirtSCCRepeatedPass::run() was in the .h file. Now that DevirtSCCRepeatedPass::run() is in the .cpp file, get rid of the tiny maxDevirtIterationsReached(), it's bad for readability.	2020-12-10 22:12:29 -08:00
Xun Li	31e60b9133	[coroutine] should disable inline before calling coro split This is a rework of D85812, which didn't land. When a callee coroutine function is inlined into a caller coroutine function before the coro-split pass, LLVM emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that the coro-early pass cannot handle this case well. So we believe that an unsplit coroutine function should not be inlined. This patch fixes the issue by not inlining a function if it has the attribute "coroutine.presplit" (which means the function has not been split yet). Test plan: check-llvm, check-clang In D85812, there were suggestions to move the macros to Attributes.td to avoid a circular header dependency issue. I believe it's not worth doing just to be able to use one constant string in one place. Today, there are already 3 possible attribute values for "coroutine.presplit": `c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)` If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr just to support this, which I think is overkill. Instead, I think the best way to do this is to add an API in the Function class that checks whether this function is a coroutine by checking the attribute by name directly. Differential Revision: https://reviews.llvm.org/D92706	2020-12-08 08:53:08 -08:00
Philip Reames	2656885390	Teach isKnownNonEqual how to recurse through invertible multiplies Build on the work started in `8f07629`, and add the multiply case. In the process, more clearly describe the requirement for the operation we're looking through. Differential Revision: https://reviews.llvm.org/D92726	2020-12-07 14:52:08 -08:00
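To see why a multiply can be "invertible" for the purpose of isKnownNonEqual, take N-bit integers. Multiplying by an odd constant is a bijection modulo 2^N, because odd numbers have a multiplicative inverse mod 2^N. Hence X != Y implies X * C != Y * C. The commit message does not spell out the exact condition the patch checks (for example, no-wrap flags versus odd constants). The sketch below only demonstrates this one sufficient condition, exhaustively for 8-bit values, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if multiplying by C (mod 2^8) never maps two distinct
// 8-bit values to the same result, i.e. it preserves inequality.
bool mulPreservesInequality(uint8_t C) {
  for (int X = 0; X < 256; ++X)
    for (int Y = 0; Y < 256; ++Y)
      if (X != Y &&
          static_cast<uint8_t>(X * C) == static_cast<uint8_t>(Y * C))
        return false;
  return true;
}
```

Odd constants such as 3 or 255 pass. An even constant such as 2 fails, because 0 * 2 and 128 * 2 both wrap to 0.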
Florian Hahn	f19876c536	[ConstraintElimination] Bail out if system gets too big. For some inputs, the constraint system can grow quite large during solving, because it replaces complex constraints with one or more simpler constraints. This adds a cut-off to avoid compile-time explosion on problematic inputs.	2020-12-06 20:19:15 +00:00
Nikita Popov	5e69e2ebad	[BasicAA] Migrate "same base pointer" logic to decomposed GEPs BasicAA has some special bit of logic for "same base pointer" GEPs that performs a structural comparison: It only looks at two GEPs with the same base (as opposed to two GEP chains with a MustAlias base) and compares their indexes in a limited way. I generalized part of this code in D91027, and this patch merges the remainder into the normal decomposed GEP logic. What this code ultimately wants to do is to determine that gep %base, %idx1 and gep %base, %idx2 don't alias if %idx1 != %idx2, and the access size fits within the stride. We can express this in terms of a decomposed GEP expression with two indexes scale*%idx1 + -scale*%idx2 where %idx1 != %idx2, and some appropriate checks for sizes and offsets. This makes the reasoning slightly more powerful, and more importantly brings all the GEP logic under a common umbrella. Differential Revision: https://reviews.llvm.org/D92723	2020-12-06 10:27:35 +01:00
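The "access size fits within the stride" argument above can be modeled with plain integers. The model assumes two accesses from the same base at byte offsets scale*idx1 and scale*idx2, and it assumes nothing wraps. If idx1 != idx2, the offsets differ by a nonzero multiple of |scale|. Accesses no larger than |scale| therefore land in disjoint slots. This is a simplified, hypothetical model. The real BasicAA code also handles constant offsets and other details that are omitted here, and both function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Conservative check mirroring the reasoning above: with indexes known to
// differ, accesses no larger than |Scale| cannot overlap.
bool sameBaseNoAlias(int64_t Scale, bool IdxKnownNonEqual,
                     uint64_t Size1, uint64_t Size2) {
  if (!IdxKnownNonEqual || Scale == 0)
    return false;
  // |Scale| computed in unsigned arithmetic so INT64_MIN is well-defined.
  uint64_t Stride = Scale < 0 ? 0 - static_cast<uint64_t>(Scale)
                              : static_cast<uint64_t>(Scale);
  return Size1 <= Stride && Size2 <= Stride;
}

// Ground truth for concrete offsets: do [Off1, Off1 + Size1) and
// [Off2, Off2 + Size2) intersect?
bool overlaps(int64_t Off1, int64_t Size1, int64_t Off2, int64_t Size2) {
  return Off1 < Off2 + Size2 && Off2 < Off1 + Size1;
}
```

The size condition is essential. With a stride of 4, an 8-byte access at offset 0 overlaps a 4-byte access at offset 4, even though the indexes differ.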
Philip Reames	8f076291be	Add recursive decomposition reasoning to isKnownNonEqual The basic idea is that by looking through operand instructions which don't change the equality result that we can push the existing known bits comparison down past instructions which would obscure them. We have analogous handling in InstSimplify for most - though weirdly not all - of these cases starting from an icmp root. It's a bit unfortunate to duplicate logic, but since my actual goal is to extend BasicAA, the icmp logic doesn't help. (And just makes it hard to test here.) The BasicAA change will be posted separately for review. Differential Revision: https://reviews.llvm.org/D92698	2020-12-05 15:58:19 -08:00
Philip Reames	bfda69416c	[BasicAA] Fix a bug with relational reasoning across iterations Due to the recursion through phis basicaa does, the code needs to be extremely careful not to reason about equality between values which might represent distinct iterations. I'm generally skeptical of the correctness of the whole scheme, but this particular patch fixes one particular instance which is demonstrably incorrect. Interestingly, this appears to be the second attempted fix for the same issue. The former fix is incomplete and doesn't address the actual issue. Differential Revision: https://reviews.llvm.org/D92694	2020-12-05 14:10:21 -08:00
Florian Hahn	4e5c0c2a63	[ConstraintElimination] Wrap dump() call in LLVM_DEBUG (NFC). ConstraintSystem::dump only generates output with -debug, but there's no need to call it without -debug.	2020-12-05 13:14:53 +00:00
Florian Hahn	4ceecc820b	[ConstraintElimination] Handle constraints with all zero var coeffs. Constraints where all variable coefficients are 0 do not add any useful information. When checking, we can check if they are always true/false.	2020-12-05 12:06:53 +00:00
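The all-zero-coefficient observation above can be sketched under one assumption: constraints have the form sum(a_i * x_i) <= c. That is an assumed representation for illustration. The commit message does not describe ConstraintSystem's internal layout, and the names below are hypothetical. When every a_i is 0, the constraint collapses to 0 <= c. Its truth then no longer depends on the variables at all.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical constraint: sum(Coeffs[i] * x_i) <= Bound.
struct Constraint {
  std::vector<int64_t> Coeffs;
  int64_t Bound;
};

// If all variable coefficients are zero, the constraint is just 0 <= Bound,
// which is always true or always false. Returns std::nullopt when the
// constraint depends on at least one variable.
std::optional<bool> evaluateIfConstant(const Constraint &C) {
  for (int64_t A : C.Coeffs)
    if (A != 0)
      return std::nullopt;
  return 0 <= C.Bound;
}
```

Such a constraint adds no information when it is added to the system. When it is the fact being checked, it can be answered immediately.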
Nikita Popov	f8afba5f7a	[AA] Add statistics for alias results (NFC) Count how many NoAlias/MustAlias/MayAlias we get from top-level queries.	2020-12-05 11:09:15 +01:00
Arthur Eubanks	7f6f9f4cf9	[NewPM] Make pass adaptors less templatey Currently PassBuilder.cpp is by far the file that takes longest to compile. This is due to tons of templates being instantiated per pass. Follow PassManager by using wrappers around passes to avoid making the adaptors templated on the pass type. This allows us to move various adaptors' run methods into .cpp files. This reduces the compile time of PassBuilder.cpp on my machine from 66 to 39 seconds. It also reduces the size of opt from 685M to 676M. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D92616	2020-12-04 08:30:50 -08:00
Arthur Eubanks	2f0de58294	[NewPM] Support --print-before/after in NPM This changes --print-before/after to be a list of strings rather than legacy passes. (this also has the effect of not showing the entire list of passes in --help-hidden after --print-before/after, which IMO is great for making it less verbose). Currently PrintIRInstrumentation passes the class name rather than pass name to llvm::shouldPrintBeforePass(), meaning llvm::shouldPrintBeforePass() never functions as intended in the NPM. There is no easy way of converting class names to pass names outside of within an instance of PassBuilder. This adds a map of pass class names to their short names in PassRegistry.def within PassInstrumentationCallbacks. It is populated inside the constructor of PassBuilder, which takes a PassInstrumentationCallbacks. Add a pointer to PassInstrumentationCallbacks inside PrintIRInstrumentation and use the newly created map. This is a bit hacky, but I can't think of a better way since the short id to class name only exists within PassRegistry.def. This also doesn't handle passes not in PassRegistry.def but rather added via PassBuilder::registerPipelineParsingCallback(). llvm/test/CodeGen/Generic/print-after.ll doesn't seem very useful now with this change. Reviewed By: ychen, jamieschmeiser Differential Revision: https://reviews.llvm.org/D87216	2020-12-03 16:52:14 -08:00
Philip Reames	0129cd5035	Use deref facts derived from minimum object size of allocations This change should be fairly straightforward. If we've reached a call, check to see if we can tell the result is dereferenceable from information about the minimum object size returned by the call. To control compile time impact, I'm only adding the call for base facts in the routine. getObjectSize can also do recursive reasoning, and we don't want that general capability here. As a follow up patch (without separate review), I will plumb through the missing TLI parameter. That will have the effect of extending this to known libcalls - malloc, new, and the like - whereas currently this only covers calls with the explicit allocsize attribute. Differential Revision: https://reviews.llvm.org/D90341	2020-12-03 15:01:14 -08:00
modimo	1860331932	[MemCpyOpt] Correctly merge alias scopes during call slot optimization When MemCpyOpt performs call slot optimization it will concatenate the `alias.scope` metadata between the function call and the memcpy. However, scoped AA relies on the domains in metadata to be maintained in a caller-callee relationship. Naive concatenation breaks this assumption leading to bad AA results. The fix is to take the intersection of domains then union the scopes within those domains. The original bug came from a case of bad Rust codegen which used this bad aliasing to perform additional memcpy optimizations. As shown in the added test case, `%src` got forwarded past its lifetime, leading to a dereference of garbage data. Testing ninja check-llvm Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D91576	2020-12-03 09:23:37 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Nick Desaulniers	bc044a88ee	[Inline] prevent inlining on stack protector mismatch Code that manipulates the stack via inline assembly or that has to set up its own stack canary (such as the Linux kernel) commonly needs to avoid stack protectors in certain functions. In this case, we've been bitten by numerous bugs where a callee with a stack protector is inlined into an attribute((no_stack_protector)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, it's generally not as ergonomic as a function attribute, and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u SSP attributes can be ordered by strength. Weakest to strongest, they are: ssp, sspstrong, sspreq. Callees with differing SSP attributes may be inlined into each other, and the strongest attribute will be applied to the caller. (No change) After this change: * A callee with no SSP attributes will no longer be inlined into a caller with SSP attributes. * The reverse is also true: a callee with an SSP attribute will not be inlined into a caller with no SSP attributes. * The alwaysinline attribute overrides these rules. Functions that get synthesized by the compiler may not get inlined as a result if they are not created with the same stack protector function attribute as their callers. Alternative approach to https://reviews.llvm.org/D87956. Fixes pr/47479. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: rnk, MaskRay Differential Revision: https://reviews.llvm.org/D91816	2020-12-02 11:00:16 -08:00
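The stack-protector inlining rules listed in the commit above fit in a few lines. This sketch encodes only what the message states. Levels are ordered ssp < sspstrong < sspreq. Inlining is refused when exactly one side has an SSP attribute, unless the callee is alwaysinline. After inlining, the caller carries the stronger level. The enum and function names are hypothetical and are not LLVM's attribute API.

```cpp
#include <algorithm>
#include <cassert>

// Stack-protector levels ordered weakest to strongest; None means no SSP
// attribute at all. Names are illustrative only.
enum class SSPLevel { None = 0, SSP = 1, SSPStrong = 2, SSPReq = 3 };

// Refuse inlining when exactly one of caller/callee has an SSP attribute;
// alwaysinline overrides the rule.
bool sspAllowsInlining(SSPLevel Caller, SSPLevel Callee, bool AlwaysInline) {
  if (AlwaysInline)
    return true;
  return (Caller == SSPLevel::None) == (Callee == SSPLevel::None);
}

// When inlining happens, the caller takes the stronger of the two levels.
SSPLevel mergedCallerLevel(SSPLevel Caller, SSPLevel Callee) {
  return std::max(Caller, Callee);
}
```

The equality test on the two "has no SSP" flags captures both new rules in one expression: mixed cases are rejected, while none/none and any-SSP/any-SSP pairs are allowed.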
jasonliu	a65d8c5d72	[XCOFF][AIX] Generate LSDA data and compact unwind section on AIX Summary: AIX uses the existing EH infrastructure in clang and llvm. The major differences are: 1. AIX does not have CFI instructions. 2. AIX uses a new personality routine, named __xlcxx_personality_v1. It doesn't use the GCC personality routine, because the interoperability is not there yet on AIX. 3. AIX does not use eh_frame sections. Instead, it uses an eh_info section (compact unwind section) to store the information about the personality routine and the LSDA data address. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D91455	2020-12-02 18:42:44 +00:00
Sanjay Patel	9d6d24c250	[JumpThreading][VectorUtils] avoid infinite loop on unreachable IR https://llvm.org/PR48362 It's possible that we could stub this out sooner somewhere within JumpThreading, but I'm not sure how to do that, and then we would still have potential danger in other callers. I can't find a way to trigger this using 'instsimplify', however, because that already has a bailout on unreachable blocks.	2020-12-02 13:39:33 -05:00
Fangrui Song	a5309438fe	static const char *const foo => const char foo[] By default, a non-template variable of non-volatile const-qualified type having namespace-scope has internal linkage, so no need for `static`.	2020-12-01 10:33:18 -08:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Wei Wang	93dc1b5b8c	[Remarks][2/2] Expand remarks hotness threshold option support in more tools This is the #2 of 2 changes that make remarks hotness threshold option available in more tools. The changes also allow the threshold to sync with hotness threshold from profile summary with special value 'auto'. This change expands remarks hotness threshold option -fdiagnostics-hotness-threshold in clang and *-remarks-hotness-threshold in other tools to utilize hotness threshold from profile summary. Remarks hotness filtering relies on several driver options. The list below shows how the options interact and affect the final remarks output (profile / hotness / threshold → remarks printed): No/No/No → All; No/No/Yes → None; No/Yes/No → All; No/Yes/Yes → None; Yes/No/No → All; Yes/No/Yes → None; Yes/Yes/No → All; Yes/Yes/Yes → >=threshold. In the presence of profile summary, it is often more desirable to directly use the hotness threshold from profile summary. The new argument value 'auto' indicates threshold will be synced with hotness threshold from profile summary during compilation. The "auto" threshold relies on the availability of profile summary. In case of missing such information, no remarks will be generated. Differential Revision: https://reviews.llvm.org/D85808	2020-11-30 21:55:50 -08:00
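The remarks-filtering combinations in the commit above reduce to a short decision function. This sketch encodes those eight rows directly, with an absent threshold modeled as `std::nullopt`. The function name and parameters are hypothetical, and the sketch does not model the 'auto' threshold value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Encodes the profile / hotness / threshold table: is a remark with the
// given hotness printed?
bool remarkPrinted(bool HasProfile, bool HotnessEnabled,
                   std::optional<uint64_t> Threshold, uint64_t Hotness) {
  if (!Threshold)
    return true;                    // No threshold: all remarks.
  if (!HasProfile || !HotnessEnabled)
    return false;                   // Threshold without profile+hotness: none.
  return Hotness >= *Threshold;     // All three set: only remarks >= threshold.
}
```

Collapsing the table this way shows that the threshold is the deciding option. Without it everything is printed. With it, remarks appear only when both a profile and hotness reporting are present.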
Nick Desaulniers	91aff1d8ba	[InlineCost] prefer range-for. NFC Prefer range-for over iterators when such methods exist. Precommitted from https://reviews.llvm.org/D91816. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D92350	2020-11-30 16:07:40 -08:00
Mircea Trofin	5fe10263ab	[llvm][inliner] Reuse the inliner pass to implement 'always inliner' Enable performing mandatory inlinings upfront, by reusing the same logic as the full inliner, instead of the AlwaysInliner. This has the following benefits: - reduce code duplication - one inliner codebase - open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner. Performing the mandatory inlinings first simplifies the problem the full inliner needs to solve: fewer call sites, more contextualization, and, depending on the additional function optimization passes run between the 2 inliners, higher accuracy of cost models / decision policies. Note that this patch does not yet enable much in terms of post-always inline function optimization. Differential Revision: https://reviews.llvm.org/D91567	2020-11-30 12:03:39 -08:00
Nikita Popov	e987fbdd85	[BasicAA] Generalize recursive phi alias analysis For recursive phis, we skip the recursive operands and check that the remaining operands are NoAlias with an unknown size. Currently, this is limited to inbounds GEPs with positive offsets, to guarantee that the recursion only ever increases the pointer. Make this more general by only requiring that the underlying object of the phi operand is the phi itself, i.e. it is based on itself in some way. To compensate, we need to use a beforeOrAfterPointer() location size, as we no longer have the guarantee that the pointer is strictly increasing. This allows us to handle some additional cases like negative geps, geps with dynamic offsets or geps that aren't inbounds. Differential Revision: https://reviews.llvm.org/D91914	2020-11-29 10:25:23 +01:00
Nikita Popov	1dea8ed8b7	[BasicAA] Remove unnecessary known size requirement The size requirement on V2 was present because it was not clear whether an unknown size would allow an access before the start of V2, which could then overlap. This is clarified since D91649: In this part of BasicAA, all accesses can occur only after the base pointer, even if they have unknown size. This makes the positive and negative offset cases symmetric. Differential Revision: https://reviews.llvm.org/D91482	2020-11-28 10:17:12 +01:00
Nikita Popov	8351f9b5ce	[ValueTracking] Fix assert on shufflevector of pointers In this case getScalarSizeInBits() is not well-defined. Use the existing TyBits variable that handles vectors of pointers correctly.	2020-11-27 21:19:31 +01:00
Martin Storsjö	fa10383664	Revert "[BasicAA] Fix BatchAA results for phi-phi assumptions" This reverts commit `8166ed1a7a`, as it caused some compilations to hang/loop indefinitely, see https://reviews.llvm.org/D91936 for details.	2020-11-27 21:50:59 +02:00
Cullen Rhodes	7b8d50b141	[InstSimplify] Clarify use of FixedVectorType in SimplifySelectInst Folding a select of vector constants that include undef elements only applies to fixed vectors, but there's no earlier check the type is not scalable so it crashes for scalable vectors. This adds a check so this optimization is only attempted for fixed vectors. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92046	2020-11-27 09:55:29 +00:00
Kazu Hirata	60e749aa23	[InlineCost] Fix indentation (NFC)	2020-11-26 18:00:55 -08:00
Nikita Popov	8166ed1a7a	[BasicAA] Fix BatchAA results for phi-phi assumptions Add a flag that disables caching when computing aliasing results potentially based on a phi-phi NoAlias assumption. We'll still insert cache entries temporarily to catch infinite recursion, but will drop them afterwards, so they won't persist in BatchAA. Differential Revision: https://reviews.llvm.org/D91936	2020-11-26 21:43:50 +01:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Max Kazantsev	035955f925	Revert "Return "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond", 2nd try" This reverts commit `f690986f31`. Compile time then and again...	2020-11-26 18:12:51 +07:00
Max Kazantsev	f690986f31	Return "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond", 2nd try Reverted because the compile time impact is still too high. isKnownViaNonRecursiveReasoning is used twice, we can do it just once. Differential Revision: https://reviews.llvm.org/D92152	2020-11-26 17:45:13 +07:00
Max Kazantsev	91d6b6b5fb	Revert "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond" This reverts commit `3d4c0460ec`. Compile time impact is still high. Need to understand why. Differential Revision: https://reviews.llvm.org/D92153	2020-11-26 17:28:30 +07:00
Max Kazantsev	3d4c0460ec	[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond Previously we tried to use isKnownPredicateAt, but it makes an extra query to isKnownPredicate, which has a negative impact on compile time. Let's try to use the more lightweight isBasicBlockEntryGuardedByCond. Differential Revision: https://reviews.llvm.org/D92152	2020-11-26 17:08:38 +07:00
Max Kazantsev	3b6481eae2	Revert "[SCEV] Use isKnownPredicateAt in isLoopBackedgeGuardedByCond" This reverts commit `14f2ad0e3c`. Reverting to investigate compile time drop. Differential Revision: https://reviews.llvm.org/D92152	2020-11-26 16:42:43 +07:00
Max Kazantsev	14f2ad0e3c	[SCEV] Use isKnownPredicateAt in isLoopBackedgeGuardedByCond A piece of code in `isLoopBackedgeGuardedByCond` basically duplicates the dominators traversal from `isBlockEntryGuardedByCond` called from `isKnownPredicateAt`, but it's less powerful because it does not give context to `isImpliedCond`. This patch reuses the `isKnownPredicateAt` function there, reducing the amount of code duplication and making it more powerful. Differential Revision: https://reviews.llvm.org/D92152 Reviewed By: skatkov	2020-11-26 13:20:02 +07:00
Max Kazantsev	f10500e220	[IndVars] Use isLoopBackedgeGuardedByCond for last iteration check Use more context to prove contextual facts about the last iteration. It is only executed when the backedge is taken, so we can use `isLoopBackedgeGuardedByCond` to make this check. Differential Revision: https://reviews.llvm.org/D91535 Reviewed By: skatkov	2020-11-26 12:37:21 +07:00
Joe Ellis	06654a5348	[SVE] Fix TypeSize warning in RuntimePointerChecking::insert The TypeSize warning would occur because RuntimePointerChecking::insert was not scalable vector aware. The fix is to use ScalarEvolution::getSizeOfExpr to grab the size of types. Differential Revision: https://reviews.llvm.org/D90171	2020-11-25 16:59:03 +00:00
Cullen Rhodes	1ba4b82f67	[LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits MaxSafeRegisterWidth is a misnomer since it actually returns the maximum safe vector width. Register suggests it relates directly to a physical register where it could be a vector spanning one or more physical registers. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91727	2020-11-25 13:06:26 +00:00
Max Kazantsev	9130651126	Revert "[SCEV] Generalize no-self-wrap check in isLoopInvariantExitCondDuringFirstIterations" This reverts commit `7dcc889917`. This patch introduced a logical error that breaks the whole logic of this analysis. All checks we are making are supposed to be loop-independent, so that we could safely remove the range check. The 'nw' fact is loop-dependent, so we could end up removing the check based on facts derived from this very check. Motivating examples will follow up.	2020-11-25 13:26:17 +07:00
Philip Reames	10ddb927c1	[SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC] Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute. Using the isa<SCEVCouldNotCompute> form is both shorter and more readable.	2020-11-24 18:47:49 -08:00
Philip Reames	b3a8a15343	[LAA] Minor code style tweaks [NFC]	2020-11-24 15:49:27 -08:00
Janek van Oirschot	42eaf4fe0a	[HardwareLoops] Change order of SCEV expression construction for InitLoopCount. Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's `shrink-wrap.ll` test. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D91724	2020-11-24 18:01:42 +00:00
Max Kazantsev	02fdbc3567	Revert "[NFC][SCEV] Generalize monotonicity check for full and limited iteration space" This reverts commit `2734a9ebf4`. This patch turned out not to be NFC. It introduced an execution path where the monotonicity check on the limited iteration space started relying on existing nsw/nuw flags, which is illegal. The motivating test will follow up.	2020-11-24 17:56:59 +07:00
Arthur Eubanks	aff058b1a9	Reland [CGSCC] Detect devirtualization in more cases The devirtualization wrapper misses cases where, if it wraps a pass manager, an individual pass may devirtualize an indirect call created by a previous pass. For example, inlining may create a new indirect call which is devirtualized by instcombine. Currently the devirtualization wrapper will not see that because it only checks cgscc edges at the very beginning and end of the pass (manager) it wraps. This fixes some tests testing this exact behavior in the legacy PM. Instead of checking WeakTrackingVHs for CallBases at the very beginning and end of the pass it wraps, check every time updateCGAndAnalysisManagerForPass() is called. check-llvm and check-clang with -abort-on-max-devirt-iterations-reached on by default doesn't show any failures outside of tests specifically testing it, so it doesn't needlessly rerun passes. (The NPM -O2/3 pipeline runs the inliner/function simplification pipeline under a devirtualization repeater pass up to 4 times by default.) http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks shows that 7zip has a ~1% compile time regression. I looked at it and saw that there indeed was devirtualization happening that was not previously caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI. The initial land assumed CallBase WeakTrackingVHs would always be CallBases, but they can be RAUW'd with undef. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89587	2020-11-23 21:28:59 -08:00
Yichao Yu	4bc88a0e9a	Enable support for floating-point division reductions Similar to fsub, fdiv can also be vectorized using fmul. Also http://llvm.org/viewvc/llvm-project?view=revision&revision=215200 Differential Revision: https://reviews.llvm.org/D34078 Co-authored-by: Jameson Nash <jameson@juliacomputing.com>	2020-11-23 20:00:58 -05:00
Arthur Eubanks	6a2799cf8e	Revert "[CGSCC] Detect devirtualization in more cases" This reverts commit `14a68b4aa9`. Causes building self hosted clang to crash when using NPM.	2020-11-23 13:21:05 -08:00
Arthur Eubanks	3c811ce4f3	[NPM] Share pass building options with legacy PM We should share options when possible. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91741	2020-11-23 13:04:05 -08:00
Arthur Eubanks	7167e5203a	Port -print-memderefs to NPM There is lots of code duplication, but hopefully it won't matter soon. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D91683	2020-11-23 11:56:22 -08:00
Arthur Eubanks	14a68b4aa9	[CGSCC] Detect devirtualization in more cases The devirtualization wrapper misses cases where, if it wraps a pass manager, an individual pass may devirtualize an indirect call created by a previous pass. For example, inlining may create a new indirect call which is devirtualized by instcombine. Currently the devirtualization wrapper will not see that because it only checks cgscc edges at the very beginning and end of the pass (manager) it wraps. This fixes some tests testing this exact behavior in the legacy PM. Instead of checking WeakTrackingVHs for CallBases at the very beginning and end of the pass it wraps, check every time updateCGAndAnalysisManagerForPass() is called. check-llvm and check-clang with -abort-on-max-devirt-iterations-reached on by default doesn't show any failures outside of tests specifically testing it, so it doesn't needlessly rerun passes. (The NPM -O2/3 pipeline runs the inliner/function simplification pipeline under a devirtualization repeater pass up to 4 times by default.) http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks shows that 7zip has a ~1% compile time regression. I looked at it and saw that there indeed was devirtualization happening that was not previously caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89587	2020-11-23 11:55:20 -08:00
Jay Foad	000400ca0a	Fix speling in comments. NFC.	2020-11-23 14:43:24 +00:00
Mikael Holmen	faf848ac32	[Inline] Fix in handling of ptrtoint in InlineCost ConstantOffsetPtrs contains mappings from a Value to a base pointer and an offset. The offset is typed and has a size, and at least when dealing with ptrtoint, it could happen that we had a mapping from a ptrtoint with type i32 to an offset with type i16. This could later cause problems, showing up in PR 47969 and PR 38500. In PR 47969 we ended up in an assert complaining that trunc i16 to i16 is invalid, and in PR 38500 in an assert complaining that a cmp on an i32 and an i16 value isn't valid. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D90610	2020-11-23 14:33:06 +01:00
Max Kazantsev	48d7cc6ae2	[SCEV] Fix incorrect treatment of max taken count. PR48225 SCEV makes a logical mistake when handling EitherMayExit in the case when both conditions must be met to exit the loop. The mistake is as follows: "if condition `A` fails within at most `X` first iterations, and `B` fails within at most `Y` first iterations, then `A & B` fails at most within `min (X, Y)` first iterations". This is wrong, because both of them must fail at the same time. A simple example illustrating this is the following: we have an IV with step 1, condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B` will fail within the first two iterations. But that doesn't mean that both of them will fail within the first two iterations at the same time, which would mean that IV is neither even nor odd within the first 2 iterations. We can only do so for known exact BE counts, but not for max. Differential Revision: https://reviews.llvm.org/D91942 Reviewed By: nikic	2020-11-23 16:52:39 +07:00
Max Kazantsev	47e31d1b5e	[NFC] Reduce code duplication in binop processing in computeExitLimitFromCondCached Handling of `and` and `or` relies heavily on copy-paste. Factored out into a helper function as a preparation step for a further fix (see PR48225). Differential Revision: https://reviews.llvm.org/D91864 Reviewed By: nikic	2020-11-23 13:18:12 +07:00
Nikita Popov	6f5ef648a5	[BasicAA] Avoid unnecessary cache update (NFC) If the final recursive query returns MayAlias as well, there is no need to update the cache (which already stores MayAlias).	2020-11-22 20:10:45 +01:00
Sanjay Patel	c5a4d80fd4	[ValueTracking][MemCpyOpt] avoid crash on inttoptr with vector pointer type (PR48075)	2020-11-22 12:54:18 -05:00
Simon Pilgrim	24d6e60488	[Analysis] Remove unused system header includes Cleanup unused system headers and fix an implicit dependency	2020-11-22 10:32:37 +00:00
Nikita Popov	ded5928866	[BasicAA] Remove unnecessary sextOrSelf (NFC) We are doing a sextOrTrunc directly afterwards, so this seems useless. There is a multiplication in between, but truncating before or after the multiplication should not make a difference.	2020-11-21 21:32:56 +01:00
Nikita Popov	0d114f56d7	[BasicAA] Return DecomposedGEP (NFC) Instead of requiring the caller to initialize the DecomposedGEP structure and then passing it in by reference, make DecomposeGEPExpression() responsible for initializing and returning the structure.	2020-11-21 21:05:26 +01:00
Nikita Popov	f4412c5ae4	[BasicAA] Remove some intermediate variables (NFC) Use DecompGEP1.Offset instead of GEP1BaseOffset, etc. I found the asymmetry of modifying DecompGEP1.VarIndices, but not modifying DecompGEP1.Offset odd here.	2020-11-21 20:36:25 +01:00
Nikita Popov	913a99c474	[BasicAA] Remove stale FIXME (NFC) If aliasGEP returns MayAlias, the code does fall through to aliasPHI etc, so this FIXME is no longer applicable.	2020-11-21 20:07:26 +01:00
Kazu Hirata	226beb494c	[Analysis] Use llvm::is_contained (NFC)	2020-11-20 18:08:05 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persistent or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues: 1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation but blocks desired optimizations as little as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe is virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work.
The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
  br i1 %cmp, label %bb1, label %bb2
bb1:
  br label %bb3
bb2:
  br label %bb3
bb3:
  ret void
}
```
The instrumented IR will look like this. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
  br i1 %cmp, label %bb1, label %bb2
bb1:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
  br label %bb3
bb2:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
  br label %bb3
bb3:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
  ret void
}
```
Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Nikita Popov	e8dc6e9a32	[MemLoc] Use hasValue() method more (NFC) Followup to `7de7c40898`. I previously removed a number of == comparisons to LocationSize::unknown(), but missed these != comparisons.	2020-11-19 22:29:44 +01:00
Nikita Popov	7de7c40898	[MemLoc] Use hasValue() method (NFC) Instead of comparing to LocationSize::unknown(), call the hasValue() method, which is less reliant on implementation details.	2020-11-19 21:53:50 +01:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Artur Pilipenko	887c7660bd	[BasicAA] Deoptimize intrinsics don't modify memory Similarly to assumes and guards, deoptimize intrinsics are marked as writing to ensure proper control dependencies, but they never modify any particular memory location. Differential Revision: https://reviews.llvm.org/D91658	2020-11-19 12:08:33 -08:00
Nikita Popov	22ec72f803	[Lint] Use MemoryLocation Instead of separately passing pointer and size, make use of MemoryLocation. This allows us to also reuse all the existing logic for determining the MemoryLocation corresponding to an instruction or call argument. Not quite NFC because used locations may be more precise in some cases.	2020-11-19 20:55:25 +01:00
Leonard Chan	a97f62837f	[llvm][IR] Add dso_local_equivalent Constant The `dso_local_equivalent` constant is a wrapper for functions that represents a value which is functionally equivalent to the global passed to this. That is, if this accepts a function, calling this constant should have the same effects as calling the function directly. This could be a direct reference to the function, the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the resolved function as a call target. When lowered, the returned address must have a constant offset at link time from some other symbol defined within the same binary. The address of this value is also insignificant. The name is leveraged from `dso_local` where use of a function or variable is resolved to a symbol in the same linkage unit. In this patch: - Addition of `dso_local_equivalent` and handling it - Update Constant::needsRelocation() to strip constant inbound GEPs and take advantage of `dso_local_equivalent` for relative references This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959) which makes vtables readonly. This works by replacing the dynamic relocations for function pointers in them with static relocations that represent the offset between the vtable and virtual functions. If a function is externally defined, `dso_local_equivalent` can be used as a generic wrapper for the function to still allow for this static offset calculation to be done. See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details. Differential Revision: https://reviews.llvm.org/D77248	2020-11-19 10:26:17 -08:00
Simon Pilgrim	fceaff41d6	[ValueTracking] computeKnownBitsFromShiftOperator - move shift amount analysis to top of the function. NFCI. These are all lightweight to compute and helps avoid issues with Known being used to hold both the shift amount and then the shifted result. Minor cleanup for D90479.	2020-11-19 13:50:49 +00:00
Mircea Trofin	8ab2353a4c	[NFC][TFUtils] also include output specs lookup logic in loadOutputSpecs The lookup logic is also reusable. Also refactored the API to return the loaded vector - this makes it more clear what state it is in in the case of error (as it won't be returned). Differential Revision: https://reviews.llvm.org/D91759	2020-11-18 21:20:21 -08:00
Mircea Trofin	b51e844f7a	[NFC][TFUtils] Extract out the output spec loader It's generic for the 'development mode', not specific to the inliner case. Differential Revision: https://reviews.llvm.org/D91751	2020-11-18 20:03:20 -08:00
Nikita Popov	cd3c22c47e	[BasicAA] Generalize base offset modulus handling The GEP aliasing implementation currently has two pieces of code that solve two different subsets of the same basic problem: If you have GEPs with offsets 4x + 0 and 4y + 1 (assuming access size 1), then they do not alias regardless of whether x and y are the same. One implementation is in aliasSameBasePointerGEPs(), which looks at this in a limited structural way. It requires both GEP base pointers to be exactly the same, then (optionally) a number of equal indexes, then an unknown index, then a non-equal index into a struct. This set of limitations works, but it's overly restrictive and hides the core property we're trying to exploit. The second implementation is part of aliasGEP() itself and tries to find a common modulus in the scales, so it can then check that the constant offset doesn't overlap under modular arithmetic. The second implementation has the right idea of what the general problem is, but effectively only considers power of two factors in the scales (while aliasSameBasePointerGEPs also works with non-pow2 struct sizes.) What this patch does is to adjust the aliasGEP() implementation to instead find the largest common factor in all the scales (i.e. the GCD) and use that as the modulus. Differential Revision: https://reviews.llvm.org/D91027	2020-11-18 21:48:49 +01:00
Nikita Popov	85ccdcaa50	[BasicAA] Remove assert in AA evaluator As reported in https://reviews.llvm.org/D91383#2401825, this assert breaks external -aa-eval tests. We'll have to fix this case before re-enabling it.	2020-11-18 20:04:38 +01:00
Simon Pilgrim	eef203dbdf	[Analysis] CGSCCPassManager.cpp - fix Wshadow warnings. NFCI.	2020-11-18 09:59:31 +00:00
Wei Wang	3279347da0	[BPI] Look through bitcasts in calcZeroHeuristic Constant hoisting may hide the constant value behind bitcast for And's operand. Track down the constant to make the BFI result consistent regardless of hoisting. Differential Revision: https://reviews.llvm.org/D91450	2020-11-17 09:33:05 -08:00
Nikita Popov	cb4fc25c91	[BasicAA] Make alias GEP positive offset handling symmetric aliasGEP() currently implements some special handling for the case where all variable offsets are positive, in which case the constant offset can be taken as the minimal offset. However, it does not perform the same handling for the all-negative case. This means that the alias-analysis result between two GEPs is asymmetric: If GEP1 - GEP2 is all-positive, then GEP2 - GEP1 is all-negative, and the first will result in NoAlias, while the second will result in MayAlias. Apart from producing sub-optimal results for one order, this also violates our caching assumption. In particular, if BatchAA is used, the cached result depends on the order of the GEPs in the first query. This results in an inconsistency in BatchAA and AA results, which is how I noticed this issue in the first place. Differential Revision: https://reviews.llvm.org/D91383	2020-11-17 18:05:34 +01:00
Sander de Smalen	f571fe6df5	Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This relands https://reviews.llvm.org/D91059 and reverts commit `30fded75b4`. GetRegUsage now returns 0 when Ty is not a valid vector element type.	2020-11-17 13:45:10 +00:00
Philip Reames	0f41a2fe83	test commit for new client	2020-11-16 17:26:52 -08:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Philip Reames	257d33c815	[SCEV] Factor out part of wrap flag detection logic [NFC](try 2) This is a cut down version of 1ec6e1 which was reverted due to a compile time issue. The key changes made from that patch: 1) only infer the flags needed along each path, 2) be careful to preserve order of checks, and 3) avoid computing NW flags at all since we need to prove the stronger property (does not cross 0) in the caller anyways. Assuming this doesn't trip regressions, I'm going to try weakening (1). My end objective is to move flag inference into addrec construction. If I can't weaken (1) without compile time impact, I'll have a problem.	2020-11-16 12:07:21 -08:00
Kazu Hirata	147ccc848a	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. This patch is one of the steps before we can safely re-apply D91017. Differential Revision: https://reviews.llvm.org/D91511	2020-11-15 22:29:30 -08:00
Kazu Hirata	c5cc2d8b94	[BranchProbabilityInfo] Use predecessors(BB) and successors(BB) (NFC)	2020-11-15 19:26:38 -08:00
Nikita Popov	3b7f84d97f	[AA] Add missing AAQI parameter This alias() call did not pass on the AAQueryInfo.	2020-11-15 20:29:53 +01:00
Nikita Popov	9ace4b337f	Revert "[SCEV] Factor out part of wrap flag detection logic [NFC-ish]" This reverts commit `1ec6e1eb8a`. This change causes a significant compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=dd0b8b94d0796bd895cc998dd163b4fbebceb0b8&to=1ec6e1eb8a084bffae8a40236eb9925d8026dd07&stat=instructions I assume that this is due to the non-NFC part of the change, which now performs expensive nowrap inference even for nowrap flags that are not used by the particular code.	2020-11-15 10:19:44 +01:00
Philip Reames	1ec6e1eb8a	[SCEV] Factor out part of wrap flag detection logic [NFC-ish] In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic. In the process, reduce the number of locations we mutate wrap flags by a couple. Note that this isn't NFC. The old code tried for NSW xor (NUW || NW). That is, two different paths computed different sets of wrap flags. The new code will try for all three. The result is that some expressions end up with a few extra flags set.	2020-11-14 19:21:05 -08:00
Nikita Popov	0b72444211	[BasicAA] Remove unnecessary size limitation We're dropping a common offset from both GEPs here. It's not necessary for the access sizes to be the same as well.	2020-11-14 16:51:31 +01:00
Nikita Popov	9a85643cd3	[KnownBits] Combine abs() implementations ValueTracking was using a more powerful abs() implementation. Roll it into KnownBits::abs(). Also add an exhaustive test for abs(), in both the poisoning and non-poisoning variants.	2020-11-13 22:23:50 +01:00
Nikita Popov	f3124a46c1	[SCEV] Fix nsw flags for GEP expressions The SCEV code for constructing GEP expressions currently assumes that the addition of the base and all the offsets is nsw if the GEP is inbounds. While the addition of the offsets is indeed nsw, the addition to the base address is not, as the base address is interpreted as an unsigned value. Fix the GEP expression code to not assume nsw for the base+offset calculation. However, do assume nuw if we know that the offset is non-negative. With this, we use the same behavior as the construction of GEP addrecs does. (Modulo the fact that we disregard SCEV unification, as the pre-existing FIXME points out). Differential Revision: https://reviews.llvm.org/D90648	2020-11-13 18:19:32 +01:00
Nikita Popov	92b708902e	[ValueTracking] Don't set nsw flag for inbounds addition When computing the known bits for a GEP, don't set the nsw flag when adding an offset to an address. The nsw flag only applies to pure offset additions (see also D90708). The nsw flag is only used in a very minor way by the code, to the point that I was not able to come up with a test case where it makes a difference. Differential Revision: https://reviews.llvm.org/D90637	2020-11-13 17:58:21 +01:00
Piotr Sobczak	47dec5aa60	[DivergenceAnalysis] Use addRequiredTransitive For querying divergence the chained analysis passes are required to be alive, for instance LoopInfoWrapperPass. Ensure that by using addRequiredTransitive. Differential Revision: https://reviews.llvm.org/D91335	2020-11-13 14:40:00 +01:00
Simon Pilgrim	49623fa77a	[ValueTracking] computeKnownBitsFromShiftOperator use KnownBits direct for constant shift amounts. Let KnownBits shift handlers deal with out-of-range shift amounts.	2020-11-13 10:54:35 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_component_library` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These functions store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Max Kazantsev	0a1d394bf3	[NFC] Refactor loop-invariant getters to return Optional	2020-11-13 15:03:10 +07:00
Nikita Popov	c00545dc32	[BasicAA] Remove checks for GEP decomposition limit reached The GEP aliasing code currently checks for the GEP decomposition limit being reached (i.e., we did not reach the "final" underlying object). As far as I can see, these checks are not necessary. It is perfectly fine to work with a GEP whose base can still be further decomposed. Looking back through the commit history, these checks were originally introduced in `1a444489e9`. However, I believe that the problem this was intended to address was later properly fixed with `1726fc698c`, and the checks are no longer necessary since then (and were not the right fix in the first place). Differential Revision: https://reviews.llvm.org/D91010	2020-11-12 20:43:38 +01:00
Jamie Schmeiser	5f672fefeb	Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source Summary: Expand the print-memoryssa and print<memoryssa> passes with a new hidden option -dot-cfg-mssa that names a file. When set, a dot-cfg style file will be generated into the named file with the memoryssa comments retained and those blocks containing them shown in light pink. The option does nothing in isolation. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie) Differential Revision: https://reviews.llvm.org/D90638	2020-11-12 17:39:14 +00:00
Simon Pilgrim	f72d350bfb	[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take KnownBits shift amount. NFCI. We were creating this internally, but will need to support general KnownBits amounts as part of D90479.	2020-11-12 16:56:55 +00:00
Simon Pilgrim	8996742741	[KnownBits] Add KnownBits::makeConstant helper. NFCI. Helper for cases where we need to create a KnownBits from a (fully known) constant value.	2020-11-12 16:16:04 +00:00
Anh Tuyen Tran	a20b3620bb	Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source" This reverts commit `45d459e752` due to a build issue in Polly.	2020-11-12 15:48:14 +00:00
Jamie Schmeiser	45d459e752	Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source Summary: Expand the print-memoryssa and print<memoryssa> passes with a new hidden option -dot-cfg-mssa that names a file. When set, a dot-cfg style file will be generated into the named file with the memoryssa comments retained and those blocks containing them shown in light pink. The option does nothing in isolation. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie) Differential Revision: https://reviews.llvm.org/D90638	2020-11-12 15:41:16 +00:00
Simon Pilgrim	11c106544b	[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to use KnownBits shift handling. NFCI.	2020-11-12 15:31:26 +00:00
Max Kazantsev	2734a9ebf4	[NFC][SCEV] Generalize monotonicity check for full and limited iteration space A piece of logic of `isLoopInvariantExitCondDuringFirstIterations` is actually a generalized predicate monotonicity check. This patch moves it into the corresponding method and generalizes it a bit. Differential Revision: https://reviews.llvm.org/D90395 Reviewed By: apilipenko	2020-11-12 12:37:07 +07:00
Arthur Eubanks	d9cbceb041	[CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass Previously the inliner did a bit of a hack by adding ref edges for all new edges introduced by performing an inline before calling updateCGAndAnalysisManagerForPass(). This was because updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call edges. This adds handling of non-trivial call edges to updateCGAndAnalysisManagerForPass(). The inliner called updateCGAndAnalysisManagerForFunctionPass() since it was handling adding newly introduced edges (so updateCGAndAnalysisManagerForPass() would only have to handle promotion), but now it needs to call updateCGAndAnalysisManagerForCGSCCPass() since updateCGAndAnalysisManagerForPass() is now handling the new call edges and function passes cannot add new edges. We follow the previous path of adding trivial ref edges then letting promotion handle changing the ref edges to call edges and the CGSCC updates. So this still does not allow adding call edges that result in an addition of a non-trivial ref edge. This is in preparation for better detecting devirtualization. Previously since the inliner itself would add ref edges, updateCGAndAnalysisManagerForPass() would think that promotion and thus devirtualization had happened after any sort of inlining. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91046	2020-11-11 13:43:49 -08:00
Sander de Smalen	30fded75b4	Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost." This reverts commits: * [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. `b873aba394`. * [LoopVectorizer] Silence warning in GetRegUsage. `9ff701100a`.	2020-11-11 14:41:55 +00:00
Simon Pilgrim	f6a326adef	[ValueTracking] computeKnownBitsFromShiftOperator - merge zero/one callbacks to single KnownBits callback. NFCI. Another cleanup for D90479 - handle the Known Ones/Zeros in a single callback, which will make it much easier to jump over to the KnownBits shift handling.	2020-11-11 14:22:42 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Sander de Smalen	b873aba394	[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This is more accurate than dividing the bitwidth based on the element count by the maximum register size, as it can just reuse whatever has been calculated for legalization of these types. This change is also necessary when calculating register usage for scalable vectors, where the legalization of these types cannot be done based on the widest register size, because that does not take the 'vscale' component into account. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91059	2020-11-11 10:18:50 +00:00
Max Kazantsev	7dcc889917	[SCEV] Generalize no-self-wrap check in isLoopInvariantExitCondDuringFirstIterations Lift limitation on step being `+/- 1`. In fact, the only thing it is needed for is proving no-self-wrap. We can instead check this flag directly. Theoretically it can increase the scope of the transform, but I could not construct such test easily. Differential Revision: https://reviews.llvm.org/D91126 Reviewed By: apilipenko	2020-11-11 11:17:13 +07:00
Kazu Hirata	21fbe2ee68	Revert "[BranchProbabilityInfo] Use SmallVector (NFC)" This reverts commit `2f1038c7b6`.	2020-11-10 19:17:13 -08:00
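A short simulation can check the PR48225 counterexample from `48d7cc6ae2`. The Python below is an illustrative sketch, not SCEV code. The helper names `first_failure` and `first_joint_failure` are hypothetical. It assumes the IV starts at 0 with step 1 and only scans a bounded number of iterations. Each condition fails on its own within the first two iterations, but the two never fail on the same iteration. So bounds derived per condition say nothing about when both fail together.

```python
from typing import Callable, Optional

Cond = Callable[[int], bool]


def first_failure(cond: Cond, limit: int) -> Optional[int]:
    """First iteration in [0, limit) at which cond(iv) is false, or None."""
    for iv in range(limit):  # assumed IV: starts at 0, step 1
        if not cond(iv):
            return iv
    return None


def first_joint_failure(a: Cond, b: Cond, limit: int) -> Optional[int]:
    """First iteration in [0, limit) at which a(iv) and b(iv) are both false, or None."""
    for iv in range(limit):
        if not a(iv) and not b(iv):
            return iv
    return None


def is_even(iv: int) -> bool:
    return iv % 2 == 0


def is_odd(iv: int) -> bool:
    return iv % 2 == 1


x = first_failure(is_even, 1000)  # A = "IV is even" first fails at iv == 1
y = first_failure(is_odd, 1000)   # B = "IV is odd" first fails at iv == 0
joint = first_joint_failure(is_even, is_odd, 1000)

print(x, y, min(x, y), joint)  # 1 0 0 None
```

The per-condition bounds are 1 and 0, yet no iteration makes both fail. This matches the commit message's conclusion that the reasoning is valid only for known exact backedge-taken counts, not for max counts.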

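The GCD modulus argument from `cd3c22c47e` can be sketched as follows. This is a simplified Python model, not BasicAA's implementation, and the helper `no_alias_by_modulus` is hypothetical. It assumes both accesses are already expressed relative to a common base. The second access sits at offset 0 with size `size2`. The first sits at `sum(scale_i * var_i) + const_offset` with size `size1`, where each `var_i` is an unknown integer.

Every such offset is congruent to `const_offset` modulo the GCD `g` of the scales. Let `m` be that residue. If `m >= size2`, the second access fits before the first. If `g - m >= size1`, the first access fits before the next multiple of `g`. When both hold, the ranges cannot overlap for any values of the variables.

```python
from functools import reduce
from math import gcd
from typing import Sequence


def no_alias_by_modulus(scales: Sequence[int], const_offset: int,
                        size1: int, size2: int) -> bool:
    """Return True if [off, off + size1) and [0, size2) are provably disjoint,
    where off = sum(scale_i * var_i) + const_offset for arbitrary integers var_i."""
    g = reduce(gcd, (abs(s) for s in scales), 0)
    if g == 0:
        # No variable part: a plain interval-overlap test.
        return const_offset >= size2 or const_offset + size1 <= 0
    m = const_offset % g  # Python's % is non-negative for g > 0 ("mod", not "rem")
    return m >= size2 and g - m >= size1


# 4y + 1 versus 4x with 1-byte accesses: offset difference is 4y - 4x + 1.
print(no_alias_by_modulus([4, -4], 1, 1, 1))   # True
# A non-power-of-two scale still works: gcd(12, 8) == 4.
print(no_alias_by_modulus([12, -8], 1, 1, 1))  # True
# 4-byte accesses at 4y + 2 and 4x can overlap (e.g. x == y + 1).
print(no_alias_by_modulus([4, -4], 2, 4, 4))   # False
```

Using the GCD instead of only a common power-of-two factor lets one check cover non-power-of-two struct sizes. The commit message says that case was previously handled only by `aliasSameBasePointerGEPs()`.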

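A toy model reproduces the `&` gotcha described in `1a62ca65c1`. The Python class below only mimics the idea behind LLVM's `KnownBits`: one mask of bits known to be zero and one mask of bits known to be one. The class and method names are hypothetical, not the C++ API.

Suppose a value may be *either* input, for example when it can come from either arm of a select. Merging the facts then needs only the bits on which both inputs agree. The known bits of `a & b` describe a different value altogether.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Known:
    zero: int  # mask of bits known to be 0
    one: int   # mask of bits known to be 1

    @staticmethod
    def constant(value: int, width: int) -> "Known":
        mask = (1 << width) - 1
        return Known(zero=~value & mask, one=value & mask)

    def common_bits(self, other: "Known") -> "Known":
        # Facts that hold whichever of the two values is actually taken.
        return Known(zero=self.zero & other.zero, one=self.one & other.one)

    def and_result(self, other: "Known") -> "Known":
        # Facts about the bitwise AND of the two values: a 0 in either input forces 0.
        return Known(zero=self.zero | other.zero, one=self.one & other.one)


a = Known.constant(0b1100, 4)
b = Known.constant(0b1010, 4)
print(a.common_bits(b))  # Known(zero=1, one=8): bit 0 is 0, bit 3 is 1, bits 1-2 unknown
print(a.and_result(b))   # Known(zero=7, one=8): claims the value is exactly 0b1000
```

Merging with the AND result would wrongly conclude that the merged value is the constant 8. In fact it is either 12 or 10.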
10249 Commits