llvm-project

Commit Graph

Author	SHA1	Message	Date
Max Kazantsev	1acab00229	[LVI] Support for ashr in LVI Enhance LVI to analyze the ‘ashr’ binary operation. This leverages the infrastructure in ConstantRange for the ashr operation. Patch by Surya Kumari Jangala! Differential Revision: https://reviews.llvm.org/D40886 llvm-svn: 320983	2017-12-18 14:23:30 +00:00
Igor Laevsky	7bd3fb15e1	[TargetLibraryInfo] Discard library functions with incorrectly sized integers Differential Revision: https://reviews.llvm.org/D41184 llvm-svn: 320964	2017-12-18 10:31:58 +00:00
Hal Finkel	2ff24731bb	[SimplifyLibCalls] Inline calls to cabs when it's safe to do so When unsafe algerbra is allowed calls to cabs(r) can be replaced by: sqrt(creal(r)creal(r) + cimag(r)cimag(r)) Patch by Paul Walker, thanks! Differential Revision: https://reviews.llvm.org/D40069 llvm-svn: 320901	2017-12-16 01:26:25 +00:00
Sanjay Patel	600d24b49c	[TargetLibraryInfo] fix documentation comment; NFC llvm-svn: 320842	2017-12-15 18:54:29 +00:00
Haicheng Wu	a446151552	[InlineCost] Find repeated loads in the callee SROA analysis of InlineCost can figure out that some stores can be removed after inlining and then the repeated loads clobbered by these stores are also free. This patch finds these clobbered loads and adjust the inline cost accordingly. Differential Revision: https://reviews.llvm.org/D33946 llvm-svn: 320814	2017-12-15 14:34:41 +00:00
Serguei Katkov	67da7696a0	[SCEV] Fix the movement of insertion point in expander. PR35406. We cannot move the insertion point to header if SCEV contains div/rem operations due to they may go over check for zero denominator. Reviewers: sanjoy, mkazantsev, sebpop Reviewed By: sebpop Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41229 llvm-svn: 320789	2017-12-15 05:24:42 +00:00
Zachary Turner	260fe3eca6	Fix many -Wsign-compare and -Wtautological-constant-compare warnings. Most of the -Wsign-compare warnings are due to the fact that enums are signed by default in the MS ABI, while the tautological comparison warnings trigger on x86 builds where sizeof(size_t) is 4 bytes, so N > numeric_limits<unsigned>::max() is always false. Differential Revision: https://reviews.llvm.org/D41256 llvm-svn: 320750	2017-12-14 22:07:03 +00:00
Bjorn Pettersson	33c9d5535f	[ScalarEvolution] Fix base condition in isNormalAddRecPHI. Summary: The function is meant to recurse until it comes upon the phi it's looking for. However, with the current condition, it will recurse until it finds anything _but_ the phi. The function will even fail for simple cases like: %i = phi i32 [ %inc, %loop ], ... ... %inc = add i32 %i, 1 because the base condition will not happen when the phi is recursed to, and the recursion will end with a 'false' result since the previous instruction is a phi. Reviewers: sanjoy, atrick Reviewed By: sanjoy Subscribers: Ka-Ka, bjope, llvm-commits Committing on behalf of: Bevin Hansson (bevinh) Differential Revision: https://reviews.llvm.org/D40946 llvm-svn: 320700	2017-12-14 14:47:52 +00:00
Haicheng Wu	3739e14ab4	[InlineCost] Tracking Values through PHI Nodes This patch fix this FIXME in visitPHI() FIXME: We should potentially be tracking values through phi nodes, especially when they collapse to a single value due to deleted CFG edges during inlining. Differential Revision: https://reviews.llvm.org/D38594 llvm-svn: 320699	2017-12-14 14:36:18 +00:00
Dorit Nuzman	4750c785b3	[LV] Support efficient vectorization of an induction with redundant casts D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose update chain involves casts; PSCEV can now build an AddRecurrence for some forms of such phi nodes, under the proper runtime overflow test. This means that we can identify such phi nodes as an induction, and the loop-vectorizer can now vectorize such inductions, however inefficiently. The vectorizer doesn't know that it can ignore the casts, and so it vectorizes them. This patch records the casts in the InductionDescriptor, so that they could be marked to be ignored for cost calculation (we use VecValuesToIgnore for that) and ignored for vectorization/widening/scalarization (i.e. treated as TriviallyDead). In addition to marking all these casts to be ignored, we also need to make sure that each cast is mapped to the right vector value in the vector loop body (be it a widened, vectorized, or scalarized induction). So whenever an induction phi is mapped to a vector value (during vectorization/widening/ scalarization), we also map the respective cast instruction (if exists) to that vector value. (If the phi-update sequence of an induction involves more than one cast, then the above mapping to vector value is relevant only for the last cast of the sequence as we allow only the "last cast" to be used outside the induction update chain itself). This is the last step in addressing PR30654. llvm-svn: 320672	2017-12-14 07:56:31 +00:00
Michael Zolotukhin	b45595bd00	Remove redundant includes from lib/Analysis. llvm-svn: 320617	2017-12-13 21:30:41 +00:00
Igor Laevsky	e0edb66475	Reintroduce r320049, r320014 and r319894. OpenGL issues should be fixed by now. llvm-svn: 320568	2017-12-13 11:21:18 +00:00
Mohammad Shahid	dbd30edb7f	[SLP] Vectorize jumbled memory loads. Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: mgrang, dcaballe, hans, mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 llvm-svn: 320548	2017-12-13 03:08:29 +00:00
Igor Laevsky	d63560b817	Revert r320049, r320014 and r319894 They were causing failures of the piglit OpenGL tests with AMD GPUs using the Mesa radeonsi driver. llvm-svn: 320466	2017-12-12 10:03:39 +00:00
Dorit Nuzman	5809e70540	[SCEV] Fix wrong Equal predicate created in getAddRecForPhiWithCasts CreateAddRecFromPHIWithCastsImpl() adds an IncrementNUSW overflow predicate which allows the PSCEV rewriter to rewrite this scev expression: (zext i8 {0, + , (trunc i32 step to i8)} to i32) into {0, +, (sext i8 (trunc i32 step to i8) to i32)} But then it adds the wrong Equal predicate: %step == (zext i8 (trunc i32 %step to i8) to i32). instead of: %step == (sext i8 (trunc i32 %step to i8) to i32) This is fixed here. Differential Revision: https://reviews.llvm.org/D40641 llvm-svn: 320298	2017-12-10 11:13:35 +00:00
Simon Dardis	70dbd5fbd0	Infer lowest bits of an integer Multiply when the low bits of the operands are known When the lowest bits of the operands to an integer multiply are known, the low bits of the result are deducible. Code to deduce known-zero bottom bits already existed, but this change improves on that by deducing known-ones. Patch by: Pedro Ferreira Reviewers: craig.topper, sanjoy, efriedma Differential Revision: https://reviews.llvm.org/D34029 llvm-svn: 320269	2017-12-09 23:25:57 +00:00
Evgeniy Stepanov	c667c1f47a	Hardware-assisted AddressSanitizer (llvm part). Summary: This is LLVM instrumentation for the new HWASan tool. It is basically a stripped down copy of ASan at this point, w/o stack or global support. Instrumenation adds a global constructor + runtime callbacks for every load and store. HWASan comes with its own IR attribute. A brief design document can be found in clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier). Reviewers: kcc, pcc, alekseyshl Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D40932 llvm-svn: 320217	2017-12-09 00:21:41 +00:00
Xinliang David Li	d91057bf52	Revert r320104: infinite loop profiling bug fix Causes unexpected memory issue with New PM this time. The new PM invalidates BPI but not BFI, leaving the reference to BPI from BFI invalid. Abandon this patch. There is a more general solution which also handles runtime infinite loop (but not statically). llvm-svn: 320180	2017-12-08 19:38:07 +00:00
Max Kazantsev	63a3de057e	[NFC] Rename variable from Cond to Pred to make it more sound llvm-svn: 320144	2017-12-08 12:54:32 +00:00
Max Kazantsev	9c08b7a053	[SCEV] Fix predicate usage in computeExitLimitFromICmp In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke `computeShiftCompareExitLimit` with Values from which the SCEVs have been derived, these Values have not been modified while `Cond` could be. One of possible outcomes of this is that we may falsely prove that an infinite loop ends within some finite number of iterations. In this patch, we save the original `Cond` and pass it along with original operands. This logic may be removed in future once `computeShiftCompareExitLimit` works with SCEVs instead of value operands. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D40953 llvm-svn: 320142	2017-12-08 12:19:45 +00:00
Alina Sbirlea	193429f0c8	[ModRefInfo] Make enum ModRefInfo an enum class [NFC]. Summary: Make enum ModRefInfo an enum class. Changes to ModRefInfo values should be done using inline wrappers. This should prevent future bit-wise opearations from being added, which can be more error-prone. Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40933 llvm-svn: 320107	2017-12-07 22:41:34 +00:00
Xinliang David Li	4b0027f671	[PGO] detect infinite loop and form MST properly Differential Revision: http://reviews.llvm.org/D40873 llvm-svn: 320104	2017-12-07 22:23:28 +00:00
Alina Sbirlea	d6037ebeeb	[ModRefInfo] Replace remaining bit-wise operations with wrappers. llvm-svn: 319993	2017-12-07 00:43:19 +00:00
Alina Sbirlea	18fea013de	[ModRefInfo] Do not use ModRefInfo result in if conditions as this makes assumptions about the values in the enum. Replace with wrapper returning bool [NFC]. llvm-svn: 319949	2017-12-06 19:56:37 +00:00
Alina Sbirlea	5beb1838bb	[ModRefInfo] Use createModRefInfo wrapper to create a ModRefInfo from FunctionModRefBehavior. llvm-svn: 319941	2017-12-06 19:23:03 +00:00
Zvi Rackover	2e6e88f689	InstructionSimplify: 'extractelement' with an undef index is undef Summary: An undef extract index can be arbitrarily chosen to be an out-of-range index value, which would result in the instruction being undef. This change closes a gap identified while working on lowering vector permute intrinsics with variable index vectors to pure LLVM IR. Reviewers: arsenm, spatel, majnemer Reviewed By: arsenm, spatel Subscribers: fhahn, nhaehnle, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D40231 llvm-svn: 319910	2017-12-06 17:51:46 +00:00
Igor Laevsky	03655c7636	[InstSimplify] Fold insertelement into undef if index is out of bounds Differential Revision: https://reviews.llvm.org/D40650 llvm-svn: 319894	2017-12-06 14:04:45 +00:00
Max Kazantsev	d4f5987c58	[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs Lexicographical comparison of SCEV trees is potentially expensive for big expression trees. We can define ordering between them for AddRecs and N-ary operations by SCEV NoWrap flags to make non-equality check cheaper. This change does not prevent grouping eqivalent SCEVs together and is not supposed to have any meaningful impact on behavior of any transforms. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D40645 llvm-svn: 319889	2017-12-06 12:44:56 +00:00
Max Kazantsev	1c66ae6303	[SCEV][NFC] Share value cache between SCEVs in GroupByComplexity Current implementation of `compareSCEVComplexity` is being unreasonable with `SCEVUnknown`s: every time it sees one, it creates a new value cache and tries to prove equality of two values using it. This cache reallocates and gets lost from SCEV to SCEV. This patch changes this behavior: now we create one cache for all values and share it between SCEVs. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D40597 llvm-svn: 319880	2017-12-06 08:58:16 +00:00
Hans Wennborg	146a9c3e51	Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks" This caused PR35519. > [memcpyopt] Teach memcpyopt to optimize across basic blocks > > This teaches memcpyopt to make a non-local memdep query when a local query > indicates that the dependency is non-local. This notably allows it to > eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. > > Fixes PR28958. > > Differential Revision: https://reviews.llvm.org/D38374 > > [memcpyopt] Commit file missed in r319482. > > This change was meant to be included with r319482 but was accidentally > omitted. llvm-svn: 319873	2017-12-06 01:47:55 +00:00
Alina Sbirlea	1e7440df80	[ModRefInfo] Initialize ArgMask to MRI_NoModRef. llvm-svn: 319831	2017-12-05 20:51:20 +00:00
Alina Sbirlea	63d2250a42	Modify ModRefInfo values using static inline method abstractions [NFC]. Summary: The aim is to make ModRefInfo checks and changes more intuitive and less error prone using inline methods that abstract the bit operations. Ideally ModRefInfo would become an enum class, but that change will require a wider set of changes into FunctionModRefBehavior. Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel Subscribers: nlopes, llvm-commits Differential Revision: https://reviews.llvm.org/D40749 llvm-svn: 319821	2017-12-05 20:12:23 +00:00
Igor Laevsky	cec8f47e77	[InstCombine] Don't crash on out of bounds shifts Differential Revision: https://reviews.llvm.org/D40649 llvm-svn: 319761	2017-12-05 12:18:15 +00:00
Sanjoy Das	adf3751730	[SCEV] Use a "Discovered" set instead of a "Visited" set; NFC Suggested by Max Kazantsev in https://reviews.llvm.org/D39361 llvm-svn: 319679	2017-12-04 19:22:01 +00:00
Sanjoy Das	7e36337935	[SCEV] A different fix for PR33494 Summary: I don't think rL309080 is the right fix for PR33494 -- caching ExitLimit only hides the problem[0]. The real issue is that because of how we forget SCEV expressions ScalarEvolution::getBackedgeTakenInfo, in the test case for PR33494 computing the backedge for any loop invalidates the trip count for every other loop. This effectively makes the SCEV cache useless. I've instead made the SCEV expression invalidation in ScalarEvolution::getBackedgeTakenInfo less aggressive to fix this issue. [0]: One way to think about this is that rL309080 essentially augmented the backedge-taken-count cache with another equivalent exit-limit cache. The bug went away because we were explicitly not clearing the exit-limit cache in getBackedgeTakenInfo. But instead of doing all of that, we can just avoid clearing the backedge-taken-count cache. Reviewers: mkazantsev, mzolotukhin Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D39361 llvm-svn: 319678	2017-12-04 19:22:00 +00:00
Sam McCall	d0d43e6f14	Revert "[ValueTracking] Pass only a single lambda to computeKnownBitsFromShiftOperator by using KnownBits struct instead of separate APInts. NFCI" This reverts commit r319624, which seems to cause a miscompile (breaks the multistage PPC buildbots) llvm-svn: 319652	2017-12-04 12:51:49 +00:00
Craig Topper	199acd88e3	[ValueTracking] Pass only a single lambda to computeKnownBitsFromShiftOperator by using KnownBits struct instead of separate APInts. NFCI llvm-svn: 319624	2017-12-02 23:42:17 +00:00
Adam Nemet	9303f62255	[opt-remarks] If hotness threshold is set, ignore remarks without hotness These are blocks that haven't not been executed during training. For large projects this could make a significant difference. For the project, I was looking at, I got an order of magnitude decrease in the size of the total YAML files with this and r319235. Differential Revision: https://reviews.llvm.org/D40678 Re-commit after fixing the failing testcase in rL319576, rL319577 and rL319578. llvm-svn: 319581	2017-12-01 20:41:38 +00:00
Fedor Sergeev	3b459c3847	IR printing improvement for loop passes - handle -print-module-scope Summary: Adding support for -print-module-scope similar to how it is being done for function passes. This option causes loop-pass printer to emit a whole-module IR instead of just a loop itself. Reviewers: sanjoy, silvas, weimingz Reviewed By: sanjoy Subscribers: apilipenko, skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D40247 llvm-svn: 319566	2017-12-01 18:33:58 +00:00
Adam Nemet	57783730fd	Revert "[opt-remarks] If hotness threshold is set, ignore remarks without hotness" This reverts commit r319556. Something is not working with this when used with sample-based profiling. Investigating... llvm-svn: 319562	2017-12-01 18:12:29 +00:00
Adam Nemet	8d1fc2b65b	[opt-remarks] If hotness threshold is set, ignore remarks without hotness These are blocks that haven't not been executed during training. For large projects this could make a significant difference. For the project, I was looking at, I got an order of magnitude decrease in the size of the total YAML files with this and r319235. Differential Revision: https://reviews.llvm.org/D40678 llvm-svn: 319556	2017-12-01 17:02:04 +00:00
Florian Hahn	30932a3c16	[InstSimplify] More fcmp cases when comparing against negative constants. Summary: For known positive non-zero value X: fcmp uge X, -C => true fcmp ugt X, -C => true fcmp une X, -C => true fcmp oeq X, -C => false fcmp ole X, -C => false fcmp olt X, -C => false Patch by Paul Walker. Reviewers: majnemer, t.p.northover, spatel, RKSimon Reviewed By: spatel Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D40012 llvm-svn: 319538	2017-12-01 12:34:16 +00:00
Zachary Turner	8065f0b975	Mark all library options as hidden. These command line options are not intended for public use, and often don't even make sense in the context of a particular tool anyway. About 90% of them are already hidden, but when people add new options they forget to hide them, so if you were to make a brand new tool today, link against one of LLVM's libraries, and run tool -help you would get a bunch of junk that doesn't make sense for the tool you're writing. This patch hides these options. The real solution is to not have libraries defining command line options, but that's a much larger effort and not something I'm prepared to take on. Differential Revision: https://reviews.llvm.org/D40674 llvm-svn: 319505	2017-12-01 00:53:10 +00:00
Dan Gohman	59e4c0b938	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. Fixes PR28958. Differential Revision: https://reviews.llvm.org/D38374 llvm-svn: 319482	2017-11-30 22:10:53 +00:00
Davide Italiano	9d939c8f19	[InlineCost] Prefer getFunction() to two calls to getParent(). Improves clarity, also slightly cheaper. NFCI. llvm-svn: 319481	2017-11-30 22:10:35 +00:00
Max Kazantsev	9545a408b6	[SCEV][NFC] Break from loop after we found first non-Phi in getAddRecExprPHILiterally llvm-svn: 319306	2017-11-29 10:54:16 +00:00
Max Kazantsev	1c3b622820	[SCEV][NFC] Remove condition that can never happen due to check few lines above llvm-svn: 319293	2017-11-29 06:10:36 +00:00
Max Kazantsev	6e78ad35cc	[SCEV][NFC] More efficient caching in CompareValueComplexity Currently, we use a set of pairs to cache responces like `CompareValueComplexity(X, Y) == 0`. If we had proved that `CompareValueComplexity(S1, S2) == 0` and `CompareValueComplexity(S2, S3) == 0`, this cache does not allow us to prove that `CompareValueComplexity(S1, S3)` is also `0`. This patch replaces this set with `EquivalenceClasses` that merges Values into equivalence sets so that any two values from the same set are equal from point of `CompareValueComplexity`. This, in particular, allows us to prove the fact from example above. Differential Revision: https://reviews.llvm.org/D40429 llvm-svn: 319153	2017-11-28 08:26:43 +00:00
Max Kazantsev	cf9b1b24ce	[SCEV][NFC] More efficient caching in CompareSCEVComplexity Currently, we use a set of pairs to cache responces like `CompareSCEVComplexity(X, Y) == 0`. If we had proved that `CompareSCEVComplexity(S1, S2) == 0` and `CompareSCEVComplexity(S2, S3) == 0`, this cache does not allow us to prove that `CompareSCEVComplexity(S1, S3)` is also `0`. This patch replaces this set with `EquivalenceClasses` any two values from the same set are equal from point of `CompareSCEVComplexity`. This, in particular, allows us to prove the fact from example above. Differential Revision: https://reviews.llvm.org/D40428 llvm-svn: 319149	2017-11-28 07:48:12 +00:00
Sanjay Patel	0de1a4bc2d	[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094	2017-11-27 21:15:43 +00:00
Sanjay Patel	4ca9968155	[InstSimplify] use m_APFloat to simplify fcmp folds; NFCI llvm-svn: 319043	2017-11-27 16:37:09 +00:00
Jatin Bhateja	7410eead0c	[SCEV] Adding a check on outgoing branches of a terminator instr for SCEVBackedgeConditionFolder, NFC. Summary: For a given loop, getLoopLatch returns a non-null value when a loop has only one latch block. In the modified context adding an assertion to check that both the outgoing branches of a terminator instruction (of latch) does not target same header. + few minor code reorganization. Reviewers: jbhateja Reviewed By: jbhateja Subscribers: sanjoy Differential Revision: https://reviews.llvm.org/D40460 llvm-svn: 318997	2017-11-26 15:08:41 +00:00
Jatin Bhateja	a1da5e4ce7	[SCEV] NFC : Removing unnecessary check on outgoing branches of a branch instr. Summary: For a given loop, getLoopLatch returns a non-null value when a loop has only one latch block. In the modified context a check on both the outgoing branches of a terminator instruction (of latch) to same header is redundant. Reviewers: jbhateja Reviewed By: jbhateja Subscribers: sanjoy Differential Revision: https://reviews.llvm.org/D40460 llvm-svn: 318991	2017-11-26 02:01:01 +00:00
Fedor Sergeev	61975b49fe	IR printing improvement for loop passes Summary: Loop-pass printing is somewhat deficient since it does not provide the context around the loop (e.g. preheader). This context information becomes pretty essential when analyzing transformations that move stuff out of the loop. Extending printLoop to cover preheader and exit blocks (if any). Reviewers: sanjoy, silvas, weimingz Reviewed By: sanjoy Subscribers: apilipenko, skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D40246 llvm-svn: 318878	2017-11-22 20:59:53 +00:00
Max Kazantsev	23044fa639	[SCEV] Strengthen variance condition in calculateLoopDisposition Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively. When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if `L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are consecutive sibling loops within the parent loop. In this case, `AR2` is also varying w.r.t. `L1`, but we don't correctly identify it. It can lead, for exaple, to attempt of incorrect folding. Consider: AR1 = {a,+,b}<L1> AR2 = {c,+,d}<L2> EXAR2 = sext(AR1) MUL = mul AR1, EXAR2 If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect because `AR2` is not available on entrance of `L1`. Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled with one check: "header of `L1` dominates header of `L2`". This patch replaces the old insufficient check with this one. Differential Revision: https://reviews.llvm.org/D39453 llvm-svn: 318819	2017-11-22 06:21:39 +00:00
Hans Wennborg	70e22d121d	Fix r318786 llvm-svn: 318787	2017-11-21 18:00:01 +00:00
Nuno Lopes	5c122882ed	removed unused private method decl. NFC llvm-svn: 318786	2017-11-21 17:53:19 +00:00
Alina Sbirlea	ff8b8aea2e	Add MemorySSA as loop dependency, disabled by default [NFC]. Summary: First step in adding MemorySSA as dependency for loop pass manager. Adding the dependency under a flag. New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null. Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default. Reviewers: sanjoy, davide, gberry Subscribers: mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D40274 llvm-svn: 318772	2017-11-21 15:45:46 +00:00
Sanjay Patel	eb731b09f3	[InstSimplify] fold and/or of fcmp ord/uno when operand is known nnan The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions: FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans) FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) \| isnan(Y) So we can simplify patterns like this: (fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y (fcmp uno (known NNAN), X) \|\| (fcmp uno X, Y) --> fcmp uno X, Y It might be better to split this into (X uno 0) \| (Y uno 0) as a canonicalization, but that would be another patch. Differential Revision: https://reviews.llvm.org/D40130 llvm-svn: 318627	2017-11-19 15:34:27 +00:00
Chandler Carruth	693eedb138	[PM/Unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple. The pass will now be more of a "full loop unswitching" pass rather than anything substantively simpler than any other approach. I plan to rename it accordingly once the dust settles. The key ideas of the new loop unswitcher are carried over for non-trivial unswitching: 1) Fully unswitch a branch or switch instruction from inside of a loop to outside of it. 2) Update the CFG and IR. This avoids needing to "remember" the unswitched branches as well as avoiding excessively cloning and reliance on complex parts of simplify-cfg to cleanup the cfg. 3) Update the analyses (where we can) rather than just blowing them away or relying on something else updating them. Sadly, #3 is somewhat compromised here as the dominator tree updates were too complex for me to want to reason about. I will need to make another attempt to do this now that we have a nice dynamic update API for dominators. However, we do adhere to #3 w.r.t. LoopInfo. This approach also adds an important principls specific to non-trivial unswitching: not all of the loop will be duplicated when unswitching. This fact allows us to compute the cost in terms of how much duplicate code is inserted rather than just on raw size. Unswitching conditions which essentialy partition loops will work regardless of the total loop size. Some remaining issues that I will be addressing in subsequent commits: - Handling unstructured control flow. - Unswitching 'switch' cases instead of just branches. - Moving to the dynamic update API for dominators. Some high-level, interesting limitationsV that folks might want to push on as follow-ups but that I don't have any immediate plans around: - We could be much more clever about not cloning things that will be deleted. In fact, we should be able to delete nothing and do a minimal number of clones. - There are many more interesting selection criteria for which branch to unswitch that we might want to look at. One that I'm interested in particularly are a set of conditions which all exit the loop and which can be merged into a single unswitched test of them. Differential revision: https://reviews.llvm.org/D34200 llvm-svn: 318549	2017-11-17 19:58:36 +00:00
Volodymyr Sapsai	8b46ff1648	[ThinLTO] Remove too aggressive assertion in building function call graph. The assertion was introduced in r317853 but there are cases when a call isn't handled either as direct or indirect. In this case we add a reference graph edge but not a call graph edge. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: mehdi_amini, inglorion, eraman, hiraditya, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D40056 llvm-svn: 318540	2017-11-17 18:28:05 +00:00
Javed Absar	da30c30a5e	[SCEV] simplify loop. NFC. Change loop to range-based llvm-svn: 318401	2017-11-16 13:49:27 +00:00
Max Kazantsev	87f4a3de45	[SCEV][NFC] Introduce isSafeToExpandAt function to SCEVExpander This function checks that: 1) It is safe to expand a SCEV; 2) It is OK to materialize it at the specified location. For example, attempt to expand a loop's AddRec to the same loop's preheader should fail. Differential Revision: https://reviews.llvm.org/D39236 llvm-svn: 318377	2017-11-16 05:10:56 +00:00
Mikael Holmen	6e60297ee6	[Lint] Don't warn about passing alloca'd value to tail call if using byval Summary: This fixes PR35241. When using byval, the data is effectively copied as part of the call anyway, so the pointer returned by the alloca will not be leaked to the callee and thus there is no reason to issue a warning. Reviewers: rnk Reviewed By: rnk Subscribers: Ka-Ka, llvm-commits Differential Revision: https://reviews.llvm.org/D40009 llvm-svn: 318279	2017-11-15 07:46:48 +00:00
Reid Kleckner	e021f703db	Fix clang -Wsometimes-uninitialized warning in SCEV code I don't believe this was a problem in practice, as it's likely that the boolean wasn't checked unless the backend condition was non-null. llvm-svn: 318073	2017-11-13 18:43:11 +00:00
Sanjay Patel	20df88a754	[ValueTracking] use 'auto' with 'dyn_cast'; NFC llvm-svn: 318058	2017-11-13 17:56:23 +00:00
Sanjay Patel	9e3d8f4b39	[ValueTracking] simplify code in CannotBeNegativeZero() with match(); NFCI llvm-svn: 318055	2017-11-13 17:40:47 +00:00
Jatin Bhateja	c61ade1ca0	[SCEV] Handling for ICmp occuring in the evolution chain. Summary: If a compare instruction is same or inverse of the compare in the branch of the loop latch, then return a constant evolution node. This shall facilitate computations of loop exit counts in cases where compare appears in the evolution chain of induction variables. Will fix PR 34538 Reviewers: sanjoy, hfinkel, junryoungju Reviewed By: sanjoy, junryoungju Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38494 llvm-svn: 318050	2017-11-13 16:43:24 +00:00
Volodymyr Sapsai	a73960213e	[ThinLTO] Fix missing call graph edges for calls with bitcasts. This change doesn't fix the root cause of the miscompile PR34966 as the root cause is in the linker ld64. This change makes call graph more complete allowing to have better module imports/exports. rdar://problem/35344706 Reviewers: tejohnson Reviewed By: tejohnson Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39356 llvm-svn: 317853	2017-11-10 00:47:47 +00:00
Nuno Lopes	2ee4b30276	revert r317812 [BasicAA] fix build break by converting the previously introduced assert into an if stmt The code has a bug, but some tests regress. I'll discuss this further on the mailing list. llvm-svn: 317815	2017-11-09 17:35:36 +00:00
Nuno Lopes	9f82a2b60e	[BasicAA] fix build break by converting the previously introduced assert into an if stmt Apparently V1Size == -1 doest imply V2Size == -1, which is a bit surprising to me. llvm-svn: 317812	2017-11-09 17:06:42 +00:00
Nuno Lopes	eb1a603dd1	[BasicAA] add assertion for corner case in aliasGEP() llvm-svn: 317803	2017-11-09 16:16:46 +00:00
Dan Gohman	2c74fe977d	Add an @llvm.sideeffect intrinsic This patch implements Chandler's idea [0] for supporting languages that require support for infinite loops with side effects, such as Rust, providing part of a solution to bug 965 [1]. Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual effect, but which appears to optimization passes to have obscure side effects, such that they don't optimize away loops containing it. It also teaches several optimization passes to ignore this intrinsic, so that it doesn't significantly impact optimization in most cases. As discussed on llvm-dev [2], this patch is the first of two major parts. The second part, to change LLVM's semantics to have defined behavior on infinite loops by default, with a function attribute for opting into potential-undefined-behavior, will be implemented and posted for review in a separate patch. [0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html [1] https://bugs.llvm.org/show_bug.cgi?id=965 [2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html Differential Revision: https://reviews.llvm.org/D38336 llvm-svn: 317729	2017-11-08 21:59:51 +00:00
Craig Topper	81d772c28a	[ValueTracking] Use APInt::isNullValue/isOneValue which are more efficient for large APInts. llvm-svn: 317712	2017-11-08 19:38:45 +00:00
Ivan A. Kosarev	d60a3cc395	[Analysis] Fix merging TBAA tags with different final access types There are cases when we have to merge TBAA access tags with the same base access type, but different final access types. For example, accesses to different members of the same structure may be vectorized into a single load or store instruction. Since we currently assume that the tags to merge always share the same final access type, we incorrectly return a tag that describes an access to one of the original final access types as the generic tag. This patch fixes that by producing generic tags for the common type and not the final access types of the original tags. Resolves: PR35225: Wrong tbaa metadata after load store vectorizer due to recent change https://bugs.llvm.org/show_bug.cgi?id=35225 Differential Revision: https://reviews.llvm.org/D39732 llvm-svn: 317682	2017-11-08 11:42:21 +00:00
Nuno Lopes	17921d9e21	BasicAA: fix bug where we would return partialalias instead of noalias My fix is conservative and will make us return may-alias instead. The test case is: check(gep(x, 0), n, gep(x, n), -1) with n == sizeof(x) Here, the first value accesses the whole object, but the second access doesn't access anything. The semantics of -1 is read until the end of the object, which in this case means read nothing. No test case, since isn't trivial to exploit this one, but I've proved it correct. llvm-svn: 317680	2017-11-08 10:59:00 +00:00
Sanjay Patel	86d24f1668	[ValueTracking] readonly (const) is a requirement for converting sqrt to llvm.sqrt; nnan is not As discussed in D39204, this is effectively a revert of rL265521 which required nnan to vectorize sqrt libcalls based on the old LangRef definition of llvm.sqrt. Now that the definition has been updated so the libcall and intrinsic have the same semantics apart from potentially setting errno, we can remove the nnan requirement. We have the right check to know that errno is not set: if (!ICS.onlyReadsMemory()) ...ahead of the switch. This will solve https://bugs.llvm.org/show_bug.cgi?id=27435 assuming that's being built for a target with -fno-math-errno. Differential Revision: https://reviews.llvm.org/D39642 llvm-svn: 317519	2017-11-06 22:40:09 +00:00
Dorit Nuzman	eb13dd3eac	[LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies a single-iteration loop This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that Stride >= Trip-Count. Such a predicate will effectively optimize a single or zero iteration loop, as Trip-Count <= Stride == 1. Differential Revision: https://reviews.llvm.org/D38785 llvm-svn: 317438	2017-11-05 16:53:15 +00:00
Sean Fertile	4595a915f6	[LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local. Now that we have a way to mark GlobalValues as local we can use the symbol resolutions that the linker plugin provides as part of lto/thinlto link step to refine the compilers view on what symbols will end up being local. Originally commited as r317374, but reverted in r317395 to update some missed tests. Differential Revision: https://reviews.llvm.org/D35702 llvm-svn: 317408	2017-11-04 17:04:39 +00:00
Sean Fertile	39770ca0a1	Revert "[LTO][ThinLTO] Use the linker resolutions to mark global values ..." Changes more tests then expected on one of the build bots. reverting to investigate. This reverts https://llvm.org/svn/llvm-project/llvm/trunk@317374 llvm-svn: 317395	2017-11-04 01:54:20 +00:00
Sean Fertile	36528c2a9b	[LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local. Now that we have a way to mark GlobalValues as local we can use the symbol resolutions that the linker plugin provides as part of lto/thinlto link step to refine the compilers view on what symbols will end up being local. Differential Revision: https://reviews.llvm.org/D35702 llvm-svn: 317374	2017-11-03 21:45:55 +00:00
Ivan A. Kosarev	4b77f463d0	[Analysis] Refine matching and merging of TBAA tags This patch combines the code that matches and merges TBAA access tags. The aim is to simplify future changes and making sure that these operations produce consistent results. Differential Revision: https://reviews.llvm.org/D39463 llvm-svn: 317311	2017-11-03 10:26:25 +00:00
Hiroshi Yamauchi	dce9def3dd	Irreducible loop metadata for more accurate block frequency under PGO. Summary: Currently the block frequency analysis is an approximation for irreducible loops. The new irreducible loop metadata is used to annotate the irreducible loop headers with their header weights based on the PGO profile (currently this is approximated to be evenly weighted) and to help improve the accuracy of the block frequency analysis for irreducible loops. This patch is a basic support for this. Reviewers: davidxl Reviewed By: davidxl Subscribers: mehdi_amini, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D39028 llvm-svn: 317278	2017-11-02 22:26:51 +00:00
Yichao Yu	6fefc0d65e	Allow inaccessiblememonly and inaccessiblemem_or_argmemonly to be overwriten on call site with operand bundle Summary: Similar to argmemonly, readonly and readnone. Fix PR35128 Reviewers: andrew.w.kaylor, chandlerc, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D39434 llvm-svn: 317201	2017-11-02 12:18:33 +00:00
Geoff Berry	eed6531ea2	[BranchProbabilityInfo] Handle irreducible loops. Summary: Compute the strongly connected components of the CFG and fall back to use these for blocks that are in loops that are not detected by LoopInfo when computing loop back-edge and exit branch probabilities. Reviewers: dexonsmith, davidxl Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D39385 llvm-svn: 317094	2017-11-01 15:16:50 +00:00
Philip Reames	5552f503d5	Undo accidental commit These files shouldn't have been submitted in 316967 llvm-svn: 316968	2017-10-31 00:04:09 +00:00
Philip Reames	9c3cbeea39	[CGP] Fix crash on i96 bit multiply Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725 If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like. llvm-svn: 316967	2017-10-30 23:59:51 +00:00
Clement Courbet	b2c3eb8cf1	[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905	2017-10-30 14:19:33 +00:00
Artur Gainullin	af7ba8ff6b	Improve clamp recognition in ValueTracking. Summary: ValueTracking was recognizing not all variations of clamp. Swapping of true value and false value of select was added to fix this problem. The first patch was reverted because it caused miscompile in NVPTX target. Added corresponding test cases. Reviewers: spatel, majnemer, efriedma, reames Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D39240 llvm-svn: 316795	2017-10-27 20:53:41 +00:00
Max Kazantsev	52d0a49046	Revert rL316568 because of sudden performance drop on ARM llvm-svn: 316739	2017-10-27 04:17:44 +00:00
Sanjoy Das	8499ebf2e9	[SCEV] Fix an assertion failure in the max backedge taken count Max backedge taken count is always expected to be a constant; and this is usually true by construction -- it is a SCEV expression with constant inputs. However, if the max backedge expression ends up being computed to be a udiv with a constant zero denominator[0], SCEV does not fold the result to a constant since there is no constant it can fold it to (SCEV has no representation for "infinity" or "undef"). However, in computeMaxBECountForLT we already know the denominator is positive, and thus at least 1; and we can use this fact to avoid dividing by zero. [0]: We can end up with a constant zero denominator if the signed range of the stride is more precise than the unsigned range. llvm-svn: 316615	2017-10-25 21:41:00 +00:00
Sanjoy Das	f15a861601	Add a comment to clarify a future change llvm-svn: 316614	2017-10-25 21:40:59 +00:00
Max Kazantsev	b6d40067af	[SCEV] Enhance SCEVFindUnsafe for division This patch allows SCEVFindUnsafe algorithm to tread division by any non-positive value as safe. Previously, it could only recognize non-zero constants. Differential Revision: https://reviews.llvm.org/D39228 llvm-svn: 316568	2017-10-25 11:07:43 +00:00
Mikael Holmen	279790b674	[MemDep] DBG intrinsics don't impact abort limit for call site dependence analysis Summary: Memory dependence analysis no longer counts DbgInfoIntrinsics towards the limit where to abort the analysis. Before, a bunch of calls to dbg.value could affect the generated code, meaning that with -g we could generate different code than without. Reviewers: chandlerc, Prazek, davide, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39181 llvm-svn: 316551	2017-10-25 06:15:32 +00:00
Artem Belevich	cb8f6328dc	[NVPTX] allow address space inference for volatile loads/stores. If particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495	2017-10-24 20:31:44 +00:00
Craig Topper	8e8b6efdc8	[ValueTracking] Remove unnecessary temporary APInt from computeNumSignBitsVectorConstant. We can just use getNumSignBits instead of inverting negative numbers. llvm-svn: 316266	2017-10-21 16:35:41 +00:00
Craig Topper	b98ee58511	[ValueTracking] Simplify the known bits code for constant vectors a little. Neither of these cases really require a temporary APInt outside the loop. For the ConstantDataSequential case the APInt will never be larger than 64-bits so its fine to just call getElementAsAPInt. For ConstantVector we can get the APInt by reference and only make a copy where the inversion is needed. llvm-svn: 316265	2017-10-21 16:35:39 +00:00
Nikolai Bozhenov	fa8c5514c5	[ValueTracking] Enabling ValueTracking patch by default (recommit #2 after checking for timeout issue). The original patch was an improvement to IR ValueTracking on non-negative integers. It has been checked in to trunk (D18777, r284022). But was disabled by default due to performance regressions. Perf impact has improved. The patch would be enabled by default. Reviewers: reames, hfinkel Differential Revision: https://reviews.llvm.org/D34101 Patch by: Olga Chupina <olga.chupina@intel.com> llvm-svn: 316208	2017-10-20 10:08:47 +00:00
Nikolai Bozhenov	8dcab54cb4	Revert r315992 because of a found miscompilation failure llvm-svn: 316164	2017-10-19 15:36:18 +00:00
Sanjoy Das	2f27456c82	Revert "[ScalarEvolution] Handling for ICmp occuring in the evolution chain." This reverts commit r316054. There was some confusion over the review process: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171016/495884.html llvm-svn: 316129	2017-10-18 22:00:57 +00:00
Nikolai Bozhenov	9723f12491	Fixup patch for revision rL316070. Added check that type of CmpConst and source type of trunc are equal for correct matching of the case when we can set widened C constant equal to CmpConstant. %cond = cmp iN %x, CmpConst %tr = trunc iN %x to iK %narrowsel = select i1 %cond, iK %t, iK C Patch by: Gainullin, Artur <artur.gainullin@intel.com> llvm-svn: 316082	2017-10-18 14:24:50 +00:00
Nikolai Bozhenov	74c047eabb	Improve lookThroughCast function. Summary: When we have the following case: %cond = cmp iN %x, CmpConst %tr = trunc iN %x to iK %narrowsel = select i1 %cond, iK %t, iK C We could possibly match only min/max pattern after looking through cast. So it is more profitable if widened C constant will be equal CmpConst. That is why just set widened C constant equal to CmpConst, because there is a further check in this function that trunc CmpConst == C. Also description for lookTroughCast function was added. Reviewers: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38536 Patch by: Artur Gainullin <artur.gainullin@intel.com> llvm-svn: 316070	2017-10-18 09:28:09 +00:00
Jatin Bhateja	1fc49627e4	[ScalarEvolution] Handling for ICmp occuring in the evolution chain. Summary: If a compare instruction is same or inverse of the compare in the branch of the loop latch, then return a constant evolution node. Currently scope of evaluation is limited to SCEV computation for PHI nodes. This shall facilitate computations of loop exit counts in cases where compare appears in the evolution chain of induction variables. Will fix PR 34538 Reviewers: sanjoy, hfinkel, junryoungju Reviewed By: junryoungju Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38494 llvm-svn: 316054	2017-10-18 01:36:16 +00:00
Nikolai Bozhenov	346f4329c4	Improve clamp recognition in ValueTracking. Summary: ValueTracking was recognizing not all variations of clamp. Swapping of true value and false value of select was added to fix this problem. This change breaks the canonical form of cmp inside the matchMinMax function, that is why additional checks for compare predicates is needed. Added corresponding test cases. Reviewers: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38531 Patch by: Artur Gainullin <artur.gainullin@intel.com> llvm-svn: 315992	2017-10-17 11:50:48 +00:00
Sanjoy Das	3a5e25278a	Revert "[SCEV] Maintain and use a loop->loop invalidation dependency" This reverts commit r315713. It causes PR34968. I think I know what the problem is, but I don't think I'll have time to fix it this week. llvm-svn: 315962	2017-10-17 01:03:56 +00:00
Anna Thomas	79503c035f	[SCEV] Rename getMaxBECount and update comments. NFC Post commit review comments at D38825. llvm-svn: 315920	2017-10-16 17:47:17 +00:00
Sanjay Patel	b7d1238cfc	[ValueTracking] fix typos, formatting; NFC llvm-svn: 315909	2017-10-16 14:46:37 +00:00
Aaron Ballman	615eb47035	Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1 llvm-svn: 315854	2017-10-15 14:32:27 +00:00
Hongbin Zheng	73f650435b	[LoopInfo][Refactor] Make SetLoopAlreadyUnrolled a member function of the Loop Pass, NFC. This avoid code duplication and allow us to add the disable unroll metadata elsewhere. Differential Revision: https://reviews.llvm.org/D38928 llvm-svn: 315850	2017-10-15 07:31:02 +00:00
Matthew Simpson	2284937bbc	[IPSCCP] Move common functions to ValueLatticeUtils (NFC) This patch moves some common utility functions out of IPSCCP and makes them available globally. The functions determine if interprocedural data-flow analyses can propagate information through function returns, arguments, and global variables. Differential Revision: https://reviews.llvm.org/D37638 llvm-svn: 315719	2017-10-13 17:53:44 +00:00
Sanjoy Das	c70a7a02ea	[SCEV] Maintain and use a loop->loop invalidation dependency Summary: This change uses the loop use list added in the previous change to remember the loops that appear in the trip count expressions of other loops; and uses it in forgetLoop. This lets us not scan every loop in the function on a forgetLoop call. With this change we no longer invalidate clear out backedge taken counts on forgetValue. I think this is fine -- the contract is that SCEV users must call forgetLoop(L) if their change to the IR could have changed the trip count of L; solely calling forgetValue on a value feeding into the backedge condition of L is not enough. Moreover, I don't think we can strengthen forgetValue to be sufficient for invalidating trip counts without significantly re-architecting SCEV. For instance, if we have the loop: I = *Ptr; E = I + 10; do { // ... } while (++I != E); then the backedge taken count of the loop is 9, and it has no reference to either I or E, i.e. there is no way in SCEV today to re-discover the dependency of the loop's trip count on E or I. So a SCEV client cannot change E to (say) "I + 20", call forgetValue(E) and expect the loop's trip count to be updated. Reviewers: atrick, sunfish, mkazantsev Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38435 llvm-svn: 315713	2017-10-13 17:13:44 +00:00
Anna Thomas	a2ca902033	[SCEV] Teach SCEV to find maxBECount when loop endbound is variant Summary: This patch teaches SCEV to calculate the maxBECount when the end bound of the loop can vary. Note that we cannot calculate the exactBECount. This will only be done when both conditions are satisfied: 1. the loop termination condition is strictly LT. 2. the IV is proven to not overflow. This provides more information to users of SCEV and can be used to improve identification of finite loops. Reviewers: sanjoy, mkazantsev, silviu.baranga, atrick Reviewed by: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38825 llvm-svn: 315683	2017-10-13 14:30:43 +00:00
Daniel Jasper	3344a21236	Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode" Significantly reduces performancei (~30%) of gipfeli (https://github.com/google/gipfeli) I have not yet managed to reproduce this regression with the open-source version of the benchmark on github, but will work with others to get a reproducer to you later today. llvm-svn: 315680	2017-10-13 14:04:21 +00:00
Sanjoy Das	e6b995f2b2	[SCEV] Maintain loop use lists, and use them in forgetLoop Summary: Currently we do not correctly invalidate memoized results for add recurrences that were created directly (i.e. they were not created from a `Value`). This change fixes this by keeping loop use lists and using the loop use lists to determine which SCEV expressions to invalidate. Here are some statistics on the number of uses of in the use lists of all loops on a clang bootstrap (config: release, no asserts): Count: 731310 Min: 1 Mean: 8.555150 50th %time: 4 95th %tile: 25 99th %tile: 53 Max: 433 Reviewers: atrick, sunfish, mkazantsev Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38434 llvm-svn: 315672	2017-10-13 05:50:52 +00:00
Sanjay Patel	e272be7c9a	[ValueTracking] return zero when there's conflict in known bits of a shift (PR34838) Poison allows us to return a better result than undef. llvm-svn: 315595	2017-10-12 17:31:46 +00:00
Don Hinton	3e0199f7eb	[dump] Remove NDEBUG from test to enable dump methods [NFC] Summary: Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP. Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods. Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so it'll be picked up by public headers. Differential Revision: https://reviews.llvm.org/D38406 llvm-svn: 315590	2017-10-12 16:16:06 +00:00
Hiroshi Inoue	b49b015bed	[ScheduleDAGInstrs] fix behavior of getUnderlyingObjectsForCodeGen when no identifiable object found This patch fixes the bug introduced in https://reviews.llvm.org/D35907; the bug is reported by http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171002/491452.html. Before D35907, when GetUnderlyingObjects fails to find an identifiable object, allMMOsOkay lambda in getUnderlyingObjectsForInstr returns false and Objects vector is cleared. This behavior is unintentionally changed by D35907. This patch makes the behavior for such case same as the previous behavior. Since D35907 introduced a wrapper function getUnderlyingObjectsForCodeGen around GetUnderlyingObjects, getUnderlyingObjectsForCodeGen is modified to return a boolean value to ask the caller to clear the Objects vector. Differential Revision: https://reviews.llvm.org/D38735 llvm-svn: 315565	2017-10-12 06:26:04 +00:00
Daniel Neilson	5acfd1dd78	[SCEV] Properly handle the case of a non-constant start with a zero accum in ScalarEvolution::createAddRecFromPHIWithCastsImpl Summary: This patch fixes an error in the patch to ScalarEvolution::createAddRecFromPHIWithCastsImpl made in D37265. In that patch we handle the cases where the either the start or accum values can be zero after truncation. But, we assume that the start value must be a constant if the accum is zero. This is clearly an erroneous assumption. This change removes that assumption. Reviewers: sanjoy, dorit, mkazantsev Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38814 llvm-svn: 315491	2017-10-11 19:05:14 +00:00
Vivek Pandya	9590658fb8	[NFC] Convert OptimizationRemarkEmitter old emit() calls to new closure parameterized emit() calls Summary: This is not functional change to adopt new emit() API added in r313691. Reviewed By: anemet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38285 llvm-svn: 315476	2017-10-11 17:12:59 +00:00
Adam Nemet	0965da2055	Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.* Sync it up with the name of the class actually defined here. This has been bothering me for a while... llvm-svn: 315249	2017-10-09 23:19:02 +00:00
Matthew Simpson	49ee814996	[SparsePropagation] Move member definitions to header (NFC) AbstractLatticeFunction and SparseSolver are class templates parameterized by a lattice value, so we need to move these member functions over to the header. Differential Revision: https://reviews.llvm.org/D38561 llvm-svn: 314996	2017-10-05 18:03:30 +00:00
Sanjoy Das	005b88c0a6	Do not call Loop::getName on possibly dead loops This fixes PR34832. llvm-svn: 314938	2017-10-04 22:02:27 +00:00
Jun Bum Lim	d40e03c2d8	Recommit : Use the basic cost if a GEP is not used as addressing mode Recommitting r314517 with the fix for handling ConstantExpr. Original commit message: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. llvm-svn: 314923	2017-10-04 18:33:52 +00:00
Adam Nemet	6c381b7a2e	[OptRemark] Move YAML writing to IR Before the patch this was in Analysis. Moving it to IR and making it implicit part of LLVMContext::diagnose allows the full opt-remark facility to be used outside passes e.g. the pass manager. Jessica is planning to use this to report function size after each pass. The same could be used for time reports. Tested with BUILD_SHARED_LIBS=On. llvm-svn: 314909	2017-10-04 15:18:11 +00:00
Adam Nemet	f31b1f310c	Move verbosity check for remarks to the diag handler Test needs some slight adjustment because we no longer check the existence of BFI but rather that the actual hotness is set on the remark. If entry_count is not set getBlockProfileCount returns None. llvm-svn: 314874	2017-10-04 04:26:23 +00:00
Hans Wennborg	9a9048e19f	Revert r314806 "[SLP] Vectorize jumbled memory loads." All the buildbots are red, e.g. http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/ > Summary: > This patch tries to vectorize loads of consecutive memory accesses, accessed > in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 > which was reverted back due to some basic issue with representing the 'use mask' of > jumbled accesses. > > This patch fixes the mask representation by recording the 'use mask' in the usertree entry. > > Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df > > Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh > > Reviewed By: Ayal > > Subscribers: hans, mzolotukhin > > Differential Revision: https://reviews.llvm.org/D36130 llvm-svn: 314824	2017-10-03 18:32:29 +00:00
Mohammad Shahid	1d5422f27f	[SLP] Vectorize jumbled memory loads. Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: hans, mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 llvm-svn: 314806	2017-10-03 15:28:48 +00:00
Evgeny Astigeevich	d3558b5d5e	[InlineCost, NFC] Extract code dealing with inbounds GEPs from visitGetElementPtr into a function The code responsible for analysis of inbounds GEPs is extracted into a separate function: CallAnalyzer::canFoldInboundsGEP. With the patch SROA enabling/disabling code is localized at one place instead of spreading across the code of CallAnalyzer::visitGetElementPtr. Differential Revision: https://reviews.llvm.org/D38233 llvm-svn: 314787	2017-10-03 12:00:40 +00:00
Mikael Holmen	6efe507e42	[Lint] Avoid failed assertion by fetching the proper pointer type Summary: When checking if a constant expression is a noop cast we fetched the IntPtrType by doing DL->getIntPtrType(V->getType())). However, there can be cases where V doesn't return a pointer, and then getIntPtrType() triggers an assertion. Now we pass DataLayout to isNoopCast so the method itself can determine what the IntPtrType is. Reviewers: arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D37894 llvm-svn: 314763	2017-10-03 06:03:49 +00:00
Daniel Berlin	67fbb351e3	SparseSolver: Rename getOrInitValueState to getValueState, matching what SCCP calls it llvm-svn: 314744	2017-10-03 00:26:21 +00:00
Haicheng Wu	25f6c196d7	[InstSimplify] teach SimplifySelectInst() to fold more vector selects Call ConstantFoldSelectInstruction() to fold cases like below select <2 x i1><i1 true, i1 false>, <2 x i8> <i8 0, i8 1>, <2 x i8> <i8 2, i8 3> All operands are constants and the condition has mixed true and false conditions. Differential Revision: https://reviews.llvm.org/D38369 llvm-svn: 314741	2017-10-02 23:43:52 +00:00
Daniel Berlin	4d825bcf09	Template the sparse propagation solver instead of using void pointers Summary: This avoids using void * as the type of the lattice value and ugly casts needed to make that happen. (If folks want to use references, etc, they can use a reference_wrapper). Reviewers: davide, mssimpso Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D38476 llvm-svn: 314734	2017-10-02 22:49:49 +00:00
Xin Tong	c063c3f09d	Revert "Fix typo [NFC]" This reverts commit e60b5028619be1c81bd039d63a0627dac32d38f9. Incorrectly include changes that are not typo fix. llvm-svn: 314614	2017-10-01 00:09:53 +00:00
Xin Tong	efec219e1b	Fix typo [NFC] llvm-svn: 314613	2017-10-01 00:07:24 +00:00
Alex Shlyapnikov	e76aa3b0b2	Revert "Use the basic cost if a GEP is not used as addressing mode" This reverts commit r314517. This commit crashes sanitizer bots, for example: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167 Stack snippet: ... /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const, llvm::ArrayRef<llvm::Value const>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const, llvm::ArrayRef<llvm::Value const>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0 ... llvm-svn: 314560	2017-09-29 22:04:45 +00:00
Jun Bum Lim	0e16a59e83	Use the basic cost if a GEP is not used as addressing mode Summary: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as a part of actual addressing mode. For example, if an user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng Reviewed By: hfinkel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38085 llvm-svn: 314517	2017-09-29 14:50:16 +00:00
Florian Hahn	8af01573a3	[LVI] Move LVILatticeVal class to separate header file (NFC). Summary: This allows sharing the lattice value code between LVI and SCCP (D36656). It also adds a `satisfiesPredicate` function, used by D36656. Reviewers: davide, sanjoy, efriedma Reviewed By: sanjoy Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D37591 llvm-svn: 314411	2017-09-28 11:09:22 +00:00
Sanjoy Das	def1729dc4	Use a BumpPtrAllocator for Loop objects Summary: And now that we no longer have to explicitly free() the Loop instances, we can (with more ease) use the destructor of LoopBase to do what LoopBase::clear() was doing. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38201 llvm-svn: 314375	2017-09-28 02:45:42 +00:00
Haicheng Wu	3ec848bc50	[InlineCost] add visitSelectInst() InlineCost can understand Select IR now. This patch finds free Select IRs and continue the propagation of SimplifiedValues, ConstantOffsetPtrs, and SROAArgValues. Differential Revision: https://reviews.llvm.org/D37198 llvm-svn: 314307	2017-09-27 14:44:56 +00:00
Daniel Berlin	97f34e887f	MemorySSAUpdater: Only add phis to insertedphis if we actually inserted them, not if we just found existing ones llvm-svn: 314273	2017-09-27 05:35:19 +00:00
Matthias Braun	cc603ee3d5	TargetLibraryInfo: Stop guessing wchar_t size Usually the frontend communicates the size of wchar_t via metadata and we can optimize wcslen (and possibly other calls in the future). In cases without the wchar_size metadata we would previously try to guess the correct size based on the target triple; however this is fragile to keep up to date and may miss users manually changing the size via flags. Better be safe and stop guessing and optimizing if the frontend didn't communicate the size. Differential Revision: https://reviews.llvm.org/D38106 llvm-svn: 314185	2017-09-26 02:36:57 +00:00
Michael Liao	b30286d81c	Remove trailing whitespaces. llvm-svn: 314115	2017-09-25 16:21:21 +00:00
Clement Courbet	2807c0a442	[CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion. Summary: Right now there are two functions with the same name, one does the work and the other one returns true if expansion is needed. Rename TargetTransformInfo::expandMemCmp to make it more consistent with other members of TargetTransformInfo. Remove the unused Instruction* parameter. Differential Revision: https://reviews.llvm.org/D38165 llvm-svn: 314096	2017-09-25 06:35:16 +00:00
Daniel Neilson	1341ac2ced	[SCEV] Generalize folding of trunc(x)+ntrunc(y) into folding mtrunc(x)+ntrunc(y) Summary: A SCEV such as: {%v2,+,((-1 (trunc i64 (-1 * %v1) to i32)) + (-1 * (trunc i64 %v1 to i32)))}<%loop> can be folded into, simply, {%v2,+,0}. However, the current code in ::getAddExpr() will not try to apply the simplification mtrunc(x)+ntrunc(y) -> trunc(trunc(m)x+trunc(n)y) because it only keys off having a non-multiplied trunc as the first term in the simplification. This patch generalizes this code to try to do a more generic fold of these trunc expressions. Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37888 llvm-svn: 313988	2017-09-22 15:47:57 +00:00
Sanjoy Das	388b012f4e	Rename markAsErased to erase, as pointed out in a previous review; NFC llvm-svn: 313951	2017-09-22 01:47:41 +00:00
Hans Wennborg	57c3341ada	Revert r313771 "[SLP] Vectorize jumbled memory loads." This broke the buildbots, e.g. http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391 > Summary: > This patch tries to vectorize loads of consecutive memory accesses, accessed > in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 > which was reverted back due to some basic issue with representing the 'use mask' > jumbled accesses. > > This patch fixes the mask representation by recording the 'use mask' in the usertree entry. > > Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df > > Subscribers: mzolotukhin > > Reviewed By: ayal > > Differential Revision: https://reviews.llvm.org/D36130 > > Review comments updated accordingly > > Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0 > > Added a TODO for sortLoadAccesses API > > Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58 > > Modified the TODO for sortLoadAccesses API > > Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565 > > Review comment update for using OpdNum to insert the mask in respective location > > Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce > > Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase > > Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b llvm-svn: 313781	2017-09-20 18:00:03 +00:00
Mohammad Shahid	2b281de576	[SLP] Vectorize jumbled memory loads. Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Subscribers: mzolotukhin Reviewed By: ayal Differential Revision: https://reviews.llvm.org/D36130 Review comments updated accordingly Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0 Added a TODO for sortLoadAccesses API Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58 Modified the TODO for sortLoadAccesses API Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565 Review comment update for using OpdNum to insert the mask in respective location Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b llvm-svn: 313771	2017-09-20 17:19:57 +00:00
Alexander Kornienko	6a140234ed	Revert r313736: "[SLP] Vectorize jumbled memory loads." The revision breaks buildbots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/6694/steps/test/logs/stdio llvm-svn: 313758	2017-09-20 14:53:07 +00:00
Alexander Kornienko	7302344bdf	Revert r313753: "Fix a -Wsign-compare warning in LoopAccessAnalysis.cpp" llvm-svn: 313757	2017-09-20 14:52:56 +00:00
Alexander Kornienko	6c629b5728	Fix a -Wsign-compare warning in LoopAccessAnalysis.cpp llvm-svn: 313753	2017-09-20 12:18:22 +00:00
Mohammad Shahid	f8db9bd857	[SLP] Vectorize jumbled memory loads. Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 Commit after rebase for patch D36130 Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab llvm-svn: 313736	2017-09-20 08:18:28 +00:00
Sanjoy Das	09613b122e	Tighten the invariants around LoopBase::invalidate Summary: With this change: - Methods in LoopBase trip an assert if the receiver has been invalidated - LoopBase::clear frees up the memory held the LoopBase instance This change also shuffles things around as necessary to work with this stricter invariant. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38055 llvm-svn: 313708	2017-09-20 02:31:57 +00:00
Sanjoy Das	66a004ac0c	Clang-format few files to make later diffs leaner; NFC llvm-svn: 313705	2017-09-20 01:12:09 +00:00
Sanjoy Das	76ab23234c	[LoopInfo] Make LoopBase and Loop destructors non-public Summary: See comment for why I think this is a good idea. This change also: - Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop. - Updates the loop pass manager test case to deal with a private ~Loop::Loop. - Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37996 llvm-svn: 313695	2017-09-19 23:19:00 +00:00
Sanjay Patel	0d4fd5b668	[InstSimplify] fold sdiv/srem based on compare of dividend and divisor This should bring signed div/rem analysis up to the same level as unsigned. We use icmp simplification to determine when the divisor is known greater than the dividend. Each positive test is followed by a negative test to show that we're not overstepping the boundaries of the known bits. There are extra tests for the signed-min-value special cases. Alive proofs: http://rise4fun.com/Alive/WI5 Differential Revision: https://reviews.llvm.org/D37713 llvm-svn: 313264	2017-09-14 14:59:07 +00:00
Sanjay Patel	cca8f7853f	[InstSimplify] clean up div/rem handling; NFCI The idea to make an 'isDivZero' helper was suggested for the signed case in D37713: https://reviews.llvm.org/D37713 This clean-up makes it clear that D37713 is just filling the gap for signed div/rem, removes unnecessary code, and allows us to remove a bit of duplicated code from the planned improvement in D37713. llvm-svn: 313261	2017-09-14 14:09:11 +00:00
Chandler Carruth	7376ae88eb	[PM/CGSCC] Teach the CGSCC pass manager components to gracefully handle invalidated SCCs even when we do not have an updated SCC to redirect towards. This comes up in a fairly subtle and surprising circumstance: we need to have a connected but internal node in the call graph which later becomes a disconnected island, and then gets deleted. All of this needs to happen mid-CGSCC walk. Because it is disconnected, we have no way of computing a new "current" SCC when it gets deleted. Instead, we need to explicitly check for a deleted "current" SCC and bail out of the current CGSCC step. This will bubble all the way up to the post-order walk and then resume correctly. I've included minimal tests for this bug. The specific behavior matches something we've seen in the wild with the new PM combined with ThinLTO and sample PGO, but I've not yet confirmed whether this is the only issue there. llvm-svn: 313242	2017-09-14 08:33:57 +00:00
Alon Kom	682cfc1d4c	[LV] Fix maximum legal VF calculation This patch fixes pr34283, which exposed that the computation of maximum legal width for vectorization was wrong, because it relied on MaxInterleaveFactor to obtain the maximum stride used in the loop, however not all strided accesses in the loop have an interleave-group associated with them. Instead of recording the maximum stride in the loop, which can be over conservative (e.g. if the access with the maximum stride is not involved in the dependence limitation), this patch tracks the actual maximum legal width imposed by accesses that are involved in dependencies. Differential Revision: https://reviews.llvm.org/D37507 llvm-svn: 313237	2017-09-14 07:40:02 +00:00
Easwaran Raman	4924bb002d	[Inliner] Add another way to compute full inline cost. Summary: Full inline cost is computed when -inline-cost-full is true or ORE is non-null. This patch adds another way to compute full inline cost by adding a field to InlineParams. This will be used by SampleProfileLoader to check legality of inlining a callee that it wants to inline. Reviewers: danielcdh, haicheng Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37819 llvm-svn: 313185	2017-09-13 20:16:02 +00:00
Hiroshi Yamauchi	a43913cfaf	Add options to dump PGO counts in text. Summary: Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text. This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37776 llvm-svn: 313159	2017-09-13 17:20:38 +00:00
Teresa Johnson	cbdc5ff628	[ThinLTO] AliasSummary should not have any references Summary: References should only be on the aliasee. Reviewers: pcc Subscribers: llvm-commits, inglorion Differential Revision: https://reviews.llvm.org/D37814 llvm-svn: 313158	2017-09-13 17:10:24 +00:00
Silviu Baranga	ac920f7716	[LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs Summary: LAA can only emit run-time alias checks for pointers with affine AddRec SCEV expressions. However, non-AddRecExprs can be now be converted to affine AddRecExprs using SCEV predicates. This change tries to add the minimal set of SCEV predicates in order to enable run-time alias checking. Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel Reviewed By: hfinkel Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D17080 llvm-svn: 313012	2017-09-12 07:48:22 +00:00
Marcello Maggioni	ce90060d1c	[ScalarEvolution] Refactor forgetLoop() to improve performance forgetLoop() has pretty bad performance because it goes over the same instructions over and over again in particular when nested loop are involved. The refactoring changes the function to a not-recursive function and reusing the allocation for data-structures and the Visited set. NFCI Differential Revision: https://reviews.llvm.org/D37659 llvm-svn: 312920	2017-09-11 15:44:20 +00:00
Sanjay Patel	fa877fd464	[InstSimplify] reorder methods; NFC I'm trying to refactor some shared code for integer div/rem, but I keep having to scroll through fdiv. The FP ops have nothing in common with the integer ops, so I'm moving FP below everything else. While here, improve a couple of comments and fix some formatting. llvm-svn: 312913	2017-09-11 13:34:27 +00:00
Sanjay Patel	5876189ff1	[InstSimplify] refactor udiv/urem code and add tests; NFCI This removes some duplicated code and makes it easier to support signed div/rem in a similar way if we want to do that. Note that the existing comments were not accurate - we don't need a constant divisor to simplify; icmp simplification does more than that. But as the added tests show, it could go even further. llvm-svn: 312885	2017-09-10 17:55:08 +00:00
Nuno Lopes	404f106d71	Merge isKnownNonNull into isKnownNonZero It now knows the tricks of both functions. Also, fix a bug that considered allocas of non-zero address space to be always non null Differential Revision: https://reviews.llvm.org/D37628 llvm-svn: 312869	2017-09-09 18:23:11 +00:00
Sanjay Patel	6fd4391ddd	[DivRempairs] add a pass to optimize div/rem pairs (PR31028) This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862	2017-09-09 13:38:18 +00:00
Guozhi Wei	62d6414465	[TargetTransformInfo] Add a new public interface getInstructionCost Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface: enum TargetCostKind { TCK_RecipThroughput, ///< Reciprocal throughput. TCK_Latency, ///< The latency of instruction. TCK_CodeSize ///< Instruction code size. }; int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const; All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model. This patch also provides a simple default implementation of getInstructionLatency. The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways: Add more detail into this function. Add getXXXLatency function and call it from here. Implement target specific getInstructionLatency function. Differential Revision: https://reviews.llvm.org/D37170 llvm-svn: 312832	2017-09-08 22:29:17 +00:00
Alexey Bataev	6dd29fccb8	[SLP] Support for horizontal min/max reduction. SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791	2017-09-08 13:49:36 +00:00
Peter Collingbourne	681fbb64a4	ModuleSummaryAnalysis: Correctly handle all function operand references. The current code that handles personality functions when creating a module summary does not correctly handle the case where a function's personality function operand refers to the function indirectly (e.g. via a bitcast). This patch handles such cases by treating personality function references like any other reference, i.e. by adding them to the function's reference list. This has the minor side benefit of allowing personality functions to participate in early dead stripping. We do this by calling findRefEdges on the function itself. This way we also end up handling other function operands (specifically prefix data and prologue data) for free. Differential Revision: https://reviews.llvm.org/D37553 llvm-svn: 312698	2017-09-07 05:35:35 +00:00
Matt Arsenault	3ced3d90c3	InstSimplify: canonicalize is idempotent llvm-svn: 312685	2017-09-07 01:21:43 +00:00
Nuno Lopes	ba1c9f7aee	Fix PR33878: BasicAA incorrectly assumes different address spaces don't alias Remove code that assumed that a nullptr of address space != 0 couldnt alias with a non-null pointer. This is incorrect, since nothing can be concluded about a null pointer in an address space != 0. This code was written before address spaces were introduced Differential Revision: https://reviews.llvm.org/D37518 llvm-svn: 312648	2017-09-06 16:55:31 +00:00
Sanjay Patel	6840c5ff75	[ValueTracking, InstCombine] canonicalize fcmp ord/uno with non-NAN ops to null constants This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite(): https://bugs.llvm.org/show_bug.cgi?id=27145 In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno with a constant operand. But while looking at those patterns, I realized we were missing a canonicalization for nonzero constants. Rather than limiting to just folds for constants, we're adding a general value tracking method for this based on an existing DAG helper. By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps() and pick up missing vector folds. Differential Revision: https://reviews.llvm.org/D37427 llvm-svn: 312591	2017-09-05 23:13:13 +00:00
Daniel Neilson	3f0e4ad833	[SCEV] Ensure ScalarEvolution::createAddRecFromPHIWithCastsImpl properly handles out of range truncations of the start and accum values Summary: When constructing the predicate P1 in ScalarEvolution::createAddRecFromPHIWithCastsImpl() it is possible for the PHISCEV from which the predicate is constructed to be a SCEVConstant instead of a SCEVAddRec. If this happens, then the cast<SCEVAddRec>(PHISCEV) in the code will assert. Such a PHISCEV is possible if either the start value or the accumulator value is a constant value that not equal to its truncated value, and if the truncated value is zero. This patch adds tests that demonstrate the cast<> assertion, and fixes this problem by checking whether the PHISCEV is a constant before constructing the P1 predicate; if it is, then P1 is equivalent to one of P2 or P3. Additionally, if we know that the start value or accumulator value are constants then we check whether the P2 and/or P3 predicates are known false at compile time; if either is, then we bail out of constructing the AddRec. Reviewers: sanjoy, mkazantsev, silviu.baranga Reviewed By: mkazantsev Subscribers: mkazantsev, llvm-commits Differential Revision: https://reviews.llvm.org/D37265 llvm-svn: 312568	2017-09-05 19:54:03 +00:00
Eugene Zelenko	75075efe5e	[Analysis, Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 312383	2017-09-01 21:37:29 +00:00
Craig Topper	924f20262b	[InstCombine][InstSimplify] Teach decomposeBitTestICmp to look through truncate instructions This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask. This allows some code to be removed from InstSimplify that was doing this functionality. This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out. There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary. Differential Revision: https://reviews.llvm.org/D37158 llvm-svn: 312382	2017-09-01 21:27:34 +00:00
Peter Collingbourne	5e8b94c137	ModuleSummaryAnalysis: Correctly handle refs from function inline asm to module inline asm. If a function contains inline asm and the module-level inline asm contains the definition of a local symbol, prevent the function from being imported in case the function-level inline asm refers to a symbol in the module-level inline asm. Differential Revision: https://reviews.llvm.org/D37370 llvm-svn: 312332	2017-09-01 16:24:02 +00:00
Alexandre Isoard	405728fd47	[SCEV] Add URem support to SCEV In LLVM IR the following code: %r = urem <ty> %t, %b is equivalent to %q = udiv <ty> %t, %b %s = mul <ty> nuw %q, %b %r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented with minimal effort using that relation: %r --> (-%b * (%t /u %b)) + %t We implement two special cases: - if %b is 1, the result is always 0 - if %b is a power-of-two, we produce a zext/trunc based expression instead That is, the following code: %r = urem i32 %t, 65536 Produces: %r --> (zext i16 (trunc i32 %a to i16) to i32) Note that while this helps get a tighter bound on the range analysis and the known-bits analysis, this exposes some normalization shortcoming of SCEVs: %div = udim i32 %a, 65536 %mul = mul i32 %div, 65536 %rem = urem i32 %a, 65536 %add = add i32 %mul, %rem Will usually not be reduced. llvm-svn: 312329	2017-09-01 14:59:59 +00:00
Eugene Zelenko	fa6434bebb	[Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes. Also affected in files (NFC). llvm-svn: 312289	2017-08-31 21:56:16 +00:00
Adam Nemet	4846e66fdd	Remove an unnecessary const_cast. I think that this is dating back to when emit used to take a const reference. llvm-svn: 311948	2017-08-28 23:00:13 +00:00
Don Hinton	a67e13129d	[Dominators] Remove redundant explicit template instantiation. Summary: Remove redundant explicit template instantiation. This was reported by Andrew Kelley building release_50 with gcc7.2.0 on MacOS: duplicate symbol llvm::DominatorTreeBase. Reviewers: kuhar, andrewrk, davide, hans Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37185 llvm-svn: 311835	2017-08-26 21:08:51 +00:00
Hiroshi Yamauchi	63e17ebf8b	Add options to dump block frequency/branch probability info in text. Summary: Add options -print-bfi/-print-bpi that dump block frequency and branch probability info like -view-block-freq-propagation-dags and -view-machine-block-freq-propagation-dags do but in text. This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37165 llvm-svn: 311822	2017-08-26 00:31:00 +00:00
Haicheng Wu	61995364de	[InlineCost] Small changes to early exit condition. NFC. Change the early exit condition from Cost > Threshold to Cost >= Threshold because the inline condition is Cost < Threshold. Differential Revision: https://reviews.llvm.org/D37087 llvm-svn: 311791	2017-08-25 19:00:33 +00:00
Michael Kruse	c0a6aab6b6	Normlize to LF line endings. Commit r297442 introduced mixed CRLF/LF line endings to two files. Normalize to to LF-only line endings. llvm-svn: 311774	2017-08-25 12:38:53 +00:00
Dehao Chen	f0e27e63e7	Move accurate-sample-profile into the function attribute. Summary: We need to have accurate-sample-profile in function attribute so that it works with LTO. Reviewers: davidxl, rsmith Reviewed By: davidxl Subscribers: sanjoy, mehdi_amini, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D37113 llvm-svn: 311706	2017-08-24 21:37:04 +00:00
Tobias Grosser	d7eb619299	Model cache size and associativity in TargetTransformInfo Summary: We add the precise cache sizes and associativity for the following Intel architectures: - Penry - Nehalem - Westmere - Sandy Bridge - Ivy Bridge - Haswell - Broadwell - Skylake - Kabylake Polly uses since several months a performance model for BLAS computations that derives optimal cache and register tile sizes from cache and latency information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016). While bootstrapping this model, these target values have been kept in Polly. However, as our implementation is now rather mature, it seems time to teach LLVM itself about cache sizes. Interestingly, L1 and L2 cache sizes are pretty constant across micro-architectures, hence a set of architecture specific default values seems like a good start. They can be expanded to more target specific values, in case certain newer architectures require different values. For now a set of Intel architectures are provided. Just as a little teaser, for a simple gemm kernel this model allows us to improve performance from 1.2s to 0.27s. For gemm kernels with less optimal memory layouts even larger speedups can be reported. Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb Reviewed By: fhahn, asb Subscribers: lsaba, asb, pollydev, llvm-commits Differential Revision: https://reviews.llvm.org/D37051 llvm-svn: 311647	2017-08-24 09:46:25 +00:00
Rong Xu	15848e5977	[PGO] Set edge weights for indirectbr instruction with profile counts Current PGO only annotates the edge weight for branch and switch instructions with profile counts. We should also annotate the indirectbr instruction as all the information is there. This patch enables the annotating for indirectbr instructions. Also uses this annotation in branch probability analysis. Differential Revision: https://reviews.llvm.org/D37074 llvm-svn: 311604	2017-08-23 21:36:02 +00:00
George Rimar	1e94ca115d	[lib/Analysis] - Mark personality functions as live. This is PR33245. Case I am fixing is next: Imagine we have 2 BC files, one defines and uses personality routine, second has only declaration and also uses it. Previously algorithm computing dead symbols (llvm::computeDeadSymbols) did not know about personality routines and leaved them dead even if function that has routine was live. As a result thinLTOInternalizeAndPromoteGUID() method changed binding for such symbol to local. Later when LLD tried to link these objects it failed because one object had undefined global symbol for routine and second object contained local definition instead of global. Patch set the live root flag on the corresponding FunctionSummary for personality routines when we build the per-module summaries during the compile step. Differential revision: https://reviews.llvm.org/D36834 llvm-svn: 311432	2017-08-22 08:50:56 +00:00
Craig Topper	7227ebad9c	[ValueTracking] Add assertions that the starting Depth in isKnownToBeAPowerOfTwo and ComputeNumSignBitsImpl is not above MaxDepth The function does an equality check later to terminate the recursion, but that won't work if its starts out too high. Similar assert already exists in computeKnownBits. llvm-svn: 311400	2017-08-21 22:56:12 +00:00
Haicheng Wu	0812c5bea3	[InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes. Currently, the inline cost model will bail once the inline cost exceeds the inline threshold in order to avoid unnecessary compile-time. However, when debugging it is useful to compute the full cost, so this command line option is added to override the default behavior. I took over this work from Chad Rosier (mcrosier@codeaurora.org). Differential Revision: https://reviews.llvm.org/D35850 llvm-svn: 311371	2017-08-21 20:00:09 +00:00
Chad Rosier	4eb18742ca	[InlineCost] Add more debug during inline cost computation. llvm-svn: 311370	2017-08-21 19:56:46 +00:00
Eugene Zelenko	be709f2c19	[Analysis] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 311212	2017-08-18 23:51:26 +00:00
Amjad Aboud	88ffa3afe2	[InstCombine] Teach ComputeNumSignBitsImpl to handle integer multiply instruction. Differential Revision: https://reviews.llvm.org/D36679 llvm-svn: 311206	2017-08-18 22:56:55 +00:00
Eugene Zelenko	bb1b2d09cf	[Analysis] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 311048	2017-08-16 22:07:40 +00:00
Sanjay Patel	042a53624c	[DemandedBits] simplify call; NFC llvm-svn: 311009	2017-08-16 14:28:23 +00:00
Craig Topper	b1e4b1a070	[InstSimplify] Teach decomposeBitTestICmp to handle non-canonical compares This adds support non-canonical compare predicates. InstSimplify can't rely on canonicalization to have occurred. Differential Revision: https://reviews.llvm.org/D36646 llvm-svn: 310893	2017-08-14 22:11:43 +00:00
Craig Topper	0aa3a19512	Recommit r310869, "[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify" This recommits r310869, with the moved files and no extra changes. Original commit message: This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too. I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself. I also had to make decomposeBitTest support vectors since InstSimplify needs that. As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library. Differential Revision: https://reviews.llvm.org/D36593 llvm-svn: 310889	2017-08-14 21:39:51 +00:00
Chandler Carruth	bba762a13f	[InlineCost] Refactor the checks for different analyses to be a bit more localized to the code that uses those analyses. Technically, this can change behavior as we no longer require the existence of the ProfileSummaryInfo analysis to use local profile information via BFI. We didn't actually require the PSI to have an interesting profile though, so this only really impacts the behavior in non-default pass pipelines. IMO, this makes it substantially less surprising how everything works -- before an analysis that wasn't actually used had to exist to trigger any profile aware inlining. I think the new organization makes it more obvious where various checks for profile signals happen. Differential Revision: https://reviews.llvm.org/D36710 llvm-svn: 310888	2017-08-14 21:25:00 +00:00
Andrew Kaylor	53a5fbb45f	Add strictfp attribute to prevent unwanted optimizations of libm calls Differential Revision: https://reviews.llvm.org/D34163 llvm-svn: 310885	2017-08-14 21:15:13 +00:00
Craig Topper	69fa8e0d99	Revert r310869 "[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify" Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything. llvm-svn: 310873	2017-08-14 19:09:32 +00:00
Craig Topper	9c7b881677	Revert r310870 "[InstCombine][InstSimplify] 'git add' two files that moved in r310869." An extra change crept in here. llvm-svn: 310872	2017-08-14 19:09:28 +00:00
Craig Topper	914c836842	[InstCombine][InstSimplify] 'git add' two files that moved in r310869. llvm-svn: 310870	2017-08-14 19:01:32 +00:00
Craig Topper	2f0b450666	[InstSimplify][InstCombine] Modify the interface of decomposeBitTestICmp and use it in the InstSimplify This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too. I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself. I also had to make decomposeBitTest support vectors since InstSimplify needs that. As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library. Differential Revision: https://reviews.llvm.org/D36593 llvm-svn: 310869	2017-08-14 18:49:42 +00:00
Hal Finkel	b03dd4be70	[ValueTracking] Don't delete assumes of side-effectful instructions ValueTracking has to strike a balance when attempting to propagate information backwards from assumes, because if the information is trivially propagated backwards, it can appear to LLVM that the assumption is known to be true, and therefore can be removed. This is sound (because an assumption has no semantic effect except for causing UB), but prevents the assume from allowing further optimizations. The isEphemeralValueOf check exists to try and prevent this issue by not removing the source of an assumption. This tries to make it a little bit more general to handle the case of side-effectful instructions, such as in %0 = call i1 @get_val() %1 = xor i1 %0, true call void @llvm.assume(i1 %1) Patch by Ariel Ben-Yehuda, thanks! Differential Revision: https://reviews.llvm.org/D36590 llvm-svn: 310859	2017-08-14 17:11:43 +00:00
Chandler Carruth	37c7b08710	[ValueTracking] Revert r310583 which enabled functionality that still is causing compile time issues. Moreover, the patch deleted the flag in addition to changing the default, and links to a code review that doesn't even discuss the flag and just has an update to a Clang test case. I've followed up on the commit thread to ask for numbers on compile time at this point, leaving the flag in place until things stabilize, and pointing at specific code that seems to exhibit excessive compile time with this patch. Original commit message for r310583: """ [ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2. The original patch was an improvement to IR ValueTracking on non-negative integers. It has been checked in to trunk (D18777, r284022). But was disabled by default due to performance regressions. Perf impact has improved. The patch would be enabled by default. """" llvm-svn: 310816	2017-08-14 07:03:24 +00:00
Eugene Zelenko	530851c2bc	[Analysis] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 310766	2017-08-11 21:30:02 +00:00
Chandler Carruth	19913b22c0	[PM] Switch the CGSCC debug messages to use the standard LLVM debug printing techniques with a DEBUG_TYPE controlling them. It was a mistake to start re-purposing the pass manager `DebugLogging` variable for generic debug printing -- those logs are intended to be very minimal and primarily used for testing. More detailed and comprehensive logging doesn't make sense there (it would only make for brittle tests). Moreover, we kept forgetting to propagate the `DebugLogging` variable to various places making it also ineffective and/or unavailable. Switching to `DEBUG_TYPE` makes this a non-issue. llvm-svn: 310695	2017-08-11 05:47:13 +00:00
Nikolai Bozhenov	d97136c182	[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2. The original patch was an improvement to IR ValueTracking on non-negative integers. It has been checked in to trunk (D18777, r284022). But was disabled by default due to performance regressions. Perf impact has improved. The patch would be enabled by default. Reviewers: reames, hfinkel Differential Revision: https://reviews.llvm.org/D34101 Patch by: Olga Chupina <olga.chupina@intel.com> llvm-svn: 310583	2017-08-10 11:24:57 +00:00
Chandler Carruth	9c161e894a	[LCG] Fix an assert in a on-scope-exit lambda that checked the contents of the returned value. Checking the returned value from inside of a scoped exit isn't actually valid. It happens to work when NRVO fires and the stars align, which they reliably do with Clang but don't, for example, on MSVC builds. llvm-svn: 310547	2017-08-10 03:05:21 +00:00
Hiroshi Yamauchi	ccd412f48d	[LVI] Fix LVI compile time regression around constantFoldUser() Summary: Avoid checking each operand and calling getValueFromCondition() before calling constantFoldUser() when the instruction type isn't supported by constantFoldUser(). This fixes a large compile time regression in an internal build. Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36552 llvm-svn: 310545	2017-08-10 02:23:14 +00:00
Craig Topper	ba69187988	[InstSimplify] Add test cases that show that simplifySelectWithICmpCond doesn't work with non-canonical comparisons. llvm-svn: 310542	2017-08-10 01:02:02 +00:00
Nuno Lopes	7829506731	CFLAA: return MustAlias when pointers p, q are equal, i.e., must-alias(p, sz_p, p, sz_q) irrespective of access sizes sz_p, sz_q As discussed a couple of weeks ago on the ML. This makes the behavior consistent with that of BasicAA. AA clients already check the obj size themselves and may not require the obj size to match exactly the access size (e.g., in case of store forwarding) llvm-svn: 310495	2017-08-09 17:02:18 +00:00
Davide Italiano	1a943a90f5	[ValueTracking] Turn a test into an assertion. As discussed with Chad, this should never happen, but this assertion is basically free, so, keep it around just in case. llvm-svn: 310493	2017-08-09 16:06:54 +00:00
Davide Italiano	30e5194287	[ValueTracking] Honour recursion limit. The recently improved support for `icmp` in ValueTracking (r307304) exposes the fact that `isImplied` condition doesn't really bail out if we hit the recursion limit (and calls `computeKnownBits` which increases the depth and asserts). Differential Revision: https://reviews.llvm.org/D36512 llvm-svn: 310481	2017-08-09 15:13:50 +00:00
Jonas Paulsson	6228aeda65	[LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess() isLegalAddressingMode() has recently gained the extra optional Instruction* parameter, and therefore it can now do the job that previously only isFoldableMemAccess() could do. The SystemZ implementation of isLegalAddressingMode() has gained the functionality of checking for offsets, which used to be done with isFoldableMemAccess(). The isFoldableMemAccess() hook has been removed everywhere. Review: Quentin Colombet, Ulrich Weigand https://reviews.llvm.org/D35933 llvm-svn: 310463	2017-08-09 11:28:01 +00:00
Chandler Carruth	2cd28b2ba0	[LCG] Completely remove the map-based association of post-order numbers to Nodes when removing ref edges from a RefSCC. This map based association turns out to be pretty expensive for large RefSCCs and pointless as we already have embedded data members inside nodes that we use to track the DFS state. We can reuse one of those and the map becomes unnecessary. This also fuses the update of those numbers into the scan across the pending stack of nodes so that we don't walk the nodes twice during the DFS. With this I expect the new PM to be faster than the old PM for the test case I have been optimizing. That said, it also seems simpler and more direct in many ways. The side storage was always pretty awkward. The last remaining hot-spot in the profile of the LCG once this is done will be the edge iterator walk in the DFS. I'll take a look at improving that next. llvm-svn: 310456	2017-08-09 09:37:39 +00:00
Chandler Carruth	9c3deaa653	[LCG] Special case when removing a ref edge from a RefSCC leaves that RefSCC still connected. This is common and can be handled much more efficiently. As soon as we know we've covered every node in the RefSCC with the DFS, we can simply reset our state and return. This avoids numerous data structure updates and other complexity. On top of other changes, this appears to get new PM back to parity with the old PM for a large protocol buffer message source code. The dense map updates are very hot in this function. llvm-svn: 310451	2017-08-09 09:14:34 +00:00
Chandler Carruth	23c2f44cc7	[LCG] Switch one of the update methods for the LazyCallGraph to support limited batch updates. Specifically, allow removing multiple reference edges starting from a common source node. There are a few constraints that play into supporting this form of batching: 1) The way updates occur during the CGSCC walk, about the most we can functionally batch together are those with a common source node. This also makes the batching simpler to implement, so it seems a worthwhile restriction. 2) The far and away hottest function for large C++ files I measured (generated code for protocol buffers) showed a huge amount of time was spent removing ref edges specifically, so it seems worth focusing there. 3) The algorithm for removing ref edges is very amenable to this restricted batching. There are just both API and implementation special casing for the non-batch case that gets in the way. Once removed, supporting batches is nearly trivial. This does modify the API in an interesting way -- now, we only preserve the target RefSCC when the RefSCC structure is unchanged. In the face of any splits, we create brand new RefSCC objects. However, all of the users were OK with it that I could find. Only the unittest needed interesting updates here. How much does batching these updates help? I instrumented the compiler when run over a very large generated source file for a protocol buffer and found that the majority of updates are intrinsically updating one function at a time. However, nearly 40% of the total ref edges removed are removed as part of a batch of removals greater than one, so these are the cases batching can help with. When compiling the IR for this file with 'opt' and 'O3', this patch reduces the total time by 8-9%. Differential Revision: https://reviews.llvm.org/D36352 llvm-svn: 310450	2017-08-09 09:05:27 +00:00
Nuno Lopes	598d1632e1	BasicAA: assert on another case where aliasGEP shouldn't get a PartialAlias response llvm-svn: 310420	2017-08-08 21:25:26 +00:00
Dehao Chen	34cfcb29aa	Make ICP uses PSI to check for hotness. Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted. Reviewers: davidxl, tejohnson, eraman Reviewed By: davidxl Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36341 llvm-svn: 310416	2017-08-08 20:57:33 +00:00
Craig Topper	b498a23f0e	[KnownBits][ValueTracking] Move the math for calculating known bits for add/sub into a static method in KnownBits object I want to reuse this code in SimplifyDemandedBits handling of Add/Sub. This will make that easier. Wonder if we should use it in SelectionDAG's computeKnownBits too. Differential Revision: https://reviews.llvm.org/D36433 llvm-svn: 310378	2017-08-08 16:29:35 +00:00
Nuno Lopes	c7d4110aa7	BasicAA: aliasGEP shouldn't get a PartialAlias response here add an assert() to ensure that's the case (as I'm not convinced it won't happen) llvm-svn: 310373	2017-08-08 16:13:24 +00:00
Chandler Carruth	6e35c31d2d	[PM] Fix a likely more critical infloop bug in the CGSCC pass manager. This was just a bad oversight on my part. The code in question should never have worked without this fix. But it turns out, there are relatively few places that involve libfunctions that participate in a single SCC, and unless they do, this happens to not matter. The effect of not having this correct is that each time through this routine, the edge from write_wrapper to write was toggled between a call edge and a ref edge. First time through, it becomes a demoted call edge and is turned into a ref edge. Next time it is a promoted call edge from a ref edge. On, and on it goes forever. I've added the asserts which should have always been here to catch silly mistakes like this in the future as well as a test case that will actually infloop without the fix. The other (much scarier) infinite-inlining issue I think didn't actually occur in practice, and I simply misdiagnosed this minor issue as that much more scary issue. The other issue is still a real issue, but I'm somewhat relieved that so far it hasn't happened in real-world code yet... llvm-svn: 310342	2017-08-08 10:13:23 +00:00
Chandler Carruth	691d0243a5	[LCG] Remove yet another variable only used inside of asserts. llvm-svn: 310174	2017-08-05 08:33:16 +00:00
Benjamin Kramer	ef42fd43f4	[LCG] Fold otherwise unused variable into assert. No functionality change intended. llvm-svn: 310173	2017-08-05 08:28:48 +00:00
Chandler Carruth	adbf14ab85	[LCG] Completely remove the parent set and leaf tracking for RefSCCs. After the previous series of patches, this is now trivial and deletes a pretty astonishing amount of complexity. This has been a long time coming, as the move toward a PO sequence of RefSCCs started eroding the underlying use cases for this half of the data structure. Among the biggest advantages here is that now there aren't two independent data structures that need to stay in sync. Some of my profiling has also indicated that updating the parent sets was among the most expensive parts of the lazy call graph. Eliminating it whole sale is likely to be a nice win in terms of compile time. Last but not least, I had discussed with some folks previously keeping it around for asserts and other correctness checking, but once the fundamentals of the parent and child checking were implemented without the parent sets their value in correctness checking was tiny and no where near worth the cost of the complexity required to keep everything up-to-date. llvm-svn: 310171	2017-08-05 07:37:00 +00:00
Chandler Carruth	38bd6b50ef	[LCG] Re-implement the basic isParentOf, isAncestorOf, isChildOf, and isDescendantOf methods on RefSCCs in terms of the forward edges rather than the parent sets. This is technically slower, but probably not interestingly slower, and all of these routines were already so expensive that they're guarded behind both !NDEBUG and EXPENSIVE_CHECKS. This removes another non-critical usage of parent sets. I've also added some comments to try and help clarify to any potential users the costs of these routines. They're mostly useful for debugging, asserts, or other queries. llvm-svn: 310170	2017-08-05 06:24:09 +00:00
Chandler Carruth	c718b8e7c3	[LCG] Add the concept of a "dead" node and use it to avoid a complex walk over the parent set. When removing a single function from the call graph, we previously would walk the entire RefSCC's parent set and then walk every outgoing edge just to find the ones to remove. In addition to this being quite high complexity in theory, it is also the last fundamental use of the parent sets. With this change, when we remove a function we transform the node containing it to be recognizably "dead" and then teach the edge iterators to recognize edges to such nodes and skip them the same way they skip null edges. We can't move fully to using "dead" nodes -- when disconnecting two live nodes we need to null out the edge. But the complexity this adds to the edge sequence isn't too bad and the simplification of lazily handling this seems like a significant win. llvm-svn: 310169	2017-08-05 05:47:37 +00:00
Chandler Carruth	39df40d8c2	[LCG] Replace an implicit bool operator with a named function. (NFC) The definition of 'false' here was already pretty vague and debatable, and I'm about to add another potential 'false' that would actually make much more sense in a bool operator. Especially given how rarely this is used, a nicely named method seems better. llvm-svn: 310165	2017-08-05 04:04:06 +00:00
Chandler Carruth	403d3c4b2b	[LCG] When removing a dead function and clearing out the data structures, actually null out the graph pointers as well. We won't ever update these, and we certainly shouldn't be calling any methods on them, so it seems good to defensively nuke them. llvm-svn: 310164	2017-08-05 03:37:39 +00:00
Chandler Carruth	7cb23e705f	[LCG] Rather than walking the directed graph structure to update graph pointers in node objects, just walk the map from function to node. It doesn't have stable ordering, but works just as well and is much simpler. We don't need ordering when just updating internal pointers. llvm-svn: 310163	2017-08-05 03:37:39 +00:00
Chandler Carruth	2c58e1a45c	[LCG] Remove the complex walk of the parent sets to update graph pointers. This is completely unnecessary as we have a trivial list of RefSCCs now that we can walk. llvm-svn: 310162	2017-08-05 03:37:38 +00:00
Chandler Carruth	13ffd110ad	[LCG] Remove the use of the parent sets to compute connectivity when merging RefSCCs. The logic to directly use the reference edges is simpler and not substantially slower (despite the comments to the contrary) because this is not actually an especially hot part of LCG in practice. llvm-svn: 310161	2017-08-05 03:37:37 +00:00
Amara Emerson	56dca4e3ca	[SCEV] Preserve NSW information for sext(subtract). Pushes the sext onto the operands of a Sub if NSW is present. Also adds support for propagating the nowrap flags of the llvm.ssub.with.overflow intrinsic during analysis. Differential Revision: https://reviews.llvm.org/D35256 llvm-svn: 310117	2017-08-04 20:19:46 +00:00
Easwaran Raman	ff77cc750c	[Inliner] Fix a typo in option description. NFC. llvm-svn: 310073	2017-08-04 17:15:17 +00:00
Craig Topper	4e22ee6745	[ConstantInt] Use ConstantInt::getValue instead of Constant::getUniqueInteger in a few places where we obviously have a ConstantInt. NFC getUniqueInteger will ultimately call ConstantInt::getValue, but calling ConstantInt::getValue should be inlined. llvm-svn: 310069	2017-08-04 16:59:29 +00:00
Dehao Chen	63799512b2	Adjust the hotness threshold from 99.9% to 99%. Summary: We originally set the hotness threshold as 99.9% to be consistent with gcc FDO. But because the inline heuristic is different between 2 compilers: llvm uses bottom-up algorithm while gcc uses priority based. The LLVM algorithm tends to inline too much early that prevents hot callsites from further inlined into its caller. Due to this restriction, we think it is reasonable to lower the hotness threshold to give priority to those that are really hot. Our experiments show that this change would improve performance on large applications. Note that the inline heuristic has great room for further tuning. Once the inline heuristics are refined, we could adjust this threshold to allow inlining for less hot callsites. Reviewers: davidxl, tejohnson, eraman Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D36317 llvm-svn: 310065	2017-08-04 16:20:54 +00:00
Charles Saternos	75da10d1b2	[ThinLTO] Add FunctionAttrs to ThinLTO index Adds function attributes to index: ReadNone, ReadOnly, NoRecurse, NoAlias. This attributes will be used for future ThinLTO optimizations that will propagate function attributes across modules. llvm-svn: 310061	2017-08-04 16:00:58 +00:00
Nikolai Bozhenov	1545eb3408	[InstCombine] Canonicalize clamp of float types to minmax in fast mode. Summary: This commit allows matchSelectPattern to recognize clamp of float arguments in the presence of FMF the same way as already done for integers. This case is a little different though. With integers, given the min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX "automatically". That is not the case for float, because for them only full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care about NaNs. On the other hand, some backends (e.g. X86) have only FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM nodes are illegal thus selection is not happening. So I decided to do such kind of transformation in IR (InstCombiner) instead of complicating the logic in the backend. Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper Reviewed By: efriedma Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D33186 llvm-svn: 310054	2017-08-04 12:22:17 +00:00
Teresa Johnson	8482e56920	Use profile summary to disable peeling for huge working sets Summary: Detect when the working set size of a profiled application is huge, by comparing the number of counts required to reach the hot percentile in the profile summary to a large threshold. When the working set size is determined to be huge, disable peeling to avoid bloating the working set further. Note that the selected threshold (15K) is significantly larger than the largest working set value in SPEC cpu2006 (which is gcc at around 11K). Reviewers: davidxl Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36288 llvm-svn: 310005	2017-08-03 23:42:58 +00:00
Easwaran Raman	974d4eea93	[Inliner] Increase threshold for hot callsites without PGO. Summary: This increases the inlining threshold for hot callsites. Hotness is defined in terms of block frequency of the callsite relative to the caller's entry block's frequency. Since this requires BFI in the inliner, this only affects the new PM pipeline. This is enabled by default at -O3. This improves the performance of some internal benchmarks. Notably, an internal benchmark for Gipfeli compression (https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006 improves by ~2.5%. I am running more experiments and will update the thread if other benchmarks show improvement/regression. In terms of text size, LLVM test-suite shows an 1.22% text size increase. Diving into the results, 13 of the benchmarks in the test-suite increases by > 10%. Most of these are small, but Adobe-C++/loop_unroll (17.6% increases) and tramp3d(20.7% size increase) have >250K text size. On a large application, the text size increases by 2% Reviewers: chandlerc, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36199 llvm-svn: 309994	2017-08-03 22:23:33 +00:00
Hiroshi Yamauchi	144ee2b4d7	[LVI] Constant-propagate a zero extension of the switch condition value through case edges Summary: (This is a second attempt as https://reviews.llvm.org/D34822 was reverted.) LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges. But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur. This patch adds a small logic to handle such a case in getEdgeValueLocal(). This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary. With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%. Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36247 llvm-svn: 309986	2017-08-03 21:11:30 +00:00
Dehao Chen	f58df39529	Do not want to use BFI to get profile count for sample pgo Summary: For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate. Reviewers: tejohnson, davidxl, eraman Reviewed By: eraman Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36025 llvm-svn: 309964	2017-08-03 17:11:41 +00:00
Max Kazantsev	2cb3653404	[SCEV] Re-enable "Cache results of computeExitLimit" The patch rL309080 was reverted because it did not clean up the cache on "forgetValue" method call. This patch re-enables this change, adds the missing check and introduces two new unit tests that make sure that the cache is cleaned properly. Differential Revision: https://reviews.llvm.org/D36087 llvm-svn: 309925	2017-08-03 08:41:30 +00:00
Hiroshi Inoue	0bd906ec8f	[StackColoring] Update AliasAnalysis information in stack coloring pass (part 2) This patch is update after the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments. Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types. Actually, there is a FIXME comment in StackColoring.cpp // FIXME: In order to enable the use of TBAA when using AA in CodeGen, // we'll also need to update the TBAA nodes in MMOs with values // derived from the merged allocas. But, TBAA has been already enabled in CodeGen without fixing this pass. The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling. Although we observed the problem on ppc64le, this is a platform neutral issue. This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots. This patch fixes PR33928. llvm-svn: 309849	2017-08-02 18:16:32 +00:00
Chad Rosier	5ce28f4f92	[InlineCost] Remove redundant call. NFC. llvm-svn: 309819	2017-08-02 14:50:27 +00:00
Chad Rosier	2e1c050e52	[InlineCost] Simplify more 'and' and 'or' operations. Differential Revision: https://reviews.llvm.org/D35856 llvm-svn: 309817	2017-08-02 14:40:42 +00:00
Sanjoy Das	4cad61adb3	[SCEV/IndVars] Always compute loop exiting values if the backedge count is 0 If SCEV can prove that the backedge taken count for a loop is zero, it does not need to "understand" a recursive PHI to compute its exiting value. This should fix PR33885. llvm-svn: 309758	2017-08-01 22:37:58 +00:00
Chad Rosier	dfd1de687d	[Value Tracking] Default argument to true and rename accordingly. NFC. IMHO this is a bit more readable. llvm-svn: 309739	2017-08-01 20:18:54 +00:00
Chad Rosier	f73a10d2df	[Value Tracking] Refactor and/or logic into helper. NFC. llvm-svn: 309726	2017-08-01 19:22:36 +00:00

... 3 4 5 6 7 ...

7902 Commits