llvm-project

Commit Graph

Author	SHA1	Message	Date
Easwaran Raman	c5fa6358ba	[NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case Differential Revision: https://reviews.llvm.org/D34312 llvm-svn: 306484	2017-06-27 23:11:18 +00:00
Florian Hahn	2665febb54	[AArch64] Inline callee if its target-features are a subset of the caller Summary: Similar to X86, it should be safe to inline callees if their target-features are a subset of the caller. This change matches GCC's inlining behavior with respect to attributes [1]. [1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes Reviewers: kristof.beyls, javed.absar, rengolin, t.p.northover Reviewed By: t.p.northover Subscribers: aemerson, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D34698 llvm-svn: 306478	2017-06-27 22:27:32 +00:00
Jun Bum Lim	506cfb7ab7	[InlineCost] Do not take INT_MAX when Cost is negative Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX, we also use a valid upper-bound cost when overflow occurs in Cost. Reviewers: hans, echristo, dmgreen Reviewed By: dmgreen Subscribers: mcrosier, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D34436 llvm-svn: 306118	2017-06-23 16:12:37 +00:00
whitequark	08b20356c3	Define behavior of "stack-probe-size" attribute when inlining. Also document the attribute, since "probe-stack" already is. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34528 llvm-svn: 306069	2017-06-22 23:22:36 +00:00
whitequark	ed54b4a798	Add a "probe-stack" attribute This attribute is used to ensure the guard page is triggered on stack overflow. Stack frames larger than the guard page size will generate a call to __probestack to touch each page so the guard page won't be skipped. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34386 llvm-svn: 305939	2017-06-21 18:46:50 +00:00
David Blaikie	ae8c4af4ac	Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner llvm-svn: 305245	2017-06-12 23:01:17 +00:00
David Blaikie	cb9327b02d	Inliner: Don't touch indirect calls Other comments/implications are that this isn't intended behavior (nor preserved/reimplemented in the new inliner) & complicates fixing the 'inlining' of trivially dead calls without consulting the cost function first. llvm-svn: 305052	2017-06-09 03:29:20 +00:00
Jun Bum Lim	2960d41e68	[InlineCost] Enable the new switch cost heuristic Summary: This is to enable the new switch inline cost heuristic (r301649) by removing the old heuristic as well as the flag itself. In my experiment for LLVM test suite and spec2000/2006, a 17.82% performance gain and an 8% code size reduction were observed in spec2000/vertex with O3 LTO in AArch64. No significant code size / performance regression was found in O3/O2/Os. No significant complaints were reported on the llvm-dev thread. Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo Reviewed By: echristo Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D32653 llvm-svn: 304594	2017-06-02 20:42:54 +00:00
Haicheng Wu	bf277f38ad	[InlineCost] Add a test case for GEP cost The added test case is to check whether the simplified value is passed to getGEPCost(). Differential Revision: https://reviews.llvm.org/D33779 llvm-svn: 304454	2017-06-01 19:06:07 +00:00
Teresa Johnson	525dcb617b	Fix update VP metadata after inlining for instrumentation PGO Summary: With instrumentation profiling, when updating the VP metadata after an inline, VP metadata on the inlined copy was inadvertently having all counts zeroed out. This was causing indirect calls from code inlined during the call step to be marked as cold in the ThinLTO summaries and not imported. The CallerBFI needs to be passed down so that the CallSiteCount can be computed from the profile summary info. With Sample PGO this was working since the count is extracted from the branch weight metadata on the call being inlined (even before we stopped looking at metadata for non-sample PGO in r302844 this largely wasn't working for instrumentation PGO since only promoted indirect calls would be getting inlined and have the metadata). Added an instrumentation PGO test and renamed the sample PGO test. Reviewers: danielcdh, eraman Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D33389 llvm-svn: 303574	2017-05-22 20:28:18 +00:00
Easwaran Raman	3cd1479c3f	[Inliner] Do not mix callsite and callee hotness based updates. Update threshold based on callee's hotness only when BFI is not available. Otherwise use only callsite's hotness. This makes it easier to reason about hotness related threshold updates. Differential revision: https://reviews.llvm.org/D33157 llvm-svn: 303210	2017-05-16 21:18:09 +00:00
Xinliang David Li	90a9ef6ced	Re-enable test that was disabled due to cost analysis llvm-svn: 303000	2017-05-14 02:58:39 +00:00
Teresa Johnson	2a6b7991d4	Restrict call metadata based hotness detection to Sample PGO mode Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844	2017-05-11 23:18:05 +00:00
Easwaran Raman	c103ef89ee	Decrease inlinecold-threshold to 45 I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829	2017-05-11 21:36:28 +00:00
Daniel Berlin	0f2af7f93b	ConstantFold: Handle gep nonnull, undef as well llvm-svn: 302447	2017-05-08 17:37:33 +00:00
Dehao Chen	a75d0da91b	Update VP prof metadata during inlining. Summary: r298270 added profile update logic for branch_weights. This patch implements profile update logic for VP prof metadata too. Reviewers: eraman, tejohnson, davidxl Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32773 llvm-svn: 302209	2017-05-05 00:47:34 +00:00
Easwaran Raman	5e6f9bd4f8	[PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline. Differential revision: https://reviews.llvm.org/D32768 llvm-svn: 302170	2017-05-04 16:58:45 +00:00
Jun Bum Lim	919f9e8d65	[InlineCost] Improve the cost heuristic for Switch Summary: The motivation example is like below which has 13 cases but only 2 distinct targets ``` lor.lhs.false2: ; preds = %if.then switch i32 %Status, label %if.then27 [ i32 -7012, label %if.end35 i32 -10008, label %if.end35 i32 -10016, label %if.end35 i32 15000, label %if.end35 i32 14013, label %if.end35 i32 10114, label %if.end35 i32 10107, label %if.end35 i32 10105, label %if.end35 i32 10013, label %if.end35 i32 10011, label %if.end35 i32 7008, label %if.end35 i32 7007, label %if.end35 i32 5002, label %if.end35 ] ``` which is compiled into a balanced binary tree like this on AArch64 (similar on X86) ``` .LBB853_9: // %lor.lhs.false2 mov w8, #10012 cmp w19, w8 b.gt .LBB853_14 // BB#10: // %lor.lhs.false2 mov w8, #5001 cmp w19, w8 b.gt .LBB853_18 // BB#11: // %lor.lhs.false2 mov w8, #-10016 cmp w19, w8 b.eq .LBB853_23 // BB#12: // %lor.lhs.false2 mov w8, #-10008 cmp w19, w8 b.eq .LBB853_23 // BB#13: // %lor.lhs.false2 mov w8, #-7012 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_14: // %lor.lhs.false2 mov w8, #14012 cmp w19, w8 b.gt .LBB853_21 // BB#15: // %lor.lhs.false2 mov w8, #-10105 add w8, w19, w8 cmp w8, #9 // =9 b.hi .LBB853_17 // BB#16: // %lor.lhs.false2 orr w9, wzr, #0x1 lsl w8, w9, w8 mov w9, #517 and w8, w8, w9 cbnz w8, .LBB853_23 .LBB853_17: // %lor.lhs.false2 mov w8, #10013 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_18: // %lor.lhs.false2 mov w8, #-7007 add w8, w19, w8 cmp w8, #2 // =2 b.lo .LBB853_23 // BB#19: // %lor.lhs.false2 mov w8, #5002 cmp w19, w8 b.eq .LBB853_23 // BB#20: // %lor.lhs.false2 mov w8, #10011 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_21: // %lor.lhs.false2 mov w8, #14013 cmp w19, w8 b.eq .LBB853_23 // BB#22: // %lor.lhs.false2 mov w8, #15000 cmp w19, w8 b.ne .LBB853_3 ``` However, the inline cost model estimates the cost to be linear with the number of distinct targets and the cost of the above switch is just 2 InstrCosts. 
The function containing this switch is then inlined about 900 times. This change uses the general way of switch lowering for the inline heuristic. It estimates the number of case clusters with the suitability check for a jump table or bit test. Considering the binary search tree built for the clusters, this change modifies the model to be linear with the size of the balanced binary tree. The model is off by default for now: -inline-generic-switch-cost=false This change was originally proposed by Haicheng in D29870. Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier Reviewed By: hans Subscribers: joerg, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31085 llvm-svn: 301649	2017-04-28 16:04:03 +00:00
Matt Arsenault	f10061ec70	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876	2017-04-10 20:18:21 +00:00
Teresa Johnson	428b9e0627	[ThinLTO] Correct counting of functions in inliner stats Summary: Declarations need to be filtered out when counting functions. Reviewers: eraman Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D31336 llvm-svn: 298720	2017-03-24 17:59:06 +00:00
Dehao Chen	e593049fb0	Updates branch_weights annotation for call instructions during inlining. Summary: Inliner should update the branch_weights annotation to scale it to proper value. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D30767 llvm-svn: 298270	2017-03-20 16:40:44 +00:00
Chandler Carruth	814e0df1c5	[PM/Inliner] Fix a bug in r297374 where we would leave stale calls in the work queue and crash when trying to visit them after deleting the function containing those calls. llvm-svn: 297940	2017-03-16 10:45:42 +00:00
Chandler Carruth	6ef42cc6bb	[PM/Inliner] Add a test case that encapsulates the core issue addressed in r297374. I've extracted a small version of this from the C++ metaprogram Richard came up with to exercise these kinds of issues and written comments to describe both how to reproduce a fresh version of the test case and what likely failure modes are. The test case is still a bit brittle as it depends on the particular inline cost modeling and SCC visitation order, but it definitely would have caught the bug right away when developing things so it seems a really valuable test case to have. llvm-svn: 297935	2017-03-16 10:13:55 +00:00
Chandler Carruth	20e588e1af	[PM/Inliner] Make the new PM's inliner process call edges across an entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies. This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design. Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into insanely large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use. Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue: ``` template <int N> extern const char *str; void g(const char *); template <bool K, int N> void f(bool *B, bool *E) { if (K) g(str<N>); if (B == E) return; if (*B) f<true, N + 1>(B + 1, E); else f<false, N + 1>(B + 1, E); } template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); } template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); } extern bool *arr, *end; void test() { f<false, 0>(arr, end); } ``` When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function.
What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome. The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK. We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an *awesome* job here as long as it does ok. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this is easy to do, seems like the right short to medium term approach. I don't really have a test case that makes sense yet...
I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here should be unobservable without snooping on debug logging. So there isn't really much to test. The test case updates come from two incidental changes: 1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases. 2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining. llvm-svn: 297374	2017-03-09 11:35:40 +00:00
Adrian Prantl	d4056501fb	Revert "Strip debug info when inlining into a nodebug function." This reverts commit r296488. As noted by David Blaikie on llvm-commits, I overlooked the case of a debug function being inlined into a nodebug function being inlined into a debug function. llvm-svn: 297163	2017-03-07 17:28:57 +00:00
Adrian Prantl	80d0c93436	Strip debug info when inlining into a nodebug function. The LLVM backend cannot produce any debug info for an llvm::Function without a DISubprogram attachment. When inlining a debug-info-carrying function into a nodebug function, there is therefore no reason to keep any debug info intrinsic calls or debug locations on the instructions. This fixes a problem discovered in PR32042. rdar://problem/30679307 llvm-svn: 296488	2017-02-28 16:58:13 +00:00
Hans Wennborg	2d5841fa73	Revert r296366 "[InlineFunction] add nonnull assumptions based on argument attributes" It causes miscompiles e.g. during self-host of Clang (PR32082). llvm-svn: 296398	2017-02-27 22:33:02 +00:00
Sanjay Patel	40975e05eb	[InlineFunction] add nonnull assumptions based on argument attributes This was suggested in D27855: have the inliner add assumptions, so we don't lose nonnull info provided by argument attributes. This still doesn't solve PR28430 (dyn_cast), but this gets us closer. https://reviews.llvm.org/D29999 llvm-svn: 296366	2017-02-27 18:13:48 +00:00
Sanjay Patel	056218644b	[Inline] add tests to show attribute information loss; NFC llvm-svn: 295209	2017-02-15 17:42:58 +00:00
Easwaran Raman	5a12f236c6	Fix a bug in caller's BFI update code after inlining. Multiple blocks in the callee can be mapped to a single cloned block since we prune the callee as we clone it. The existing code iterates over the value map and clones the block frequency (and eventually scales the frequencies of the cloned blocks). Value map's iteration is not deterministic and so the cloned block might get the frequency of any of the original blocks. The fix is to set the max of the original frequencies to the cloned block. The first block in the sequence must have this max frequency and, in the call context, subsequent blocks must have its frequency. Differential Revision: https://reviews.llvm.org/D29696 llvm-svn: 295115	2017-02-14 22:49:28 +00:00
Taewook Oh	f22fa72e4a	Do not apply redundant LastCallToStaticBonus Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here. If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false. Reviewers: chandlerc, eraman Reviewed By: eraman Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29169 llvm-svn: 295075	2017-02-14 17:30:05 +00:00
Adam Nemet	e7bdf227f6	[Inliner] Fold analysis remarks into missed remarks This significantly reduces the noise level of these messages. llvm-svn: 293492	2017-01-30 16:22:45 +00:00
Chandler Carruth	6acdca78a0	[PH] Replace uses of AssertingVH from members of analysis results with a lazy-asserting PoisoningVH. AssertingVH is fundamentally incompatible with cache-invalidation of analysis results. The invalidation happens after the AssertingVH has already fired. Instead, use a PoisoningVH that will assert if the dangling handle is ever used rather than merely be assigned or destroyed. This patch also removes all of the (numerous) doomed attempts to work around this fundamental incompatibility. It is a pretty significant simplification IMO. The most interesting change is in the Inliner where we still do some clearing because we don't want to rely on the coarse grained invalidation strategy of the containing pass manager. However, I prefer the approach that contains this logic to the cleanup phase of the Inliner, and I think we could enhance the CGSCC analysis management layer to make this even better in the future if desired. The rest is straight cleanup. I've also added a test for one of the harder cases to work around: when a module analysis contains many AssertingVHes pointing at functions. Differential Revision: https://reviews.llvm.org/D29006 llvm-svn: 292928	2017-01-24 12:55:57 +00:00
Chandler Carruth	5144703664	[PM] Add a dedicated test case for the issue fixed in r292770. While this is covered by a clang test case, we should have something locally to LLVM that immediately checks the inliner doesn't leave analyses to dangling IR bodies. llvm-svn: 292772	2017-01-23 07:53:20 +00:00
Chandler Carruth	b698d5964d	[PM] Fix a really nasty bug introduced when adding PGO support to the new PM's inliner. The bug happens when we refine an SCC after having computed a proxy for the FunctionAnalysisManager, and then proceed to compute fresh analyses for functions in the new SCC using the manager provided by the old SCC's proxy. And when we manage to mutate a function in this new SCC in a way that invalidates those analyses. This can be... challenging to reproduce. I've managed to contrive a set of functions that trigger this and added a test case, but it is a bit brittle. I've directly checked that the passes run in the expected ways to help avoid the test just becoming silently irrelevant. This gets the new PM back to passing the LLVM test suite after the PGO improvements landed. llvm-svn: 292757	2017-01-22 10:34:01 +00:00
Easwaran Raman	12585b0148	Improve PGO support for the new inliner This adds the following to the new PM based inliner in PGO mode: * Use block frequency analysis to derive callsite's profile count and use that to adjust thresholds of hot and cold callsites. * Incrementally update the BFI of the caller after a callee gets inlined into it. This incremental update is only within an invocation of the run method - BFI is not preserved across calls to run. Update the function entry count of the callee after inlining it into a caller. * I've tuned the thresholds for the hot and cold callsites using a hacked up version of the old inliner that explicitly computes BFI on a set of internal benchmarks and spec. Once the new PM based pipeline stabilizes (IIRC Chandler mentioned there are known issues) I'll benchmark this again and adjust the thresholds if required. Inliner PGO support. Differential revision: https://reviews.llvm.org/D28331 llvm-svn: 292666	2017-01-20 22:44:04 +00:00
Haicheng Wu	201b191b82	Recommit "[InlineCost] Use TTI to check if GEP is free." #3 This is the third attempt to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292633	2017-01-20 18:51:22 +00:00
Haicheng Wu	71ef5bc0ff	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free." #2" This reverts commit r292616 because the test case still has a problem. llvm-svn: 292618	2017-01-20 16:52:22 +00:00
Haicheng Wu	8f34ae2aae	Recommit "[InlineCost] Use TTI to check if GEP is free." #2 This is the second attempt to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292616	2017-01-20 16:36:34 +00:00
Haicheng Wu	8f2aca388b	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free."" This reverts commit r292570. The test still has a problem. llvm-svn: 292572	2017-01-20 03:40:41 +00:00
Haicheng Wu	1af1f071ea	Recommit "[InlineCost] Use TTI to check if GEP is free." This recommits r292526, which was reverted in r292529, after fixing the test case. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292570	2017-01-20 03:09:11 +00:00
Haicheng Wu	e036df4723	Revert "[InlineCost] Use TTI to check if GEP is free." This reverts commit r292526. The test case has a problem. llvm-svn: 292529	2017-01-19 22:51:03 +00:00
Haicheng Wu	da556345dc	[InlineCost] Use TTI to check if GEP is free. Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. Differential Revision: https://reviews.llvm.org/D28693 llvm-svn: 292526	2017-01-19 22:28:34 +00:00
Chandler Carruth	96809ae7ea	[Inliner] Fix a test where I typo'ed 'CHECK' as 'CHCEK' when converting to FileCheck. Fortunately, it passes. =] Spotted in review by Bob Wilson! llvm-svn: 290953	2017-01-04 11:15:01 +00:00
Chandler Carruth	05ca5acc9e	[PM] Introduce a devirtualization iteration layer for the new PM. This is an orthogonal and separated layer instead of being embedded inside the pass manager. While it adds a small amount of complexity, it is fairly minimal and the composability and control seems worth the cost. The logic for this ends up being nicely isolated and targeted. It should be easy to experiment with different iteration strategies wrapped around the CGSCC bottom-up walk using this kind of facility. The mechanism used to track devirtualization is the simplest one I came up with. I think it handles most of the cases the existing iteration machinery handles, but I haven't done a very in depth analysis. It does however match the basic intended semantics, and we can tweak or tune its exact behavior incrementally as necessary. One thing that we may want to revisit is freshly building the value handle set on each iteration. While I don't think this will be a significant cost (it is strictly fewer value handles but more churn of value handles than the old call graph), it is conceivable that we'll want a somewhat more clever tracking mechanism. My hope is to layer that on as a follow up patch with data supporting any implementation complexity it adds. This code also provides for a basic count heuristic: if the number of indirect calls decreases and the number of direct calls increases for a given function in the SCC, we assume devirtualization is responsible. This matches the heuristics currently used in the legacy pass manager. Differential Revision: https://reviews.llvm.org/D23114 llvm-svn: 290665	2016-12-28 11:07:33 +00:00
Chandler Carruth	443e57e01d	[PM] Teach the CGSCC's CG update utility to more carefully invalidate analyses when we're about to break apart an SCC. We can't wait until after breaking apart the SCC to invalidate things: 1) Which SCC do we then invalidate? All of them? 2) Even if we invalidate all of them, a newly created SCC may not have a proxy that will convey the invalidation to functions! Previously we only invalidated one of the SCCs and too late. This led to stale analyses remaining in the cache. And because the caching strategy actually works, they would get used and chaos would ensue. Doing invalidation early is somewhat pessimizing though if we know that the SCC structure won't change. So it turns out that the design to make the mutation API force the caller to know the kind of mutation in advance was indeed 100% correct and we didn't do enough of it. So this change also splits two cases of switching a call edge to a ref edge into two separate APIs so that callers can clearly test for this and take the easy path without invalidating when appropriate. This is particularly important in this case as we expect most inlines to be between functions in separate SCCs and so the common case is that we don't have to so aggressively invalidate analyses. The LCG API change in turn needed some basic cleanups and better testing in its unittest. No interesting functionality changed there other than more coverage of the returned sequence of SCCs. While this seems like an obvious improvement over the current state, I'd like to revisit the core concept of invalidating within the CG-update layer at all. I'm wondering if we would be better served forcing the callers to handle the invalidation beforehand in the cases that they can handle it. An interesting example is when we want to teach the inliner to update and preserve analyses. But we can cross that bridge when we get there. 
With this patch, the new pass manager can build all of the LLVM test suite at -O3 and everything passes. =D I haven't bootstrapped yet and I'm sure there are still plenty of bugs, but this gives a nice baseline so I'm going to increasingly focus on fleshing out the missing functionality, especially the bits that are just turned off right now in order to let us establish this baseline. llvm-svn: 290664	2016-12-28 10:34:50 +00:00
Chandler Carruth	9900d18bab	[PM] Teach the inliner's call graph update to handle inserting new edges when they are call edges at the leaf but may (transitively) be reached via ref edges. It turns out there is a simple rule: insert everything as a ref edge which is a safe conservative default. Then we let the existing update logic handle promoting some of those to call edges. Note that it would be fairly cheap to make these call edges right away if that is desirable by testing whether there is some existing call path from the source to the target. It just seemed like slightly more complexity in this code path that isn't strictly necessary. If anyone feels strongly about handling this differently I'm happy to change it. llvm-svn: 290649	2016-12-28 03:13:12 +00:00
Chandler Carruth	625038d5d5	[PM] Turn on the new PM's inliner in addition to the current one for most of the inliner test cases. The inliner involves a bunch of interesting code and tends to be where most of the issues I've seen experimenting with the new PM lie. All of these test cases pass, but I'd like to keep some more thorough coverage here so doing a fairly blanket enabling. There are a handful of interesting tests I've not enabled yet because they're focused on the always inliner, or on functionality that doesn't (yet) exist in the inliner. llvm-svn: 290592	2016-12-27 07:18:43 +00:00
Chandler Carruth	141bf5d14d	[PM] Add one of the features left out of the initial inliner patch: skipping indirectly recursive inline chains. To do this, we implicitly build an inline stack for each callsite and check prior to inlining that doing so would not form a cycle. This uses the exact same technique and even shares some code with the legacy PM inliner. This solution remains deeply unsatisfying to me because it means we cannot actually iterate the inliner externally. Doing so would not be able to easily detect and avoid such cycles. Some day I would very much like to have a solution that works without this internal state to detect cycles, but this is not that day. llvm-svn: 290590	2016-12-27 06:46:20 +00:00
Chandler Carruth	db6ced8484	[PM] Wire up another test to the new pass manager. Nothing really interesting here, but I had to improve the test to use variables rather than hard coding value names as we happen to end up with different value names in the new PM. llvm-svn: 290589	2016-12-27 06:46:16 +00:00
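
The inline-stack check described in 141bf5d14d above (skipping indirectly recursive inline chains) can be illustrated with a small sketch. This is a simplified approximation, not LLVM's actual code: `InlineHistoryEntry`, `inlineHistoryIncludes`, `recordInline`, and the use of strings as function names are hypothetical stand-ins. Each record stores which function was inlined and the index of the record for the call site it came from, so following the parent links recovers the chain of inlines that produced a given call site.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one inline-history record: the function that was
// inlined, plus the index of the record for the call site it came from
// (-1 when the call site was present in the original caller).
struct InlineHistoryEntry {
  std::string Callee;
  int ParentID;
};

// Returns true if Callee already appears in the chain of inlines that
// produced the call site identified by HistoryID. Inlining it again would
// form a cycle, so the caller should skip that call site.
bool inlineHistoryIncludes(const std::string &Callee, int HistoryID,
                           const std::vector<InlineHistoryEntry> &History) {
  while (HistoryID != -1) {
    assert(HistoryID >= 0 &&
           static_cast<std::size_t>(HistoryID) < History.size() &&
           "history index out of range");
    if (History[HistoryID].Callee == Callee)
      return true;
    HistoryID = History[HistoryID].ParentID; // step to the enclosing inline
  }
  return false;
}

// Records that Callee was inlined at a call site whose own history is
// ParentID, and returns the ID to attach to call sites cloned from Callee.
int recordInline(const std::string &Callee, int ParentID,
                 std::vector<InlineHistoryEntry> &History) {
  History.push_back({Callee, ParentID});
  return static_cast<int>(History.size()) - 1;
}
```

In this sketch, a call site cloned out of an inlined body carries the ID returned by `recordInline`; before inlining at such a site, the inliner would first ask `inlineHistoryIncludes` whether the candidate callee is already on that chain.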


435 Commits
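
The rule from 2665febb54 (inline a callee only if its target-features are a subset of the caller's) reduces to a bitmask test when each feature is one bit. The following is an illustrative sketch under assumptions: the `Feature` enum, its members, and this free-standing `areInlineCompatible` function are hypothetical and do not reproduce LLVM's feature tables or its TargetTransformInfo interface.

```cpp
#include <cstdint>

// Hypothetical feature bits for illustration; real targets define many more.
enum Feature : std::uint32_t {
  FeatureNEON = 1u << 0,
  FeatureCrypto = 1u << 1,
  FeatureCRC = 1u << 2,
};

// The callee may be inlined when every feature it relies on is also enabled
// in the caller, i.e. the callee's bits are a subset of the caller's bits.
bool areInlineCompatible(std::uint32_t CallerBits, std::uint32_t CalleeBits) {
  return (CallerBits & CalleeBits) == CalleeBits;
}
```

Under this test, a callee with no special features is always compatible, while a callee that needs a feature the caller lacks is rejected, since its body could contain instructions the caller's target configuration does not permit.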