llvm-project

Commit Graph

Author	SHA1	Message	Date
Easwaran Raman	c103ef89ee	Decrease inlinecold-threshold to 45 I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829	2017-05-11 21:36:28 +00:00
Xinliang David Li	351d9b01b9	Refactor callsite cost computation into a helper function /NFC Makes code more readable. The function will also be used by the partial inlining's cost analysis. llvm-svn: 301899	2017-05-02 05:38:41 +00:00
Jun Bum Lim	919f9e8d65	[InlineCost] Improve the cost heuristic for Switch Summary: The motivation example is like below which has 13 cases but only 2 distinct targets ``` lor.lhs.false2: ; preds = %if.then switch i32 %Status, label %if.then27 [ i32 -7012, label %if.end35 i32 -10008, label %if.end35 i32 -10016, label %if.end35 i32 15000, label %if.end35 i32 14013, label %if.end35 i32 10114, label %if.end35 i32 10107, label %if.end35 i32 10105, label %if.end35 i32 10013, label %if.end35 i32 10011, label %if.end35 i32 7008, label %if.end35 i32 7007, label %if.end35 i32 5002, label %if.end35 ] ``` which is compiled into a balanced binary tree like this on AArch64 (similar on X86) ``` .LBB853_9: // %lor.lhs.false2 mov w8, #10012 cmp w19, w8 b.gt .LBB853_14 // BB#10: // %lor.lhs.false2 mov w8, #5001 cmp w19, w8 b.gt .LBB853_18 // BB#11: // %lor.lhs.false2 mov w8, #-10016 cmp w19, w8 b.eq .LBB853_23 // BB#12: // %lor.lhs.false2 mov w8, #-10008 cmp w19, w8 b.eq .LBB853_23 // BB#13: // %lor.lhs.false2 mov w8, #-7012 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_14: // %lor.lhs.false2 mov w8, #14012 cmp w19, w8 b.gt .LBB853_21 // BB#15: // %lor.lhs.false2 mov w8, #-10105 add w8, w19, w8 cmp w8, #9 // =9 b.hi .LBB853_17 // BB#16: // %lor.lhs.false2 orr w9, wzr, #0x1 lsl w8, w9, w8 mov w9, #517 and w8, w8, w9 cbnz w8, .LBB853_23 .LBB853_17: // %lor.lhs.false2 mov w8, #10013 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_18: // %lor.lhs.false2 mov w8, #-7007 add w8, w19, w8 cmp w8, #2 // =2 b.lo .LBB853_23 // BB#19: // %lor.lhs.false2 mov w8, #5002 cmp w19, w8 b.eq .LBB853_23 // BB#20: // %lor.lhs.false2 mov w8, #10011 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_21: // %lor.lhs.false2 mov w8, #14013 cmp w19, w8 b.eq .LBB853_23 // BB#22: // %lor.lhs.false2 mov w8, #15000 cmp w19, w8 b.ne .LBB853_3 ``` However, the inline cost model estimates the cost to be linear with the number of distinct targets and the cost of the above switch is just 2 InstrCosts. The function containing this switch is then inlined about 900 times. This change use the general way of switch lowering for the inline heuristic. It etimate the number of case clusters with the suitability check for a jump table or bit test. Considering the binary search tree built for the clusters, this change modifies the model to be linear with the size of the balanced binary tree. The model is off by default for now : -inline-generic-switch-cost=false This change was originally proposed by Haicheng in D29870. Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier Reviewed By: hans Subscribers: joerg, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31085 llvm-svn: 301649	2017-04-28 16:04:03 +00:00
Easwaran Raman	e1bd7cceca	Remove a repeated comment line. NFC. llvm-svn: 301059	2017-04-21 23:12:16 +00:00
Eric Christopher	908ed7f20c	Tidy checking for the soft float attribute. llvm-svn: 300394	2017-04-15 06:14:52 +00:00
Eric Christopher	85be8ca881	Cache the DataLayout rather than looking it up frequently. llvm-svn: 300393	2017-04-15 06:14:50 +00:00
Reid Kleckner	fb502d2f5e	[IR] Make paramHasAttr to use arg indices instead of attr indices This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367	2017-04-14 20:19:02 +00:00
Chandler Carruth	927d8e610a	[IR] Redesign the case iterator in SwitchInst to actually be an iterator and to expose a handle to represent the actual case rather than having the iterator return a reference to itself. All of this allows the iterator to be used with common STL facilities, standard algorithms, etc. Doing this exposed some missing facilities in the iterator facade that I've fixed and required some work to the actual iterator to fully support the necessary API. Differential Revision: https://reviews.llvm.org/D31548 llvm-svn: 300032	2017-04-12 07:27:28 +00:00
Vassil Vassilev	e1f12fadc0	Remove unused functions. Remove static qualifier from functions in header files. NFC. llvm-svn: 299947	2017-04-11 14:55:32 +00:00
Dehao Chen	9907e9d860	Do not inline hot callsites for samplepgo in thinlto compile phase. Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase. Reviewers: tejohnson, eraman Reviewed By: tejohnson Subscribers: mehdi_amini, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D31201 llvm-svn: 298428	2017-03-21 19:55:36 +00:00
Easwaran Raman	a8b9cdc9e2	[InlineCost] Move the code in isGEPOffsetConstant to a lambda. Differential revision: https://reviews.llvm.org/D30112 llvm-svn: 296208	2017-02-25 00:10:22 +00:00
Easwaran Raman	617f63640b	Refactor instruction simplification code in visitors. NFC. Several visitors check if operands to the instruction are constants, either as it is or after looking up SimplifiedValues, check if the result is a constant and update the SimplifiedValues map. This refactoring splits it into a common function that does the checking of whether the operands are constants and updating of the SimplifiedValues table, and an instruction specific part that is implemented by each instruction visitor as a lambda and passed to the common function. Differential revision: https://reviews.llvm.org/D30104 llvm-svn: 295552	2017-02-18 17:22:52 +00:00
Easwaran Raman	12585b0148	Improve PGO support for the new inliner This adds the following to the new PM based inliner in PGO mode: * Use block frequency analysis to derive callsite's profile count and use that to adjust thresholds of hot and cold callsites. * Incrementally update the BFI of the caller after a callee gets inlined into it. This incremental update is only within an invocation of the run method - BFI is not preserved across calls to run. Update the function entry count of the callee after inlining it into a caller. * I've tuned the thresholds for the hot and cold callsites using a hacked up version of the old inliner that explicitly computes BFI on a set of internal benchmarks and spec. Once the new PM based pipeline stabilizes (IIRC Chandler mentioned there are known issues) I'll benchmark this again and adjust the thresholds if required. Inliner PGO support. Differential revision: https://reviews.llvm.org/D28331 llvm-svn: 292666	2017-01-20 22:44:04 +00:00
Haicheng Wu	201b191b82	Recommit "[InlineCost] Use TTI to check if GEP is free." #3 This is the third attemp to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292633	2017-01-20 18:51:22 +00:00
Haicheng Wu	71ef5bc0ff	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free." #2" This reverts commit r292616 because the test case still has problem. llvm-svn: 292618	2017-01-20 16:52:22 +00:00
Haicheng Wu	8f34ae2aae	Recommit "[InlineCost] Use TTI to check if GEP is free." #2 This is the second attemp to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292616	2017-01-20 16:36:34 +00:00
Haicheng Wu	8f2aca388b	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free."" This reverts commit r292570. The test still has problem. llvm-svn: 292572	2017-01-20 03:40:41 +00:00
Haicheng Wu	1af1f071ea	Recommit "[InlineCost] Use TTI to check if GEP is free." This recommits r292526 which is reverted in r292529 after fixing the test case. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292570	2017-01-20 03:09:11 +00:00
Haicheng Wu	e036df4723	Revert "[InlineCost] Use TTI to check if GEP is free." This reverts commit r292526. The test case has problem. llvm-svn: 292529	2017-01-19 22:51:03 +00:00
Haicheng Wu	da556345dc	[InlineCost] Use TTI to check if GEP is free. Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. Differential Revision: https://reviews.llvm.org/D28693 llvm-svn: 292526	2017-01-19 22:28:34 +00:00
Easwaran Raman	e08b139d7d	Refactor inline threshold update code. Functional change: Previously, if a callee is cold, we used ColdThreshold if it minimizes the existing threshold. This was irrespective of whether we were optimizing for minsize (-Oz) or not. But -Oz uses very low threshold to begin with and the inlining with -Oz is expected to be tuned for lowering code size, so there is no good reason to set an even lower threshold for cold callees. We now lower the threshold for cold callees only when -Oz is not used. For default values of -inlinethreshold and -inlinecold-threshold, this change has no effect and this simplifies the code. NFC changes: Group all threshold updates that are guarded by !Caller->optForMinSize() and within that group threshold updates that require profile summary info. Differential revision: https://reviews.llvm.org/D28369 llvm-svn: 291487	2017-01-09 21:56:26 +00:00
Chandler Carruth	1d96311447	[PM] Provide an initial, minimal port of the inliner to the new pass manager. This doesn't implement every feature of the existing inliner, but tries to implement the most important ones for building a functional optimization pipeline and beginning to sort out bugs, regressions, and other problems. Notable, but intentional omissions: - No alloca merging support. Why? Because it isn't clear we want to do this at all. Active discussion and investigation is going on to remove it, so for simplicity I omitted it. - No support for trying to iterate on "internally" devirtualized calls. Why? Because it adds what I suspect is inappropriate coupling for little or no benefit. We will have an outer iteration system that tracks devirtualization including that from function passes and iterates already. We should improve that rather than approximate it here. - Optimization remarks. Why? Purely to make the patch smaller, no other reason at all. The last one I'll probably work on almost immediately. But I wanted to skip it in the initial patch to try to focus the change as much as possible as there is already a lot of code moving around and both of these could be skipped without really disrupting the core logic. A summary of the different things happening here: 1) Adding the usual new PM class and rigging. 2) Fixing minor underlying assumptions in the inline cost analysis or inline logic that don't generally hold in the new PM world. 3) Adding the core pass logic which is in essence a loop over the calls in the nodes in the call graph. This is a bit duplicated from the old inliner, but only a handful of lines could realistically be shared. (I tried at first, and it really didn't help anything.) All told, this is only about 100 lines of code, and most of that is the mechanics of wiring up analyses from the new PM world. 4) Updating the LazyCallGraph (in the new PM) based on the newly inlined calls and references. This is very minimal because we cannot form cycles. 5) When inlining removes the last use of a function, eagerly nuking the body of the function so that any "one use remaining" inline cost heuristics are immediately refined, and queuing these functions to be completely deleted once inlining is complete and the call graph updated to reflect that they have become dead. 6) After all the inlining for a particular function, updating the LazyCallGraph and the CGSCC pass manager to reflect the function-local simplifications that are done immediately and internally by the inline utilties. These are the exact same fundamental set of CG updates done by arbitrary function passes. 7) Adding a bunch of test cases to specifically target CGSCC and other subtle aspects in the new PM world. Many thanks to the careful review from Easwaran and Sanjoy and others! Differential Revision: https://reviews.llvm.org/D24226 llvm-svn: 290161	2016-12-20 03:15:32 +00:00
Daniel Jasper	aec2fa352f	Revert @llvm.assume with operator bundles (r289755-r289757) This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086	2016-12-19 08:22:17 +00:00
Hal Finkel	3ca4a6bcf1	Remove the AssumptionCache After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756	2016-12-15 03:02:15 +00:00
Craig Topper	107b187d2a	[Analysis] Fix typo in comment. NFC llvm-svn: 289171	2016-12-09 02:18:04 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
James Molloy	6df8f27c95	[InlineCost] Remove skew when calculating call costs When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself. However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case: int g() { h(); } int f() { g(); } llvm-svn: 286814	2016-11-14 11:14:41 +00:00
Dehao Chen	84287abf43	Rename isHotFunction/isColdFunction to isFunctionEntryHot/isFunctionEntryCold. (NFC) This is in preparation for https://reviews.llvm.org/D25048 llvm-svn: 283805	2016-10-10 21:47:28 +00:00
Piotr Padlewski	f3d122cd02	NFC fix doxygen comments llvm-svn: 282950	2016-09-30 21:05:49 +00:00
Easwaran Raman	7060af9d22	Fix a thinko in r278189. llvm-svn: 280008	2016-08-29 20:45:51 +00:00
Easwaran Raman	0d58fcac99	Make more fields of InlineParams Optional. Differential revision: https://reviews.llvm.org/D23386 llvm-svn: 278312	2016-08-11 03:58:05 +00:00
Piotr Padlewski	d89875ca39	Changed sign of LastCallToStaticBouns Summary: I think it is much better this way. When I firstly saw line: Cost += InlineConstants::LastCallToStaticBonus; I though that this is a bug, because everywhere where the cost is being reduced it is usuing -=. Reviewers: eraman, tejohnson, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23222 llvm-svn: 278290	2016-08-10 21:15:22 +00:00
Easwaran Raman	1c57cc2b68	Do not directly use inline threshold cl options in cost analysis. This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use. Differential revision: https://reviews.llvm.org/D22120 llvm-svn: 278189	2016-08-10 00:48:04 +00:00
Dehao Chen	e1c7c57d11	Remove cold callsite heuristic that is not necessary because of cold callee heuristic. llvm-svn: 277863	2016-08-05 20:49:04 +00:00
Dehao Chen	de39cb9384	Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter Summary: Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites. Reviewers: davidxl, eraman Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D22368 llvm-svn: 277860	2016-08-05 20:28:41 +00:00
Sean Silva	ab6a683765	Avoid using a raw AssumptionCacheTracker in various inliner functions. This unblocks the new PM part of River's patch in https://reviews.llvm.org/D22706 Conveniently, this same change was needed for D21921 and so these changes are just spun out from there. llvm-svn: 276515	2016-07-23 04:22:50 +00:00
Dehao Chen	9232f98279	Implement callsite-hotness based inline cost for Sample-based PGO Summary: For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch. E.g. if (A1 && A2 && A3 && ..... && A10) { for (i=0; i < 100000000; i++) { callsite(); } } Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value. In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness. Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR. Reviewers: davidxl, eraman, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22118 llvm-svn: 275073	2016-07-11 16:48:54 +00:00
Easwaran Raman	22eb80a114	Fix size computation of array allocation in inline cost analysis Differential revision: http://reviews.llvm.org/D21690 llvm-svn: 273952	2016-06-27 22:31:53 +00:00
Easwaran Raman	71069cf67d	Use ProfileSummaryInfo in inline cost analysis. Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo. Differential revision: http://reviews.llvm.org/D21045 llvm-svn: 272321	2016-06-09 22:23:21 +00:00
Easwaran Raman	bb578ef0dd	Allow -inline-threshold to override default threshold. Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior. Differential Revision: http://reviews.llvm.org/D20452 llvm-svn: 270153	2016-05-19 23:02:09 +00:00
Easwaran Raman	9b792923d0	Revert r269131 llvm-svn: 269138	2016-05-10 23:26:04 +00:00
Easwaran Raman	7eccf4ee0e	Reapply r266477 and r266488 llvm-svn: 269131	2016-05-10 22:03:23 +00:00
Sanjay Patel	0f153424a9	[Inliner] don't assume that a Constant alloca size is a ConstantInt (PR27277) Differential Revision: http://reviews.llvm.org/D20077 llvm-svn: 268980	2016-05-09 21:51:53 +00:00
Chad Rosier	567556aa9c	[Inliner] Formatting. NFC. Patch by Aditya Kumar! Differential Revision: http://reviews.llvm.org/D19047 llvm-svn: 267888	2016-04-28 14:47:23 +00:00
Peter Collingbourne	7dd8dbf486	Introduce llvm.load.relative intrinsic. This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant. Differential Revision: http://reviews.llvm.org/D18367 llvm-svn: 267223	2016-04-22 21:18:02 +00:00
Eric Liu	d09f15ea6f	Revert "Replace the use of MaxFunctionCount module flag" This reverts commit r266477. This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData", while "ProfileData" depends on "Object", which depends on "BitCode", which depends on "Analysis". llvm-svn: 266619	2016-04-18 15:31:11 +00:00
Easwaran Raman	f53baca686	Replace the use of MaxFunctionCount module flag Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count. Differential Revision: http://reviews.llvm.org/D18622 llvm-svn: 266477	2016-04-15 21:39:58 +00:00
Justin Lebar	8650a4da93	[TTI] Add getInliningThresholdMultiplier. Summary: InlineCost's threshold is multiplied by this value. This lets us adjust the inlining threshold up or down on a per-target basis. For example, we might want to increase the threshold on targets where calls are unusually expensive. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18560 llvm-svn: 266405	2016-04-15 01:38:48 +00:00
Easwaran Raman	d295b00ae9	Return immediately from analyzeCall if analyzeBlock returns false. This is part of the patch reviewed at http://reviews.llvm.org/D17584 llvm-svn: 266249	2016-04-13 21:20:22 +00:00
Easwaran Raman	9a3fc17ad4	Refactor Threshold computation. NFC. This is part of changes reviewed in http://reviews.llvm.org/D17584. llvm-svn: 265852	2016-04-08 21:28:02 +00:00

1 2 3 4

193 Commits