llvm-project

Commit Graph

Author	SHA1	Message	Date
Bill Wendling	f319e9905f	Have 'addFnAttr' take the attribute enum value. Then have it build the attribute object and add it appropriately. No functionality change. llvm-svn: 165595	2012-10-10 03:12:49 +00:00
Bill Wendling	c9b22d735a	Create enums for the different attributes. We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488	2012-10-09 07:45:08 +00:00
Micah Villmow	cdfe20b97f	Move TargetData to DataLayout. llvm-svn: 165402	2012-10-08 16:38:25 +00:00
Bill Wendling	863bab689a	Remove the `hasFnAttr' method from Function. The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725	2012-09-26 21:48:26 +00:00
Nadav Rotem	97d44349c9	Fix an 80 char line limit. llvm-svn: 163808	2012-09-13 16:27:32 +00:00
Benjamin Kramer	8bcc971174	Make MemoryBuiltins aware of TargetLibraryInfo. This disables malloc-specific optimization when -fno-builtin (or -ffreestanding) is specified. This has been a problem for a long time but became more severe with the recent memory builtin improvements. Since the memory builtin functions are used everywhere, this required passing TLI in many places. This means that functions that now have an optional TLI argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead mallocs anymore if the TLI argument is missing. I've updated most passes to do the right thing. Fixes PR13694 and probably others. llvm-svn: 162841	2012-08-29 15:32:21 +00:00
Benjamin Kramer	bde9176663	Fix typos found by http://github.com/lyda/misspell-check llvm-svn: 157885	2012-06-02 10:20:22 +00:00
Patrik Hägglund	8a1e316c15	Fix the inliner so that the optsize function attribute doesn't alter the inline threshold if the global inline threshold is lower (as for -Oz). Reviewed by Chandler Carruth and Bill Wendling. llvm-svn: 157323	2012-05-23 13:42:57 +00:00
Chandler Carruth	7ae90d4d2d	Add two statistics to help track how we are computing the inline cost. Yea, 'NumCallerCallersAnalyzed' isn't a great name, suggestions welcome. llvm-svn: 154492	2012-04-11 10:15:10 +00:00
Chandler Carruth	45ae88f5fc	Belatedly address some code review from Chris. As a side note, I really dislike array_pod_sort... Do we really still care about any STL implementations that get this so wrong? Does libc++? llvm-svn: 153834	2012-04-01 10:41:24 +00:00
Chandler Carruth	edd2826f3e	Remove a bunch of empty, dead, and no-op methods from all of these interfaces. These methods were used in the old inline cost system where there was a persistent cache that had to be updated, invalidated, and cleared. We're now doing more direct computations that don't require this intricate dance. Even if we resume some level of caching, it would almost certainly have a simpler and more narrow interface than this. llvm-svn: 153813	2012-03-31 12:48:08 +00:00
Chandler Carruth	0539c071ea	Initial commit for the rewrite of the inline cost analysis to operate on a per-callsite walk of the called function's instructions, in breadth-first order over the potentially reachable set of basic blocks. This is a major shift in how inline cost analysis works to improve the accuracy and rationality of inlining decisions. A brief outline of the algorithm this moves to: - Build a simplification mapping based on the callsite arguments to the function arguments. - Push the entry block onto a worklist of potentially-live basic blocks. - Pop the first block off of the front of the worklist (for breadth-first ordering) and walk its instructions using a custom InstVisitor. - For each instruction's operands, re-map them based on the simplification mappings available for the given callsite. - Compute any simplification possible of the instruction after re-mapping, and store that back into the simplification mapping. - Compute any bonuses, costs, or other impacts of the instruction on the cost metric. - When the terminator is reached, replace any conditional value in the terminator with any simplifications from the mapping we have, and add any successors which are not proven to be dead from these simplifications to the worklist. - Pop the next block off of the front of the worklist, and repeat. - As soon as the cost of inlining exceeds the threshold for the callsite, stop analyzing the function in order to bound cost. The primary goal of this algorithm is to perfectly handle dead code paths. We do not want any code in trivially dead code paths to impact inlining decisions. The previous metric was extremely flawed here, and would always subtract the average cost of two successors of a conditional branch when it was proven to become an unconditional branch at the callsite.
There was no handling of wildly different costs between the two successors, which would cause inlining when the path actually taken was too large, and no inlining when the path actually taken was trivially simple. There was also no handling of the code path, only the immediate successors. These problems vanish completely now. See the added regression tests for the shiny new features -- we skip recursive function calls, SROA-killing instructions, and high cost complex CFG structures when dead at the callsite being analyzed. Switching to this algorithm required refactoring the inline cost interface to accept the actual threshold rather than simply returning a single cost. The resulting interface is pretty bad, and I'm planning to do lots of interface cleanup after this patch. Several other refactorings fell out of this, but I've tried to minimize them for this patch. =/ There is still more cleanup that can be done here. Please point out anything that you see in review. I've worked really hard to try to mirror at least the spirit of all of the previous heuristics in the new model. It's not clear that they are all correct any more, but I wanted to minimize the change in this single patch, it's already a bit ridiculous. One heuristic that is not yet mirrored is to allow inlining of functions with a dynamic alloca if the caller has a dynamic alloca. I will add this back, but I think the most reasonable way requires changes to the inliner itself rather than just the cost metric, and so I've deferred this for a subsequent patch. The test case is XFAIL-ed until then. As mentioned in the review mail, this seems to make Clang run about 1% to 2% faster in -O0, but makes its binary size grow by just under 4%. I've looked into the 4% growth, and it can be fixed, but requires changes to other parts of the inliner. llvm-svn: 153812	2012-03-31 12:42:41 +00:00
Chandler Carruth	b9e35fbc1e	Make a seemingly tiny change to the inliner and fix the generated code size bloat. Unfortunately, I expect this to disable the majority of the benefit from r152737. I'm hopeful at least that it will fix PR12345. To explain this requires... quite a bit of backstory I'm afraid. TL;DR: The change in r152737 actually did The Wrong Thing for linkonce-odr functions. This change makes it do the right thing. The benefits we saw were simple luck, not any actual strategy. Benchmark numbers after a mini-blog-post so that I've written down my thoughts on why all of this works and doesn't work... To understand what's going on here, you have to understand how the "bottom-up" inliner actually works. There are two fundamental modes to the inliner: 1) Standard fixed-cost bottom-up inlining. This is the mode we usually think about. It walks from the bottom of the CFG up to the top, looking at callsites, taking information about the callsite and the called function and computing the expected cost of inlining into that callsite. If the cost is under a fixed threshold, it inlines. It's a touch more complicated than that due to all the bonuses, weights, etc. Inlining the last callsite to an internal function gets higher weight, etc. But essentially, this is the mode of operation. 2) Deferred bottom-up inlining (a term I just made up). This is the interesting mode for this patch and r152737. Initially, this works just like mode #1, but once we have the cost of inlining into the callsite, we don't just compare it with a fixed threshold. First, we check something else. Let's give some names to the entities at this point, or we'll end up hopelessly confused. We're considering inlining a function 'A' into its callsite within a function 'B'. We want to check whether 'B' has any callers, and whether it might be inlined into those callers. If so, we also check whether inlining 'A' into 'B' would block any of the opportunities for inlining 'B' into its callers.
We take the sum of the costs of inlining 'B' into its callers where that inlining would be blocked by inlining 'A' into 'B', and if that cost is less than the cost of inlining 'A' into 'B', then we skip inlining 'A' into 'B'. Now, in order for #2 to make sense, we have to have some confidence that we will actually have the opportunity to inline 'B' into its callers when cheaper, and that we'll be able to revisit the decision and inline 'A' into 'B' if that ever becomes the correct tradeoff. This often isn't true for external functions -- we can see very few of their callers, and we won't be able to re-consider inlining 'A' into 'B' if 'B' is external when we finally see more callers of 'B'. There are two cases where we believe this to be true for C/C++ code: functions local to a translation unit, and functions with an inline definition in every translation unit which uses them. These are represented as internal linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for linkonce-odr in r152737. Unfortunately, when I did that, I also introduced a subtle bug. There was an implicit assumption that the last caller of the function within the TU was the last caller of the function in the program. We want to bonus the last caller of the function in the program by a huge amount for inlining because inlining that callsite has very little cost. Unfortunately, the last caller in the TU of a linkonce-odr function is not the last caller in the program, and so we don't want to apply this bonus. If we do, we can apply it to one callsite per-TU. Because of the way deferred inlining works, when it sees this bonus applied to one callsite in the TU for 'B', it decides that inlining 'B' is of the utmost importance just so we can get that final bonus. It then proceeds to essentially force deferred inlining regardless of the actual cost tradeoff. The result? PR12345: code bloat, code bloat, code bloat. 
Another result is getting damn lucky on a few benchmarks, and the over-inlining exposing critically important optimizations. I would very much like a list of benchmarks that regress after this change goes in, with bitcode before and after. This will help me greatly understand what opportunities the current cost analysis is missing. Initial benchmark numbers look very good. WebKit files that exhibited the worst of PR12345 went from growing to shrinking compared to Clang with r152737 reverted. - Bootstrapped Clang is 3% smaller with this change. - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4% faster with this change. Please let me know about any other performance impact you see. Thanks to Nico for reporting and urging me to actually fix, Richard Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues today. llvm-svn: 153506	2012-03-27 10:48:28 +00:00
Chandler Carruth	2121199241	Move the instruction simplification of callsite arguments in the inliner to instead rely on much more generic and powerful instruction simplification in the function cloner (and thus inliner). This teaches the pruning function cloner to use instsimplify rather than just the constant folder to fold values during cloning. This can simplify a large number of things that constant folding alone cannot begin to touch. For example, it will realize that 'or' and 'and' instructions with certain constant operands actually become constants regardless of what their other operand is. It also can thread back through the caller to perform simplifications that are only possible by looking up a few levels. In particular, GEPs and pointer testing tend to fold much more heavily with this change. This should (in some cases) have a positive impact on compile times with optimizations on because the inliner itself will simply avoid cloning a great deal of code. It already attempted to prune proven-dead code, but now it will use the stronger simplifications to prove more code dead. llvm-svn: 153403	2012-03-25 04:03:40 +00:00
Chandler Carruth	d7a5f2adb0	Start removing the use of an ad-hoc 'never inline' set and instead directly query the function information which this set was representing. This simplifies the interface of the inline cost analysis, and makes the always-inline pass significantly more efficient. Previously, always-inline would first make a single set of every function in the module except those marked with the always-inline attribute. It would then query this set at every call site to see if the function was a member of the set, and if so, refuse to inline it. This is quite wasteful. Instead, simply check the function attribute directly when looking at the callsite. The normal inliner also had similar redundancy. It added every function in the module with the noinline attribute to its set to ignore, even though inside the cost analysis function we already tested the noinline attribute and produced the same result. The only tricky part of removing this is that we have to be able to correctly remove only the functions inlined by the always-inline pass when finalizing, which requires a bit of a hack. Still, much less of a hack than the set of all non-always-inline functions was. While I was touching this function, I switched a heavy-weight set to a vector with sort+unique. The algorithm already had a two-phase insert and removal pattern, we were just needlessly paying the uniquing cost on every insert. This probably speeds up some compiles by a small amount (-O0 compiles with lots of always-inline, so potentially heavy libc++ users), but I've not tried to measure it. I believe there is no functional change here, but yell if you spot one. None are intended. Finally, the direction this is going in is to greatly simplify the inline cost query interface so that we can replace its implementation with a much more clever one. Along the way, all the APIs get simplified, so it seems incrementally good. llvm-svn: 152903	2012-03-16 06:10:13 +00:00
Chandler Carruth	30b8416d2c	Change where we enable the heuristic that delays inlining into functions which are small enough to themselves be inlined. Delaying in this manner can be harmful if the function is ineligible for inlining in some (or many) contexts as it pessimizes the code of the function itself in the event that inlining does not eventually happen. Previously the check was written to only do this delaying of inlining for static functions in the hope that they could be entirely deleted and in the knowledge that all callers of static functions will have the opportunity to inline if it is in fact profitable. However, with C++ we get two other important sources of functions where the definition is always available for inlining: inline functions and templated functions. This patch generalizes the inliner to allow linkonce-ODR (the linkage such C++ routines receive) to also qualify for this delay-based inlining. Benchmarking across a range of large real-world applications shows roughly 2% size increase across the board, but an average speedup of about 0.5%. Some benchmarks improved over 2%, and the 'clang' binary itself (when bootstrapped with this feature) shows a 1% -O0 performance improvement when run over all Sema, Lex, and Parse source code smashed into a single file. A clean re-build of Clang+LLVM with a bootstrapped Clang shows approximately 2% improvement, but that measurement is often noisy. llvm-svn: 152737	2012-03-14 20:16:41 +00:00
Chandler Carruth	595fda8466	When inlining a function and adding its inner call sites to the candidate set for subsequent inlining, try to simplify the arguments to the inner call site now that inlining has been performed. The goal here is to propagate and fold constants through deeply nested call chains. Without doing this, we lose the inliner bonus that should be applied because the arguments don't match the exact pattern the cost estimator uses. Reviewed on IRC by Benjamin Kramer. llvm-svn: 152556	2012-03-12 11:19:33 +00:00
Chad Rosier	07d37bc1ed	Add support for disabling llvm.lifetime intrinsics in the AlwaysInliner. These are optimization hints, but at -O0 we're not optimizing. This becomes a problem when the alwaysinline attribute is abused. rdar://10921594 llvm-svn: 151429	2012-02-25 02:56:01 +00:00
Eli Friedman	1923a330e6	Refactor code from inlining and globalopt that checks whether a function definition is unused, and enhance it so it can tell that functions which are only used by a blockaddress are in fact dead. This probably doesn't happen much on most code, but the Linux kernel's _THIS_IP_ can trigger this issue with blockaddress. (GlobalDCE can also handle the given testcase, but we only run that at -O3.) Found while looking at PR11180. llvm-svn: 142572	2011-10-20 05:23:42 +00:00
Chris Lattner	229907cd11	land David Blaikie's patch to de-constify Type, with a few tweaks. llvm-svn: 135375	2011-07-18 04:54:35 +00:00
Jay Foad	1a180156b6	Remove unused STL header includes. llvm-svn: 130068	2011-04-23 19:53:52 +00:00
Dale Johannesen	a71d2cc88d	Improve the accuracy of the inlining heuristic looking for the case where a static caller is itself inlined everywhere else, and thus may go away if it doesn't get too big due to inlining other things into it. If there are references to the caller other than calls, it will not be removed; account for this. This results in same-day completion of the case in PR8853. llvm-svn: 122821	2011-01-04 19:01:54 +00:00
Chris Lattner	fb212de06d	Fix PR8735, a really terrible problem in the inliner's "alloca merging" optimization. Consider: static void foo() { A = alloca ... } static void bar() { B = alloca ... call foo(); } void main() { bar() } The inliner proceeds bottom up, but let's pretend it decides not to inline foo into bar. When it gets to main, it inlines bar into main(), and says "hey, I just inlined an alloca 'B' into main, let's remember that". Then it keeps going and finds that it now contains a call to foo. It decides to inline foo into main, and says "hey, foo has an alloca A, and I have an alloca B from another inlined call site, let's reuse it". The problem with this, of course, is that the lifetimes of A and B are nested, not disjoint. Unfortunately I can't create a reasonable testcase for this: the one in the PR is both huge and extremely sensitive, because minor tweaks end up causing foo to get inlined into bar too early. We already have tests for the basic alloca merging optimization and this does not break them. llvm-svn: 120995	2010-12-06 07:52:42 +00:00
Chris Lattner	5b6a865f2e	improve -debug output and comments a little. llvm-svn: 120993	2010-12-06 07:38:40 +00:00
Jakob Stoklund Olesen	31a7eb40c1	Let the -inline-threshold command line argument take precedence over the threshold given to createFunctionInliningPass(). Both opt -O3 and clang would silently ignore the -inline-threshold option. llvm-svn: 118117	2010-11-02 23:40:26 +00:00
Owen Anderson	a7aed18624	Reapply r110396, with fixes to appease the Linux buildbot gods. llvm-svn: 110460	2010-08-06 18:33:48 +00:00
Owen Anderson	bda59bd247	Revert r110396 to fix buildbots. llvm-svn: 110410	2010-08-06 00:23:35 +00:00
Owen Anderson	755aceb5d0	Don't use PassInfo* as a type identifier for passes. Instead, use the address of the static ID member as the sole unique type identifier. Clean up APIs related to this change. llvm-svn: 110396	2010-08-05 23:42:04 +00:00
Gabor Greif	62f0aac99d	simplify by using CallSite constructors; virtually eliminates CallSite::get from the tree llvm-svn: 109687	2010-07-28 22:50:26 +00:00
Eric Christopher	ea282034b6	Grammar. llvm-svn: 108252	2010-07-13 18:27:13 +00:00
Benjamin Kramer	5ac57e3440	Avoid swap when a copy suffices. llvm-svn: 105220	2010-05-31 12:50:41 +00:00
Chris Lattner	b49a622fe9	revert r102831. We already delete dead readonly calls in other places, killing a valid transformation is not the right answer. llvm-svn: 102850	2010-05-01 17:19:38 +00:00
Owen Anderson	550986ea90	Disable the call-deletion transformation introduced in r86975. Without halting analysis, it is illegal to delete a call to a read-only function. The correct solution is almost certainly to add a "must halt" attribute and only allow deletions in its presence. XFAIL the relevant testcase for now. llvm-svn: 102831	2010-05-01 08:34:28 +00:00
Chris Lattner	c2432b9d44	rename InlineInfo.DevirtualizedCalls -> InlinedCalls to reflect that it includes all inlined calls now, not just devirtualized ones. llvm-svn: 102824	2010-05-01 01:26:13 +00:00
Chris Lattner	e8262675a3	The inliner has traditionally not considered call sites that appear due to inlining a callee as candidates for further inlining, but a recent patch made it do this if those call sites were indirect and became direct. Unfortunately, in bizarre cases (see testcase) doing this can cause us to infinitely inline mutually recursive functions into callers not in the cycle. Fix this by keeping track of the inline history from which callsite inline candidates got inlined. This shouldn't affect any "real world" code, but is required for a follow on patch that is coming up next. llvm-svn: 102822	2010-05-01 01:05:10 +00:00
Chris Lattner	b34ffe36ae	remove #if 1's. llvm-svn: 102296	2010-04-25 04:43:02 +00:00
Chris Lattner	d3b361d1b6	enable my inliner change: add newly devirtualized call sites to the worklist, making them inline candidates. llvm-svn: 102213	2010-04-23 21:16:07 +00:00
Chris Lattner	c691de3b4e	switch InlineInfo.DevirtualizedCalls's list to be of WeakVH. This fixes a bug where calls inlined into an invoke would get changed into an invoke but the array would keep pointing to the (now dead) call. The improved inliner behavior is still disabled for now. llvm-svn: 102196	2010-04-23 18:37:01 +00:00
Chris Lattner	d8d898dbd3	disable my previous inliner patch, it appears to be busting self-host. llvm-svn: 102153	2010-04-23 00:41:03 +00:00
Chris Lattner	2eee5d3467	The inliner was choosing to not consider call sites that appear in the SCC as a result of inlining as candidates for inlining. Change this so that it does consider call sites that change from being indirect to being direct as a result of inlining. This allows it to completely "devirtualize" the testcase. llvm-svn: 102146	2010-04-22 23:37:35 +00:00
Chris Lattner	4ba01ec869	refactor the interface to InlineFunction so that most of the in/out arguments are handled with a new InlineFunctionInfo class. This makes it easier to extend InlineFunction to return more info in the future. llvm-svn: 102137	2010-04-22 23:07:58 +00:00
Chris Lattner	a5cdd5e6a2	make the inliner do less work for leaf functions. llvm-svn: 101846	2010-04-20 00:47:08 +00:00
Chris Lattner	4422d31b84	introduce a new CallGraphSCC class, and pass it around to CallGraphSCCPass's instead of passing around a std::vector<CallGraphNode*>. No functionality change, but now we have a much tidier interface. llvm-svn: 101558	2010-04-16 22:42:17 +00:00
Jakob Stoklund Olesen	b495cad7ca	Try to keep the cached inliner costs around for a bit longer for big functions. The Caller cost info would be reset every time a callee was inlined. If the caller has lots of calls and there is some mutual recursion going on, the caller cost info could be calculated many times. This patch reduces inliner runtime from 240s to 0.5s for a function with 20000 small function calls. This is a more conservative version of r98089 that doesn't break the clang test CodeGenCXX/temp-order.cpp. That test relies on rather extreme inlining for constant folding. llvm-svn: 98099	2010-03-09 23:02:17 +00:00
Jakob Stoklund Olesen	4497475905	Revert r98089, it was breaking a clang test. llvm-svn: 98094	2010-03-09 22:43:37 +00:00
Jakob Stoklund Olesen	741dec43e4	Try to keep the cached inliner costs around for a bit longer for big functions. The Caller cost info would be reset every time a callee was inlined. If the caller has lots of calls and there is some mutual recursion going on, the caller cost info could be calculated many times. This patch reduces inliner runtime from 240s to 0.5s for a function with 20000 small function calls. llvm-svn: 98089	2010-03-09 22:17:11 +00:00
Jakob Stoklund Olesen	d62c2f554c	Add inlining threshold to log output. llvm-svn: 98024	2010-03-09 00:59:53 +00:00
Jakob Stoklund Olesen	492b8b42cd	Enable the inlinehint attribute in the Inliner. Functions explicitly marked inline will get an inlining threshold slightly more aggressive than the default for -O3. This means that -O3 builds are mostly unaffected while -Os builds will be a bit bigger and faster. The difference depends entirely on how many 'inline's are sprinkled on the source. In the CINT2006 suite, only these tests are significantly affected under -Os (size, time): 471.omnetpp +1.63%, -1.85%; 473.astar +4.01%, -6.02%; 483.xalancbmk +4.60%, 0.00%. Note that 483.xalancbmk runs too quickly to give useful timing results. llvm-svn: 96066	2010-02-13 01:51:53 +00:00
Jakob Stoklund Olesen	74bb06c0f0	Reintroduce the InlineHint function attribute. This time it's for real! I am going to hook this up in the frontends as well. The inliner has some experimental heuristics for dealing with the inline hint. When given a -respect-inlinehint option, functions marked with the inline keyword are given a threshold just above the default for -O3. We need some experiments to determine if that is the right thing to do. llvm-svn: 95466	2010-02-06 01:16:28 +00:00
Jakob Stoklund Olesen	113fb54bcb	Increase inliner thresholds by 25. This makes the inliner about as aggressive as it was before my changes to the inliner cost calculations. These levels give the same performance and slightly smaller code than before. llvm-svn: 95320	2010-02-04 18:48:20 +00:00
Jakob Stoklund Olesen	8a19d3c96c	Move per-function inline threshold calculation to a method. No functional change except the forgotten test for InlineLimit.getNumOccurrences() == 0 in the CurrentThreshold2 calculation. llvm-svn: 94007	2010-01-20 17:51:28 +00:00
David Greene	0122fc495d	Change errs() to dbgs(). llvm-svn: 92625	2010-01-05 01:27:51 +00:00
Chris Lattner	5c89f4b4ef	use isInstructionTriviallyDead, as pointed out by Duncan llvm-svn: 87035	2009-11-12 21:58:18 +00:00
Chris Lattner	eb9acbfb05	implement a nice little efficiency hack in the inliner. Since we're now running IPSCCP early, and we run functionattrs interlaced with the inliner, we often (particularly for small or noop functions) completely propagate all of the information about a call to its call site in IPSCCP (making a call dead) and functionattrs is smart enough to realize that the function is readonly (because it is interlaced with inliner). To improve compile time and make the inliner threshold more accurate, realize that we don't have to inline dead readonly function calls. Instead, just delete the call. This happens all the time for C++ codes, here are some counters from opt/llvm-ld counting the number of times calls were deleted vs inlined on various apps: Tramp3d opt: 5033 inline - Number of call sites deleted, not inlined; 24596 inline - Number of functions inlined; llvm-ld: 667 inline - Number of functions deleted because all callers found; 699 inline - Number of functions inlined. 483.xalancbmk opt: 8096 inline - Number of call sites deleted, not inlined; 62528 inline - Number of functions inlined; llvm-ld: 217 inline - Number of allocas merged together; 2158 inline - Number of functions inlined. 471.omnetpp: 331 inline - Number of call sites deleted, not inlined; 8981 inline - Number of functions inlined; llvm-ld: 171 inline - Number of functions deleted because all callers found; 629 inline - Number of functions inlined. Deleting a call is much faster than inlining it, and is insensitive to the size of the callee. :) llvm-svn: 86975	2009-11-12 07:56:08 +00:00
Dan Gohman	4552e3cd73	Move the InlineCost code from Transforms/Utils to Analysis. llvm-svn: 83998	2009-10-13 18:30:07 +00:00
Dale Johannesen	96a5b87ae2	Use names instead of numbers for some of the magic constants used in inlining heuristics (especially those used in more than one file). No functional change. llvm-svn: 83675	2009-10-09 21:42:02 +00:00
Dale Johannesen	3059924bdd	When considering whether to inline Callee into Caller, and that will make Caller too big to inline, see if it might be better to inline Caller into its callers instead. This situation is described in PR 2973, although I haven't tried the specific case in SPASS. llvm-svn: 83602	2009-10-09 00:11:32 +00:00
Evan Cheng	bb4ed2394b	Allow -inline-threshold to override the default threshold even if compiling to optimize for size. llvm-svn: 83274	2009-10-04 06:13:54 +00:00
Chris Lattner	9e50747958	comment and simplify some code. llvm-svn: 80540	2009-08-31 05:34:32 +00:00
Chris Lattner	081375bb08	Fix PR4834, a tricky case where the inliner would resolve an indirect function pointer, inline it, then go to delete the body. The problem is that the callgraph had other references to the function, though the inliner had no way to know it, so we got a dangling pointer and an invalid iterator out of the deal. The fix to this is pretty simple: stop the inliner from deleting the function by knowing that there are references to it. Do this by making CallGraphNodes contain a refcount. This requires moving deletion of available_externally functions to the module-level cleanup sweep where it belongs. llvm-svn: 80533	2009-08-31 03:15:49 +00:00
Chris Lattner	305b115a87	Fix some nasty callgraph dangling pointer problems in argpromotion and structretpromote. Basically, when replacing a function, they used the 'changeFunction' api which changes the entry in the function map (and steals/reuses the callgraph node). This has some interesting effects: first, the problem is that it doesn't update the "callee" edges in any callees of the function in the call graph. Second, this covers for a major problem in all the CGSCC pass stuff, which is that it is completely broken when functions are deleted if they don't reuse a CGN. (there is a cute little fixme about this though :). This patch changes the protocol that CGSCC passes must obey: now the CGSCC pass manager copies the SCC and preincrements its iterator to avoid passes invalidating it. This allows CGSCC passes to mutate the current SCC. However multiple passes may be run on that SCC, so if passes do this, they are now required to update the SCC to be current when they return. Other less interesting parts of this patch are that it makes passes update the CG more directly, eliminates changeFunction, and requires clients of replaceCallSite to specify the new callee CGN if they are changing it. llvm-svn: 80527	2009-08-31 00:19:58 +00:00
Chris Lattner	0e8901803c	finish a half formed thought :) llvm-svn: 80334	2009-08-28 04:48:54 +00:00
Chris Lattner	d3374e8dfd	Implement a new optimization in the inliner: if inlining multiple calls into a function and if the calls bring in arrays, try to merge them together to reduce stack size. For example, in the testcase we'd previously end up with 4 allocas, now we end up with 2 allocas. As described in the comments, this is not really the ideal solution to this problem, but it is surprisingly effective. For example, on 176.gcc, we end up eliminating 67 arrays at "gccas" time and another 24 at "llvm-ld" time. One piece of concern that I didn't look into: at -O0 -g with forced inlining this will almost certainly result in worse debug info. I think this is acceptable though given that this is a case of "debugging optimized code", and we don't want debug info to prevent the optimizer from doing things anyway. llvm-svn: 80215	2009-08-27 06:29:33 +00:00
Chris Lattner	b9d0a961f9	reduce header #include'age llvm-svn: 80204	2009-08-27 04:32:07 +00:00
Chris Lattner	5eef6ad6a9	reduce inlining factor some stuff out to a static helper function, and other code cleanups. No functionality change. llvm-svn: 80199	2009-08-27 03:51:50 +00:00
Dale Johannesen	c221a55f58	Allow multiple occurrences of -inline-threshold on the command line. This gives llvm-gcc developers a way to control inlining (documented as "not intended for end users"). llvm-svn: 79966	2009-08-25 01:13:58 +00:00
Bill Wendling	2602bb4cdc	- Convert the rest of the DOUTs to DEBUG+errs(). - One formatting change. No intended functionality change. llvm-svn: 77717	2009-07-31 19:52:24 +00:00
Daniel Dunbar	0dd5e1ed39	More migration to raw_ostream, the water has dried up around the iostream hole. - Some clients which used DOUT have moved to DEBUG. We are deprecating the "magic" DOUT behavior which avoided calling printing functions when the statement was disabled. In addition to being unnecessary magic, it had the downside of leaving code in -Asserts builds, and of hiding potentially unnecessary computations. llvm-svn: 77019	2009-07-25 00:23:56 +00:00
Dan Gohman	67243a4bec	Convert several more passes to use getAnalysisIfAvailable<TargetData>() instead of getAnalysis<TargetData>(). llvm-svn: 76982	2009-07-24 18:13:53 +00:00
Eli Friedman	f13b36ddc5	Add line breaks to make the debug output a bit more readable. llvm-svn: 76284	2009-07-18 05:12:58 +00:00
Torok Edwin	7996339dd8	available_externally linkage is not local, this was confusing the codegenerator, and it wasn't generating calls through @PLT for these functions. hasLocalLinkage() is now false for available_externally, I attempted to fix the inliner and dce to handle available_externally properly. It passed make check. llvm-svn: 72328	2009-05-23 14:06:57 +00:00
Dale Johannesen	32dfb35281	Use a SmallPtrSet instead of std::set. llvm-svn: 67578	2009-03-23 23:39:20 +00:00
Dale Johannesen	2050968df9	Clear the cached cost when removing a function in the inliner; prevents nondeterministic behavior when the same address is reallocated. Don't build call graph nodes for debug intrinsic calls; they're useless, and there were typically a lot of them. llvm-svn: 67311	2009-03-19 18:03:56 +00:00
Rafael Espindola	6de96a1b5d	Add the private linkage. llvm-svn: 62279	2009-01-15 20:18:42 +00:00
Dale Johannesen	433a9086c0	Enable recursive inlining. Reduce inlining threshold back to 200; 400 seems to be too high, loses more than it gains. llvm-svn: 62107	2009-01-12 22:11:50 +00:00
Dale Johannesen	f84685290a	Increase default inlining aggressiveness in partial compensation for turning off gcc's inliner. This gets us closer to the amount of inlining we were getting before. It is not a win on everything, of course, but seems to gain overall. llvm-svn: 62058	2009-01-11 23:11:00 +00:00
Dale Johannesen	4755d9df78	Adjustments to last patch based on review. llvm-svn: 61969	2009-01-09 01:30:11 +00:00
Bill Wendling	f5260d29c2	Fix error where it wasn't getting the correct caller function. llvm-svn: 59758	2008-11-21 00:09:21 +00:00
Bill Wendling	26c6a3e736	If the function being inlined has a higher stack protection level than the inlining function, then increase the stack protection level on the inlining function. llvm-svn: 59757	2008-11-21 00:06:32 +00:00
Devang Patel	f0ef35738c	Do now allow InlineAlways pass to remove dead functions. llvm-svn: 58744	2008-11-05 01:39:16 +00:00
Daniel Dunbar	3933e66a89	Add InlineCost class to represent the estimated cost of inlining a function. - This explicitly models the costs for functions which should "always" or "never" be inlined. This fixes bugs where such costs were not previously respected. llvm-svn: 58450	2008-10-30 19:26:59 +00:00
Daniel Dunbar	e7fbf9f425	Factor shouldInline method out of Inliner. - No functionality change. llvm-svn: 58355	2008-10-29 01:02:02 +00:00
Devang Patel	9eb525d4f9	Implement function notes as function attributes. llvm-svn: 56716	2008-09-26 23:51:19 +00:00
Devang Patel	4c758ea3e0	Large mechanical patch. s/ParamAttr/Attribute/g s/PAList/AttrList/g s/FnAttributeWithIndex/AttributeWithIndex/g s/FnAttr/Attribute/g This sets the stage - to implement function notes as function attributes and - to distinguish between function attributes and return value attributes. This requires corresponding changes in llvm-gcc and clang. llvm-svn: 56622	2008-09-25 21:00:45 +00:00
Devang Patel	e15607b7bb	Put FN_NOTE_AlwaysInline and others in FnAttr namespace. llvm-svn: 56527	2008-09-24 00:06:15 +00:00
Devang Patel	e87abd26ba	Move FN_NOTE_AlwaysInline and other out of ParamAttrs namespace. Do not check isDeclaration() in hasNote(). It is clients' responsibility. llvm-svn: 56524	2008-09-23 23:52:03 +00:00
Devang Patel	82fed6702b	Use parameter attribute store (soon to be renamed) for Function Notes also. Function notes are stored at index ~0. llvm-svn: 56511	2008-09-23 22:35:17 +00:00
Devang Patel	329fe728b5	Add hasNote() to check note associated with a function. llvm-svn: 56477	2008-09-22 22:32:29 +00:00
Duncan Sands	3a52056d4d	Use removeAllCalledFunctions rather than removing edges one by one by hand. llvm-svn: 55836	2008-09-05 14:56:53 +00:00
Dan Gohman	a79db30d28	Tidy up several unbeseeming casts from pointer to intptr_t. llvm-svn: 55779	2008-09-04 17:05:41 +00:00
Devang Patel	a26e2075b8	Update inline threshold for current function if the notes say, optimize for size. llvm-svn: 55745	2008-09-03 23:06:09 +00:00
Devang Patel	0d442ffa2b	Handle "always inline" note during inline cost analysis. llvm-svn: 55712	2008-09-03 18:47:45 +00:00
Devang Patel	62be9ad270	Handle "noinline" note inside the simple inliner. llvm-svn: 55708	2008-09-03 18:10:21 +00:00
Devang Patel	7e59270272	s/FP_AlwaysInline/FN_NOTE_AlwaysInline/g llvm-svn: 55676	2008-09-02 22:43:57 +00:00
Devang Patel	bfa535af9f	respect inline=never and inline=always notes. llvm-svn: 55673	2008-09-02 22:16:13 +00:00
Dan Gohman	d78c400b5b	Clean up the use of static and anonymous namespaces. This turned up several things that were neither in an anonymous namespace nor static but not intended to be global. llvm-svn: 51017	2008-05-13 00:00:25 +00:00
Dan Gohman	6a2da37c0e	Make several variable declarations static. llvm-svn: 50696	2008-05-06 01:53:16 +00:00
Evan Cheng	ac38d444e2	1. Drop default inline threshold back down to 200. 2. Do not use # of basic blocks as part of the cost computation since it doesn't really figure into function size. 3. More aggressively inline function with vector code. llvm-svn: 49061	2008-04-01 23:59:29 +00:00
Evan Cheng	3471ae8c5d	Increasing the inline limit from (overly conservative) 200 to 300. Given each BB costs 20 and each instruction costs 5, 200 means a 4 BB function + 24 instructions (actually less because caller's size also contributes to it). Furthermore, double the limit when more than 10% of the callee instructions are vector instructions. Multimedia kernels tend to love inlining. llvm-svn: 48725	2008-03-24 06:37:48 +00:00
Chris Lattner	a683edb2d8	allow specified inline threshold to be negative, as the value is itself sometimes negative. llvm-svn: 47786	2008-03-01 08:09:51 +00:00

195 Commits