Commit Graph

181 Commits

Author SHA1 Message Date
Easwaran Raman 12585b0148 Improve PGO support for the new inliner
This adds the following to the new PM based inliner in PGO mode:

* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.

* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.

* Update the function entry count of the callee after inlining it into a
caller.

* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.
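The first two bullets can be sketched as a pair of small functions. This is a hypothetical model, not the inliner's code: the function names and plain integer frequencies are assumptions, and it assumes a callsite's count is the caller's entry count scaled by the ratio of the call block's frequency to the entry block's frequency, and that inlining a callsite removes its count from the callee's entry count.

```cpp
#include <cstdint>

// Hypothetical: estimate how often a callsite executes from the caller's
// entry count and the block frequencies reported by BFI.
constexpr uint64_t callsiteCount(uint64_t callerEntryCount,
                                 uint64_t callBlockFreq,
                                 uint64_t entryBlockFreq) {
  // Scale the entry count by the relative frequency of the call's block.
  return entryBlockFreq == 0
             ? 0
             : callerEntryCount * callBlockFreq / entryBlockFreq;
}

// Hypothetical: once a callsite is inlined, the calls it made no longer
// enter the callee's out-of-line body, so subtract them from the callee's
// entry count, clamping at zero in case the profile is inconsistent.
constexpr uint64_t updatedCalleeEntryCount(uint64_t calleeEntryCount,
                                           uint64_t inlinedCallsiteCount) {
  return calleeEntryCount > inlinedCallsiteCount
             ? calleeEntryCount - inlinedCallsiteCount
             : 0;
}
```

A callsite whose block runs 8 times per entry into a caller entered 100 times gets a count of 800; inlining it into that caller drops a callee entry count of 1000 to 200.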

Differential revision: https://reviews.llvm.org/D28331

llvm-svn: 292666
2017-01-20 22:44:04 +00:00
Haicheng Wu 201b191b82 Recommit "[InlineCost] Use TTI to check if GEP is free." #3
This is the third attempt to recommit r292526.

The original summary:

Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is
already used for the cost of many other instructions.

llvm-svn: 292633
2017-01-20 18:51:22 +00:00
Haicheng Wu 71ef5bc0ff Revert "Recommit "[InlineCost] Use TTI to check if GEP is free." #2"
This reverts commit r292616 because the test case still has a problem.

llvm-svn: 292618
2017-01-20 16:52:22 +00:00
Haicheng Wu 8f34ae2aae Recommit "[InlineCost] Use TTI to check if GEP is free." #2
This is the second attempt to recommit r292526.

The original summary:

Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is
already used for the cost of many other instructions.

llvm-svn: 292616
2017-01-20 16:36:34 +00:00
Haicheng Wu 8f2aca388b Revert "Recommit "[InlineCost] Use TTI to check if GEP is free.""
This reverts commit r292570.  The test still has a problem.

llvm-svn: 292572
2017-01-20 03:40:41 +00:00
Haicheng Wu 1af1f071ea Recommit "[InlineCost] Use TTI to check if GEP is free."
This recommits r292526, which was reverted in r292529, after fixing the test case.

The original summary:

Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is
already used for the cost of many other instructions.

llvm-svn: 292570
2017-01-20 03:09:11 +00:00
Haicheng Wu e036df4723 Revert "[InlineCost] Use TTI to check if GEP is free."
This reverts commit r292526.  The test case has a problem.

llvm-svn: 292529
2017-01-19 22:51:03 +00:00
Haicheng Wu da556345dc [InlineCost] Use TTI to check if GEP is free.
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is
already used for the cost of many other instructions.

Differential Revision: https://reviews.llvm.org/D28693

llvm-svn: 292526
2017-01-19 22:28:34 +00:00
Easwaran Raman e08b139d7d Refactor inline threshold update code.
Functional change: Previously, if a callee was cold, we used ColdThreshold if it was lower than the existing threshold. This was irrespective of whether we were optimizing for minsize (-Oz) or not. But -Oz uses a very low threshold to begin with, and inlining with -Oz is expected to be tuned for lowering code size, so there is no good reason to set an even lower threshold for cold callees. We now lower the threshold for cold callees only when -Oz is not used. For the default values of -inline-threshold and -inlinecold-threshold, this change has no effect, and it simplifies the code.
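The functional change boils down to one guard around the cold-callee adjustment. A minimal, hypothetical sketch: the function and parameter names are illustrative, and it assumes the cold-callee threshold only ever lowers the current threshold.

```cpp
#include <algorithm>

// Hypothetical: lower the threshold for a cold callee only when the caller
// is not optimized for minimum size (-Oz).
constexpr int updateThresholdForColdCallee(int threshold, int coldThreshold,
                                           bool calleeIsCold,
                                           bool callerOptForMinSize) {
  if (!callerOptForMinSize && calleeIsCold)
    threshold = std::min(threshold, coldThreshold);
  return threshold;
}
```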

NFC changes: Group all threshold updates that are guarded by !Caller->optForMinSize() and, within that group, group the threshold updates that require profile summary info.

Differential revision: https://reviews.llvm.org/D28369

llvm-svn: 291487
2017-01-09 21:56:26 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilities. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Craig Topper 107b187d2a [Analysis] Fix typo in comment. NFC
llvm-svn: 289171
2016-12-09 02:18:04 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct; if an array,
what the upper bound is; and if a struct, the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
James Molloy 6df8f27c95 [InlineCost] Remove skew when calculating call costs
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.

However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:

  int g() {
    h();
  }
  int f() {
    g();
  }
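The skew shows up even in a very simplified cost model. This sketch assumes a flat per-instruction cost and a fixed extra penalty per call (the values 5 and 25 are assumptions chosen for illustration). The point is only that the penalty charged for each call in the callee must also be credited for the call that inlining removes.

```cpp
// Hypothetical simplified cost model; the constants are illustrative.
constexpr int InstrCost = 5;    // cost charged per instruction
constexpr int CallPenalty = 25; // extra heuristic cost per call

// Cost of a callee body with the given numbers of plain instructions and calls.
constexpr int bodyCost(int instrs, int calls) {
  return instrs * InstrCost + calls * (InstrCost + CallPenalty);
}

// Net cost of inlining: the callee body minus what removing the call saves.
// The skewed version credits only the call instruction, not its penalty.
constexpr int netCostSkewed(int instrs, int calls) {
  return bodyCost(instrs, calls) - InstrCost;
}
constexpr int netCostFixed(int instrs, int calls) {
  return bodyCost(instrs, calls) - (InstrCost + CallPenalty);
}
```

For a `g` whose body is essentially one call, the fixed model nets out to zero, while the skewed one still charges the full call penalty.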

llvm-svn: 286814
2016-11-14 11:14:41 +00:00
Dehao Chen 84287abf43 Rename isHotFunction/isColdFunction to isFunctionEntryHot/isFunctionEntryCold. (NFC)
This is in preparation for https://reviews.llvm.org/D25048

llvm-svn: 283805
2016-10-10 21:47:28 +00:00
Piotr Padlewski f3d122cd02 NFC fix doxygen comments
llvm-svn: 282950
2016-09-30 21:05:49 +00:00
Easwaran Raman 7060af9d22 Fix a thinko in r278189.
llvm-svn: 280008
2016-08-29 20:45:51 +00:00
Easwaran Raman 0d58fcac99 Make more fields of InlineParams Optional.
Differential revision: https://reviews.llvm.org/D23386

llvm-svn: 278312
2016-08-11 03:58:05 +00:00
Piotr Padlewski d89875ca39 Changed sign of LastCallToStaticBonus
Summary:
I think it is much better this way.
When I first saw the line:
  Cost += InlineConstants::LastCallToStaticBonus;
I thought that this was a bug, because everywhere else the cost is reduced
using -=.

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23222

llvm-svn: 278290
2016-08-10 21:15:22 +00:00
Easwaran Raman 1c57cc2b68 Do not directly use inline threshold cl options in cost analysis.
This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use.

Differential revision: https://reviews.llvm.org/D22120

llvm-svn: 278189
2016-08-10 00:48:04 +00:00
Dehao Chen e1c7c57d11 Remove cold callsite heuristic that is not necessary because of cold callee heuristic.
llvm-svn: 277863
2016-08-05 20:49:04 +00:00
Dehao Chen de39cb9384 Replace hot-callsite based heuristic to use its own threshold parameter instead of sharing the inline-hint parameter
Summary: Hot callsites should have a higher threshold than inline hints. This patch uses a separate threshold parameter for hot callsites.

Reviewers: davidxl, eraman

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D22368

llvm-svn: 277860
2016-08-05 20:28:41 +00:00
Sean Silva ab6a683765 Avoid using a raw AssumptionCacheTracker in various inliner functions.
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706

Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.

llvm-svn: 276515
2016-07-23 04:22:50 +00:00
Dehao Chen 9232f98279 Implement callsite-hotness based inline cost for Sample-based PGO
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometimes inaccurate. This is because with a sampling-based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branches.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A10 are all 100% taken, and callsite has 1000 samples and thus is considered hot. Because the loop's trip count is huge, it's normal that all branches outside the loop have no samples at all. As a result, we can only use static branch probability to derive the frequency of the loop header. Assuming that the static heuristic thinks each branch is 50% taken, the count calculated from BFI will be 1/(2^10) of the actual value.
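The 1/(2^10) factor is just ten independent 50% estimates multiplied together. A small sketch of that arithmetic, where the 50% static estimate is the assumption stated above and the function name is illustrative:

```cpp
#include <cstdint>

// Hypothetical: the count BFI would estimate for the loop when each of
// `branches` guarding conditions is statically assumed to be 50% taken.
// Each branch halves the estimate, i.e. the result is actual / 2^branches.
// Requires branches < 64 so the shift is well-defined.
constexpr uint64_t staticEstimate(uint64_t actualCount, unsigned branches) {
  return actualCount >> branches;
}
```

With ten guards, a callsite that really ran 1,024,000 times is estimated at 1,000, an underestimate by a factor of 1024.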

In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.
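The lookup order described above can be sketched as "annotated count first, BFI estimate second". This is a hypothetical model: the names and the hot-count cutoff are assumptions, and the annotation is represented as an optional integer rather than IR metadata.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical: prefer a count annotated directly on the call instruction,
// falling back to the BFI-derived estimate only when no annotation exists.
constexpr uint64_t callsiteCountForHotness(std::optional<uint64_t> annotated,
                                           uint64_t bfiEstimate) {
  return annotated.value_or(bfiEstimate);
}

// Hypothetical hotness check against an assumed count cutoff.
constexpr bool isHotCallsite(std::optional<uint64_t> annotated,
                             uint64_t bfiEstimate, uint64_t hotCountCutoff) {
  return callsiteCountForHotness(annotated, bfiEstimate) >= hotCountCutoff;
}
```

In the nested-branch example, an annotated count of 1000 keeps the callsite hot even though the BFI estimate alone would be about 1000 times too small.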

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22118

llvm-svn: 275073
2016-07-11 16:48:54 +00:00
Easwaran Raman 22eb80a114 Fix size computation of array allocation in inline cost analysis
Differential revision: http://reviews.llvm.org/D21690

llvm-svn: 273952
2016-06-27 22:31:53 +00:00
Easwaran Raman 71069cf67d Use ProfileSummaryInfo in inline cost analysis.
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.

Differential revision: http://reviews.llvm.org/D21045

llvm-svn: 272321
2016-06-09 22:23:21 +00:00
Easwaran Raman bb578ef0dd Allow -inline-threshold to override default threshold.
Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by an explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior.
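The restored precedence can be sketched as "an explicit -inline-threshold wins at every opt level". This is a hypothetical model: the enum and function names are illustrative, the -Os value of 75 matches the OptSizeThreshold quoted elsewhere in this log, and the other defaults are assumptions.

```cpp
#include <optional>

// Hypothetical opt levels; default thresholds below are illustrative.
enum class OptLevel { O2, O3, Os, Oz };

constexpr int defaultThreshold(OptLevel level) {
  switch (level) {
  case OptLevel::O3: return 250;
  case OptLevel::Os: return 75;
  case OptLevel::Oz: return 25;
  case OptLevel::O2: break;
  }
  return 225;
}

// An explicitly specified -inline-threshold overrides the opt-level default,
// regardless of which opt level is in effect.
constexpr int effectiveThreshold(OptLevel level,
                                 std::optional<int> cliThreshold) {
  return cliThreshold ? *cliThreshold : defaultThreshold(level);
}
```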

Differential Revision: http://reviews.llvm.org/D20452

llvm-svn: 270153
2016-05-19 23:02:09 +00:00
Easwaran Raman 9b792923d0 Revert r269131
llvm-svn: 269138
2016-05-10 23:26:04 +00:00
Easwaran Raman 7eccf4ee0e Reapply r266477 and r266488
llvm-svn: 269131
2016-05-10 22:03:23 +00:00
Sanjay Patel 0f153424a9 [Inliner] don't assume that a Constant alloca size is a ConstantInt (PR27277)
Differential Revision: http://reviews.llvm.org/D20077

llvm-svn: 268980
2016-05-09 21:51:53 +00:00
Chad Rosier 567556aa9c [Inliner] Formatting. NFC.
Patch by Aditya Kumar!
Differential Revision: http://reviews.llvm.org/D19047

llvm-svn: 267888
2016-04-28 14:47:23 +00:00
Peter Collingbourne 7dd8dbf486 Introduce llvm.load.relative intrinsic.
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.

LLVM guarantees that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
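The semantics of the intrinsic can be modeled in a few lines. This is a hypothetical sketch, not LLVM's implementation: to keep it constant-evaluable, memory is an array of 32-bit words and addresses and offsets are word indices rather than byte addresses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Model of llvm.load.relative(%ptr, %offset): load a 32-bit value from
// %ptr + %offset and add %ptr to it. Addresses are indices into `mem`.
template <std::size_t N>
constexpr int64_t loadRelative(const std::array<int32_t, N>& mem,
                               int64_t ptr, int64_t offset) {
  return ptr + mem[static_cast<std::size_t>(ptr + offset)];
}

// Build a tiny memory image holding a relative table at address 2. Each
// entry stores (target - table), the trunc(x - %ptr) form described above.
constexpr std::array<int32_t, 8> makeMemory() {
  std::array<int32_t, 8> mem{};
  const int64_t table = 2;                   // table base address
  mem[2] = static_cast<int32_t>(6 - table);  // entry 0 refers to address 6
  mem[3] = static_cast<int32_t>(7 - table);  // entry 1 refers to address 7
  return mem;
}
```

Loading entry 0 of the table yields the absolute address 6 again, which is exactly the folding the constant folder performs when it can see the initializer.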

Differential Revision: http://reviews.llvm.org/D18367

llvm-svn: 267223
2016-04-22 21:18:02 +00:00
Eric Liu d09f15ea6f Revert "Replace the use of MaxFunctionCount module flag"
This reverts commit r266477.

This commit introduces a cyclic dependency: it makes "Analysis" depend on "ProfileData",
while "ProfileData" depends on "Object", which depends on "BitCode", which
depends on "Analysis".

llvm-svn: 266619
2016-04-18 15:31:11 +00:00
Easwaran Raman f53baca686 Replace the use of MaxFunctionCount module flag
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.

Differential Revision: http://reviews.llvm.org/D18622

llvm-svn: 266477
2016-04-15 21:39:58 +00:00
Justin Lebar 8650a4da93 [TTI] Add getInliningThresholdMultiplier.
Summary:
InlineCost's threshold is multiplied by this value.  This lets us adjust
the inlining threshold up or down on a per-target basis.  For example,
we might want to increase the threshold on targets where calls are
unusually expensive.
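The effect of the hook is a plain multiplication of the base threshold. In this hypothetical sketch, the `TargetInfo` struct and its field stand in for a target's getInliningThresholdMultiplier() result, and the multiplier of 5 in the usage note is an arbitrary example.

```cpp
// Hypothetical stand-in for a target's getInliningThresholdMultiplier().
struct TargetInfo {
  unsigned inliningThresholdMultiplier = 1;  // 1 leaves the threshold as-is
};

// Scale the inline threshold by the target-provided multiplier.
constexpr int scaledThreshold(int threshold, const TargetInfo& tti) {
  return threshold * static_cast<int>(tti.inliningThresholdMultiplier);
}
```

With the default multiplier the threshold is unchanged; a target where calls are expensive might return 5, turning a threshold of 225 into 1125.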

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18560

llvm-svn: 266405
2016-04-15 01:38:48 +00:00
Easwaran Raman d295b00ae9 Return immediately from analyzeCall if analyzeBlock returns false.
This is part of the patch reviewed at http://reviews.llvm.org/D17584

llvm-svn: 266249
2016-04-13 21:20:22 +00:00
Easwaran Raman 9a3fc17ad4 Refactor Threshold computation. NFC.
This is part of changes reviewed in http://reviews.llvm.org/D17584.

llvm-svn: 265852
2016-04-08 21:28:02 +00:00
Sanjoy Das 5ce3272833 Don't IPO over functions that can be de-refined
Summary:
Fixes PR26774.

If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".

Motivation:

I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard.  So transforming:

```
void f(unsigned x) {
  unsigned t = 5 / x;
  (void)t;
}
```

to

```
void f(unsigned x) { }
```

is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).

Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM.  For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).

Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have.  This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.

For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store.  As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal.  The optimizer is allowed to refine
this function by first CSE'ing the two loads, and then folding the
comparison to always report that the two values are equal.  Such a
refined variant will look like it is `readonly`.  However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

Note: this is not just a problem with atomics or with linking
differently optimized object files.  See PR26774 for more realistic
examples that involved neither.

This patch:

This change introduces a new predicate over linkage types,
`GlobalValue::mayBeDerefined`, that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time.  It then changes a set of IPO passes to bail out if they see
such a function.
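The bail-out can be sketched as a guard in an attribute-inference step. This is a hypothetical, simplified model: the enum is a small subset of linkage kinds, and treating only the ODR linkages as "may be replaced by a differently optimized copy" is an assumption based on the comdat discussion above, not a copy of LLVM's `mayBeDerefined`.

```cpp
// Hypothetical subset of linkage kinds.
enum class Linkage { Internal, External, LinkOnceODR, WeakODR };

// Assumption for this sketch: ODR linkages may be replaced at link time by
// a differently optimized copy of the same source-level function.
constexpr bool mayBeDerefined(Linkage l) {
  return l == Linkage::LinkOnceODR || l == Linkage::WeakODR;
}

// An IPO-style inference: conclude `readnone` from the visible body only
// when that body is known to be the one that will actually be linked in.
constexpr bool canInferReadNone(Linkage l, bool bodyAccessesMemory) {
  return !mayBeDerefined(l) && !bodyAccessesMemory;
}
```

A linkonce_odr function with no visible loads or stores is therefore not marked `readnone`, because the copy chosen at link time may still contain them.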

Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18634

llvm-svn: 265762
2016-04-08 00:48:30 +00:00
Easwaran Raman b1bd398ceb Revert revisions 262636, 262643, 262679, and 262682.
llvm-svn: 262883
2016-03-08 00:36:35 +00:00
Easwaran Raman 588c68a87b Fix a memory leak.
llvm-svn: 262682
2016-03-04 01:18:40 +00:00
Easwaran Raman fd6557e368 Fix breakage caused by r262636.
Use LLVM_ATTRIBUTE_UNUSED instead of __attribute__((unused))

llvm-svn: 262643
2016-03-03 18:53:20 +00:00
Easwaran Raman 3035719c86 Infrastructure for PGO enhancements in inliner
This patch provides the following infrastructure for PGO enhancements in inliner:

* Enable the use of block level profile information in inliner
* Incremental update of block frequency information during inlining
* Update the function entry counts of callees when they get inlined into callers.

Differential Revision: http://reviews.llvm.org/D16381

llvm-svn: 262636
2016-03-03 18:26:33 +00:00
Hans Wennborg 00ab73dcb0 CallAnalyzer::analyzeCall: change the condition back to "Cost < Threshold"
In r252595, I inadvertently changed the condition to "Cost <= Threshold",
which caused a significant size regression in Chrome. This commit rectifies
that.
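The difference between the two conditions only matters when the cost lands exactly on the threshold. A minimal sketch with a hypothetical function name:

```cpp
// Hypothetical: inline only when the cost is strictly below the threshold.
// With `<=`, a callsite whose cost equals the threshold would also be
// inlined, which is the extra inlining that r252595 accidentally allowed.
constexpr bool shouldInline(int cost, int threshold) {
  return cost < threshold;
}
```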

llvm-svn: 259915
2016-02-05 20:32:42 +00:00
Jun Bum Lim 53907161cc Avoid inlining call sites in unreachable-terminated block
Summary:
If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, there is little point in inlining the call site unless there is literally zero cost. Unlike my previous change (D15289), this change specifically handles call sites followed by unreachable in the same basic block (for a call) or in the normal destination (for an invoke). This change could be a reasonable first step to conservatively inline call sites leading to an unreachable-terminated block while BFI / BPI are not yet available in the inliner.

Reviewers: manmanren, majnemer, hfinkel, davidxl, mcrosier, dblaikie, eraman

Subscribers: dblaikie, davidxl, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16616

llvm-svn: 259403
2016-02-01 20:55:11 +00:00
Yaron Keren eb2a25467e Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment.
clang part in r259232, this is the LLVM part of the patch.

llvm-svn: 259240
2016-01-29 20:50:44 +00:00
Easwaran Raman 30a93c1848 Lower inlining threshold when the caller has minsize attribute.
When the caller has the optsize attribute, we reduce the inlining threshold
to OptSizeThreshold (=75) if it is not already lower than that. We don't do
the same for minsize and I suspect it was not intentional. This also addresses
a FIXME regarding checking optsize attribute explicitly instead of using the
right wrapper.
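The resulting rule can be sketched as a cap applied for either size attribute. The value 75 is the OptSizeThreshold quoted in the message; the function name is hypothetical.

```cpp
#include <algorithm>

constexpr int OptSizeThreshold = 75;  // value quoted in the message above

// Hypothetical: when the caller has optsize or (after this change) minsize,
// cap the threshold at OptSizeThreshold unless it is already lower.
constexpr int capForSize(int threshold, bool optSize, bool minSize) {
  return (optSize || minSize) ? std::min(threshold, OptSizeThreshold)
                              : threshold;
}
```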

Differential Revision: http://reviews.llvm.org/D16493

llvm-svn: 259120
2016-01-28 23:44:41 +00:00
Manuel Jacob e902459c4b Change ConstantFoldInstOperands to take Instruction instead of opcode and type. NFC.
Summary:
The previous form, taking opcode and type, is moved to an internal
helper and the new form, taking an instruction, is a wrapper around this
helper.

Although this is a slight cleanup on its own, the main motivation is to
refactor the constant folding API to ease migration to opaque pointers.
This will be follow-up work.

Reviewers: eddyb

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D16383

llvm-svn: 258391
2016-01-21 06:33:22 +00:00
Easwaran Raman f4bb2f0dc3 Refactor threshold computation for inline cost analysis
Differential Revision: http://reviews.llvm.org/D15401

llvm-svn: 257832
2016-01-14 23:16:29 +00:00
Easwaran Raman b9f7120e7a Refactor inline costs analysis by removing the InlineCostAnalysis class
InlineCostAnalysis is an analysis pass without any need for it to be one.
Once it stops being an analysis pass, it doesn't maintain any useful state
and the member functions inside can be made free functions. NFC.

Differential Revision: http://reviews.llvm.org/D15701

llvm-svn: 256521
2015-12-28 20:28:19 +00:00