Commit Graph

79 Commits

Author SHA1 Message Date
Dehao Chen 53a0c082d2 Do not set branch weight if the branch weight annotation is present.
Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy.

Reviewers: tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: mehdi_amini, aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D31228

llvm-svn: 298602
2017-03-23 14:43:10 +00:00
Dehao Chen 4a435e0896 SamplePGO ThinLTO ICP fix for local functions.
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:

1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName

This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:

1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.

Reviewers: mehdi_amini, tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30754

llvm-svn: 297757
2017-03-14 17:33:01 +00:00
Dehao Chen c632a393b7 Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.

Reviewers: davidxl, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D30658

llvm-svn: 297038
2017-03-06 17:49:59 +00:00
Dehao Chen a60cdd3881 Add function importing info from samplepgo profile to the module summary.
Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported.

Reviewers: mehdi_amini, tejohnson

Reviewed By: tejohnson

Subscribers: davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D30053

llvm-svn: 296498
2017-02-28 18:09:44 +00:00
Dehao Chen cc75d2441d Add call branch annotation for ICP promoted direct call in SamplePGO mode.
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.

Reviewers: davidxl, eraman, xur

Reviewed By: davidxl, xur

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30282

llvm-svn: 296028
2017-02-23 22:15:18 +00:00
Dehao Chen 533bc6ea8e Use base discriminator in sample pgo profile matching.
Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.

Reviewers: dblaikie, davidxl

Reviewed By: dblaikie, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30218

llvm-svn: 295999
2017-02-23 18:27:45 +00:00
Dehao Chen 4a9dd70213 Fix the samplepgo indirect call promotion bug: we should not promote a direct call.
Summary: Checking CS.getCalledFunction() == nullptr does not necessary indicate indirect call. We also need to check if CS.getCalledValue() is not a constant.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29570

llvm-svn: 294260
2017-02-06 23:33:15 +00:00
Dehao Chen c81483d79c Fix the bug of samplepgo indirect call promption when type casting of the return value is needed.
Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changed to return the direct call instruction instead.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29569

llvm-svn: 294205
2017-02-06 18:10:36 +00:00
Dehao Chen 448b5790f6 Refactor SampleProfile.cpp to make it cleaner. (NFC)
llvm-svn: 294118
2017-02-05 07:32:17 +00:00
Dehao Chen 274df5ea41 Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 293657
2017-01-31 17:49:37 +00:00
Dehao Chen 6217fa44b8 Revert r292979 which causes compile time failure.
llvm-svn: 293557
2017-01-30 22:26:05 +00:00
Dehao Chen a5eb1689dc Explicitly promote indirect calls before sample profile annotation.
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.

Reviewers: xur, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29040

llvm-svn: 292979
2017-01-24 21:05:51 +00:00
Evgeniy Stepanov 29e56bceec Revert "Refactor SampleProfile.cpp to move computation inside a branch. (NFC)"
Causes MSan failures on the buildbot.

llvm-svn: 292840
2017-01-23 22:40:08 +00:00
Dehao Chen a53219a5f2 Refactor SampleProfile.cpp to move computation inside a branch. (NFC)
llvm-svn: 292803
2017-01-23 17:09:02 +00:00
Dehao Chen 77079003dd Add indirect call promotion to SamplePGO
Summary: This patch adds metadata for indirect call promotion in the sample profile loader.

Reviewers: xur, davidxl, dnovillo

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28923

llvm-svn: 292672
2017-01-20 22:56:07 +00:00
Dehao Chen 94f369fc7f clang-format SampleProfile.cpp (NFC)
llvm-svn: 292533
2017-01-19 23:20:31 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Dehao Chen 40dd8c5109 Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Dehao Chen fb699619a0 Fix the bug in r289714 (NFC).
llvm-svn: 289724
2016-12-14 22:03:08 +00:00
Dehao Chen 0f35fa907d Change CoverageTracker from a global variable to member variable to avoid breaking thread-safety. (NFC)
llvm-svn: 289603
2016-12-13 22:13:18 +00:00
David Blaikie 831b652020 Use CallSite to simplify code
llvm-svn: 288192
2016-11-29 19:42:27 +00:00
Dehao Chen 554f500ae2 Before sample pgo annotation, do not inline a function that has no debug info. (NFC)
If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119.

llvm-svn: 287710
2016-11-22 22:50:01 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Dehao Chen 160fbc3f95 Change the basic block weight calculation algorithm to use max instead of voting.
Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.

Reviewers: dnovillo

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24788

llvm-svn: 282084
2016-09-21 16:26:51 +00:00
Dehao Chen 20866ed57e Handle early inline for hot callsites that reside in the same basic block.
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24734

llvm-svn: 281927
2016-09-19 18:38:14 +00:00
Dehao Chen 82667d04c5 Only set branch weight during sample pgo annotation when max_weight of the branch is non-zero. Otherwise use default static profile to set branch probability.
Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24732

llvm-svn: 281912
2016-09-19 16:33:41 +00:00
Dehao Chen 38e3731c47 Use call target count to derive the call instruction weight
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24410

llvm-svn: 281911
2016-09-19 16:06:37 +00:00
Dehao Chen 41cde0b986 Handle Invoke during sample profiler annotation: make it inlinable.
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24409

llvm-svn: 281870
2016-09-18 23:11:37 +00:00
Dehao Chen c0a1e432c7 Fine tuning of sample profile propagation algorithm.
Summary: The refined propagation algorithm is more accurate and robust.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23224

llvm-svn: 278522
2016-08-12 16:22:12 +00:00
Sean Silva fd03ac6a0c Consistently use ModuleAnalysisManager
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.

Thanks to David for the suggestion.

llvm-svn: 278078
2016-08-09 00:28:38 +00:00
Sean Silva ab6a683765 Avoid using a raw AssumptionCacheTracker in various inliner functions.
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706

Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.

llvm-svn: 276515
2016-07-23 04:22:50 +00:00
Dehao Chen 9232f98279 Implement callsite-hotness based inline cost for Sample-based PGO
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.

In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22118

llvm-svn: 275073
2016-07-11 16:48:54 +00:00
Dehao Chen 29d2641f52 Tune the weight propagation algorithm for sample profile.
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22180

llvm-svn: 275072
2016-07-11 16:40:17 +00:00
Dehao Chen 429f5c735f Remove inline hints computation from SampleProfile.cpp
Summary: As we will move to use uniformed hotness check in inliner, we do not need inline hints in SampleProfile pass any more.

Reviewers: dnovillo, davidxl

Subscribers: eraman, llvm-commits

Differential Revision: http://reviews.llvm.org/D19287

llvm-svn: 274918
2016-07-08 20:12:44 +00:00
Dehao Chen c66a06ad0e Hookup ProfileSummary with SampleProfilerLoader
Summary: Set ProfileSummary in SampleProfilerLoader.

Reviewers: davidxl, eraman

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21702

llvm-svn: 273745
2016-06-24 22:57:06 +00:00
Dehao Chen 071bb9d7af Pass AssumptionCacheTracker from SampleProfileLoader to Inliner
Summary: Inliner needs ACT when calling InlineFunction. Instead of nullptr, we need to pass it in from SampleProfileLoader

Reviewers: davidxl

Subscribers: eraman, vsk, danielcdh, llvm-commits

Differential Revision: http://reviews.llvm.org/D21205

llvm-svn: 273199
2016-06-20 20:53:40 +00:00
Xinliang David Li d38392ecd6 [PM] Port the Sample FDO to new PM (part-2)
llvm-svn: 271072
2016-05-27 23:20:16 +00:00
Xinliang David Li e897edbd36 [PM] Port the Sample FDO to new PM (part-1)
llvm-svn: 271062
2016-05-27 22:30:44 +00:00
Dehao Chen 80b16d4135 Remove sample profile dependency to instcombine, which is not a analysis pass.
Summary: This patch removes dependency from sample profile pass to instcombine pass.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20501

llvm-svn: 271009
2016-05-27 16:14:15 +00:00
Benjamin Kramer 4fed928f53 Avoid some copies by using const references.
clang-tidy's performance-unnecessary-copy-initialization with some manual
fixes. No functional changes intended.

llvm-svn: 270988
2016-05-27 12:30:51 +00:00
Dehao Chen 21aefaec97 Do not read callee name when matching IR to profile as it is not used.
Summary: Callee name is not used to identify a callsite now, so do not read it during annotation.

Reviewers: davidxl, dnovillo

Subscribers: dnovillo, danielcdh, llvm-commits

Differential Revision: http://reviews.llvm.org/D19704

llvm-svn: 268069
2016-04-29 17:19:10 +00:00
Dehao Chen 5d6d4841ed Tune basic block annotation algorithm.
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.

This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19301

llvm-svn: 267519
2016-04-26 04:59:11 +00:00
Dehao Chen a8bae82373 Refine instruction weight annotation algorithm for sample profiler.
Summary:
This patch refined the instruction weight anootation algorithm:
1. Do not use dbg_value intrinsics for annotation.
2. Annotate cold calls if the call is inlined in profile, but not inlined before preparation. This indicates that the annotation preparation step found no sample for the inlined callsite, thus the call should be very cold.

Reviewers: dnovillo, davidxl

Subscribers: mgrang, llvm-commits

Differential Revision: http://reviews.llvm.org/D19286

llvm-svn: 266936
2016-04-20 23:36:23 +00:00
Pete Cooper adebb9379a Remove llvm::getDISubprogram in favor of Function::getSubprogram
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.

Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.

Ideally this should be NFC, but in reality its possible that a function:

has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will

Reviewed by Duncan Exon Smith.

Differential Revision: http://reviews.llvm.org/D18074

llvm-svn: 263184
2016-03-11 02:14:16 +00:00
Dehao Chen 57d1dda558 Use LineLocation instead of CallsiteLocation to index callsite profile.
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17827

llvm-svn: 262634
2016-03-03 18:09:32 +00:00
Dehao Chen 1012be120a Perform InstructioinCombiningPass before SampleProfile pass.
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17742

llvm-svn: 262419
2016-03-01 22:53:02 +00:00
Dehao Chen 6c73b49911 Set function entry count as 0 if sample profile is not found for the function.
Summary: This change makes the sample profile's behavior consistent with instr profile.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17522

llvm-svn: 261587
2016-02-22 22:46:21 +00:00
Benjamin Kramer 8a752e316d Use ArrayRef to hide SmallVector details, kill a useless vector copy along the way.
llvm-svn: 260824
2016-02-13 16:01:12 +00:00
Matthias Braun b30f2f5141 Avoid overly large SmallPtrSet/SmallSet
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.

llvm-svn: 259283
2016-01-30 01:24:31 +00:00