llvm-project/llvm/tools/llvm-profgen/llvm-profgen.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

165 lines
6.1 KiB
C++
Raw Normal View History

[CSSPGO][llvm-profgen] Context-sensitive profile data generation This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path. we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding. Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context. With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO. A breakdown of noteworthy changes: - Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack * Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted * Speed up by aggregating `HybridSample` into `AggregatedSamples` * Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context.
 Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`. * Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.
 * Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop - `getCanonicalFnName` for callee name and name from ELF section - Added regression test for both unwinding and profile generation Test Plan: ninja & ninja check-llvm Reviewed By: hoy, wenlei, wmi Differential Revision: https://reviews.llvm.org/D89723
2020-10-20 03:55:59 +08:00
//===- llvm-profgen.cpp - LLVM SPGO profile generation tool -----*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// llvm-profgen generates SPGO profiles from perf script ouput.
//
//===----------------------------------------------------------------------===//
#include "ErrorHandling.h"
#include "PerfReader.h"
[CSSPGO][llvm-profgen] Context-sensitive profile data generation This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path. we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding. Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context. With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO. A breakdown of noteworthy changes: - Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack * Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted * Speed up by aggregating `HybridSample` into `AggregatedSamples` * Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context.
 Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`. * Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.
 * Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop - `getCanonicalFnName` for callee name and name from ELF section - Added regression test for both unwinding and profile generation Test Plan: ninja & ninja check-llvm Reviewed By: hoy, wenlei, wmi Differential Revision: https://reviews.llvm.org/D89723
2020-10-20 03:55:59 +08:00
#include "ProfileGenerator.h"
#include "ProfiledBinary.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/TargetSelect.h"
static cl::OptionCategory ProfGenCategory("ProfGen Options");
static cl::opt<std::string> PerfScriptFilename(
"perfscript", cl::value_desc("perfscript"), cl::ZeroOrMore,
llvm::cl::MiscFlags::CommaSeparated,
cl::desc("Path of perf-script trace created by Linux perf tool with "
"`script` command(the raw perf.data should be profiled with -b)"),
cl::cat(ProfGenCategory));
static cl::alias PSA("ps", cl::desc("Alias for --perfscript"),
cl::aliasopt(PerfScriptFilename));
static cl::opt<std::string> PerfDataFilename(
"perfdata", cl::value_desc("perfdata"), cl::ZeroOrMore,
llvm::cl::MiscFlags::CommaSeparated,
cl::desc("Path of raw perf data created by Linux perf tool (it should be "
"profiled with -b)"),
cl::cat(ProfGenCategory));
static cl::alias PDA("pd", cl::desc("Alias for --perfdata"),
cl::aliasopt(PerfDataFilename));
static cl::opt<std::string> UnsymbolizedProfFilename(
"unsymbolized-profile", cl::value_desc("unsymbolized profile"),
cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated,
cl::desc("Path of the unsymbolized profile created by "
"`llvm-profgen` with `--skip-symbolization`"),
cl::cat(ProfGenCategory));
static cl::alias UPA("up", cl::desc("Alias for --unsymbolized-profile"),
cl::aliasopt(UnsymbolizedProfFilename));
static cl::opt<std::string>
BinaryPath("binary", cl::value_desc("binary"), cl::Required,
cl::desc("Path of profiled executable binary."),
cl::cat(ProfGenCategory));
static cl::opt<std::string> DebugBinPath(
"debug-binary", cl::value_desc("debug-binary"), cl::ZeroOrMore,
cl::desc("Path of debug info binary, llvm-profgen will load the DWARF info "
"from it instead of the executable binary."),
cl::cat(ProfGenCategory));
extern cl::opt<bool> ShowDisassemblyOnly;
extern cl::opt<bool> ShowSourceLocations;
[llvm-profgen] Support LBR only perf script This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info. An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`) ``` 40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ... 4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ... ... ``` For implementation: - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer. - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded. - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key. - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output. Profile generation part will come soon. Differential Revision: https://reviews.llvm.org/D108153
2021-09-01 04:27:42 +08:00
extern cl::opt<bool> SkipSymbolization;
using namespace llvm;
using namespace sampleprof;
// Validate the command line input.
static void validateCommandLine() {
// Allow the missing perfscript if we only use to show binary disassembly.
if (!ShowDisassemblyOnly) {
// Validate input profile is provided only once
uint16_t HasPerfData = PerfDataFilename.getNumOccurrences();
uint16_t HasPerfScript = PerfScriptFilename.getNumOccurrences();
uint16_t HasUnsymbolizedProfile =
UnsymbolizedProfFilename.getNumOccurrences();
uint16_t S = HasPerfData + HasPerfScript + HasUnsymbolizedProfile;
if (S != 1) {
std::string Msg =
S > 1
? "`--perfscript`, `--perfdata` and `--unsymbolized-profile` "
"cannot be used together."
: "Perf input file is missing, please use one of `--perfscript`, "
"`--perfdata` and `--unsymbolized-profile` for the input.";
exitWithError(Msg);
}
auto CheckFileExists = [](bool H, StringRef File) {
if (H && !llvm::sys::fs::exists(File)) {
std::string Msg = "Input perf file(" + File.str() + ") doesn't exist.";
exitWithError(Msg);
}
};
CheckFileExists(HasPerfData, PerfDataFilename);
CheckFileExists(HasPerfScript, PerfScriptFilename);
CheckFileExists(HasUnsymbolizedProfile, UnsymbolizedProfFilename);
}
if (!llvm::sys::fs::exists(BinaryPath)) {
std::string Msg = "Input binary(" + BinaryPath + ") doesn't exist.";
exitWithError(Msg);
}
if (CSProfileGenerator::MaxCompressionSize < -1) {
exitWithError("Value of --compress-recursion should >= -1");
}
if (ShowSourceLocations && !ShowDisassemblyOnly) {
exitWithError("--show-source-locations should work together with "
"--show-disassembly-only!");
}
}
static PerfInputFile getPerfInputFile() {
PerfInputFile File;
if (PerfDataFilename.getNumOccurrences()) {
File.InputFile = PerfDataFilename;
File.Format = PerfFormat::PerfData;
} else if (PerfScriptFilename.getNumOccurrences()) {
File.InputFile = PerfScriptFilename;
File.Format = PerfFormat::PerfScript;
} else if (UnsymbolizedProfFilename.getNumOccurrences()) {
File.InputFile = UnsymbolizedProfFilename;
File.Format = PerfFormat::UnsymbolizedProfile;
}
return File;
}
int main(int argc, const char *argv[]) {
InitLLVM X(argc, argv);
// Initialize targets and assembly printers/parsers.
InitializeAllTargetInfos();
InitializeAllTargetMCs();
InitializeAllDisassemblers();
cl::HideUnrelatedOptions({&ProfGenCategory, &getColorCategory()});
cl::ParseCommandLineOptions(argc, argv, "llvm SPGO profile generator\n");
validateCommandLine();
// Load symbols and disassemble the code of a given binary.
std::unique_ptr<ProfiledBinary> Binary =
std::make_unique<ProfiledBinary>(BinaryPath, DebugBinPath);
if (ShowDisassemblyOnly)
return EXIT_SUCCESS;
PerfInputFile PerfFile = getPerfInputFile();
std::unique_ptr<PerfReaderBase> Reader =
PerfReaderBase::create(Binary.get(), PerfFile);
// Parse perf events and samples
Reader->parsePerfTraces();
[llvm-profgen] Support LBR only perf script This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info. An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`) ``` 40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ... 4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ... ... ``` For implementation: - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer. - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded. - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key. - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output. Profile generation part will come soon. Differential Revision: https://reviews.llvm.org/D108153
2021-09-01 04:27:42 +08:00
if (SkipSymbolization)
return EXIT_SUCCESS;
std::unique_ptr<ProfileGeneratorBase> Generator =
ProfileGeneratorBase::create(Binary.get(), Reader->getSampleCounters(),
[CSSPGO] Use nested context-sensitive profile. CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts. For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned. In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed. A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum. ``` main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ``` Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile. I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115205
2021-12-15 02:03:05 +08:00
Reader->profileIsCSFlat());
[CSSPGO][llvm-profgen] Context-sensitive profile data generation This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path. we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding. Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context. With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO. A breakdown of noteworthy changes: - Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack * Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted * Speed up by aggregating `HybridSample` into `AggregatedSamples` * Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context.
 Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`. * Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.
 * Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop - `getCanonicalFnName` for callee name and name from ELF section - Added regression test for both unwinding and profile generation Test Plan: ninja & ninja check-llvm Reviewed By: hoy, wenlei, wmi Differential Revision: https://reviews.llvm.org/D89723
2020-10-20 03:55:59 +08:00
Generator->generateProfile();
Generator->write();
return EXIT_SUCCESS;
}