This is to account for the change that made CountersPtr in __profd_
relative which landed in a1532ed275.
That change hasn't updated the raw profile version, and while the
profile layout stayed the same, profiles generated by tip-of-tree
LLVM are incompatible with 13.x tooling.
Differential Revision: https://reviews.llvm.org/D111123
Some tests with binary IDs would fail with error: no profile can be merged.
This is because raw profiles could have unaligned headers when emitting binary
IDs. This means padding should be emitted after binary IDs are emitted to
ensure everything else is aligned. This patch adds padding after each binary ID
to ensure the next binary ID size is 8-byte aligned. This also adds extra
checks to ensure we aren't reading corrupted data when printing binary IDs.
Differential Revision: https://reviews.llvm.org/D110365
format.
Currently when we add a new section in the profile format and generate a profile
containing the new section, older compiler which reads the new profile will
issue an error. The forward incompatibility can cause unnecessary churn when
extending the profile. This patch removes the incompatibility when adding a new
section for extbinary format.
Differential Revision: https://reviews.llvm.org/D109398
On some platform (eg: AIX), diff will complain about newline.
diff: Missing newline at the end of file
.../llvm/test/tools/llvm-profdata/Inputs/cs-sample.proftext.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.
When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.
Testing
This is no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.
The compile time of ads is dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299
We have "-profile-isfs" internal option for text, binary, and
compactbinary format (mostly for debug and test purpose). We
need to set the related flag in FunctionSamples so that ProfileIsFS
is written to the header in extbinary format.
Differential Revision: https://reviews.llvm.org/D108707
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D108147
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556
This makes it more convenient to get a text format profile.
Add an error for printing non-text format output to a terminal for instrumentation profile.
(It cannot be portably tested. For sample profile, raw_fd_ostream is hidden deeply so it's inconvenient to add a diagnostic.)
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D104600
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is for llvm-profdata part of change. It sets the bit masks for the
profile reader in llvm-profdata. Also add an internal option
"-fs-discriminator-pass" for show and merge command to process the profile
offline.
This patch also moved setDiscriminatorMaskedBitFrom() to
SampleProfileReader::create() to simplify the interface.
Differential Revision: https://reviews.llvm.org/D103550
The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.
Differential Revision: https://reviews.llvm.org/D100528
Encountered a crash while running a debug build, where this code path would be taken due to a mismatch in profile coverage data versions. Without consuming the error, an assert would be triggered inside the destructor of Error.
Differential Revision: https://reviews.llvm.org/D99457
This changes adds attribute field for metadata of context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in previous build.
This will be used to help estimating inlining and be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning.
Differential Revision: https://reviews.llvm.org/D98823
Context-sensitive AutoFDO profile has a different name scheme where full calling contexts are encoded as function names. When processing CS proifle, llvm-profdata should use full contexts instead of leaf function names.
Reviewed By: wmi, wenlei, wlei
Differential Revision: https://reviews.llvm.org/D97998
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.
To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.
If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95962
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.
Differential Revision: https://reviews.llvm.org/D92074
On z/OS, the following error message is not matched correctly in lit tests.
```
EDC5129I No such file or directory.
```
This patch uses a lit config substitution to check for platform specific error messages.
Reviewed By: muiez, jhenderson
Differential Revision: https://reviews.llvm.org/D95246
This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95547
On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully.
```
EDC5129I No such file or directory.
```
Reviewed By: muiez
Differential Revision: https://reviews.llvm.org/D94239
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.
Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).
**Adjustment to profile format **
A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
Differential Revision: https://reviews.llvm.org/D92347
MD5 is used.
Currently during sample profile loading, NameTable has to be loaded entirely
up front before any name string is retrieved. That is because NameTable is
stored using ULEB128 encoding and cannot be directly accessed like an array.
However, if MD5 is used to represent name in the NameTable, it has fixed
length. If MD5 names are stored in uint64_t type instead of ULEB128, NameTable
can be accessed like an array then in many cases only part of the NameTable
has to be read. This is helpful for reducing compile time especially when
small source file is compiled. We find that after this change, the elapsed
time to build a large application distributively is reduced by 5% and the
accumulative cpu time used for building is also reduced by 5%. The size of
the profile is slightly reduced with this change by ~0.2%, and that also
indicates encoding MD5 in ULEB128 doesn't save the storage space.
Differential Revision: https://reviews.llvm.org/D92621
llvm-profdata `show` and `overlap` will crash in `getFuncName` on compact binary profile. This change fixed this by switching to use `getName`.
`getFuncName` is misused in llvm-profdata. As showed below, `GUIDToFuncNameMap` is only supported in compilation mode, there is no initialization in llvm-profdata. Compact profile whose MD5 is true would try to query `GUIDToFuncNameMap` then caused the crash. So fix this by switching to `getName`
Reviewed By: MaskRay, wmi, wenlei, weihe, hoy
Differential Revision: https://reviews.llvm.org/D87740
Implemented the `llvm-profdata overlap` feature for sample profiles. It reports weighted //similarity// and unweighted //overlap// metrics at program and function level for two input profiles. Similarity metrics are symmetric with regards to the order of two input profiles. By default, the tool only reports program-level summary. Users can look into function-level details via additional options `--function`, `--similarity-cutoff`, and `--value-cutoff`.
The similarity metrics are designed as follows:
* Program-level summary
* Whole program profile similarity is an aggregate over function-level similarity `FS`: `PS = sum(FS(A) * avg_weight(A))` for all function `A`.
* Whole program sample overlap: `PSO = common_samples / total_samples`.
* Function overlap: `FO = #common_function / #total_function`.
* Hot-function overlap: `HFO = #common_hot_function / #total_hot_function`.
* Hot-block overlap: `HBO = #common_hot_block / #total_hot_block`.
* Function-level details
* Function-level similarity is an aggregate over line/block-level similarities `BS` of all sample lines/blocks in the function, weighted by the closeness of the function's weights in two profiles: `FS = sum(BS(i)) * (1 - weight_distance(A))`.
* Function-level sample overlap: `FSO = common_samples / total_samples` for samples in the function.
Reviewed By: wenlei, hoyFB, wmi
Differential Revision: https://reviews.llvm.org/D83852
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.
The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.
In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.
Differential Revision: https://reviews.llvm.org/D81981
This patch includes the supporting code that enables always
instrumenting the function entry block by default.
This patch will NOT the default behavior.
It adds a variant bit in the profile version, adds new directives in
text profile format, and changes llvm-profdata tool accordingly.
This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.
Differential Revision: https://reviews.llvm.org/D84261
Summary:
Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.
Test Plan:
Reviewers: wenlei, hoyFB
Reviewed By: wenlei, hoyFB
Subscribers: hoyFB, wenlei, weihe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82355
Summary: Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.
Reviewers: wmi, hoyFB, wenlei
Reviewed By: wmi
Subscribers: hoyFB, wenlei, llvm-commits, weihe
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81800
Summary:
According to the comments, we want to convert the profile into two binary formats, and then into the md5text format.
We seems to have ignored the intermediate files.
This patch uses them to complete the full roundtrips.
Reviewers: wmi, wenlei
Reviewed By: wmi
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81202
The internal flag -partial-profile in llvm conflicts with the flag with
the same name in llvm-profdata. The conflict happens in builds with
LLVM_LINK_LLVM_DYLIB enabled. In this case the tools are linked with libLLVM
and we end up with two definitions for the same cl::opt.
The patch renames llvm-profdata flag -partial-profile to -gen-partial-profile.
Summary: Add -detailed-summary support for sample profile dump to match that of instrumentation profile.
Reviewers: wmi, davidxl, hoyFB
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79291
There is no need to use `--check-prefix` multiple times.
It helps to improve readability/test maintainability.
This patch does it for all tools at once.
Differential revision: https://reviews.llvm.org/D78217
Fix the error of show-prof-info.test on some platforms without zlib.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426