llvm-project/llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

75 lines
2.8 KiB
Plaintext
Raw Normal View History

; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --skip-symbolization --profile-summary-cold-count=0 --use-offset=0
[llvm-profgen] Support LBR only perf script This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info. An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`) ``` 40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ... 4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ... ... ``` For implementation: - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer. - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded. - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key. - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output. Profile generation part will come soon. Differential Revision: https://reviews.llvm.org/D108153
2021-09-01 04:27:42 +08:00
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNWINDER
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --skip-symbolization --profile-summary-cold-count=0 --use-offset=1
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNWINDER-OFFSET
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --skip-symbolization --profile-summary-cold-count=0 --use-offset=1 --use-first-loadable-segment-as-base=1
; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNWINDER-OFFSET2
; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0 --csspgo-preinliner=0 --gen-cs-nested-profile=0
2021-01-12 01:08:39 +08:00
; RUN: FileCheck %s --input-file %t
; CHECK: [main:2 @ foo]:74:0
[CSSPGO] Report zero-count probe in profile instead of dangling probes. Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits: 1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode. 2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader. 3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes. 4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples. 5. Better readability. 6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler. Note that the current patch does include any work for #3. There will be follow-up changes. For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%. For #4, I have seen general counts quality for SPEC2017 is improved by 10%. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D104129
2021-06-16 06:59:06 +08:00
; CHECK-NEXT: 1: 0
2021-01-12 01:08:39 +08:00
; CHECK-NEXT: 2: 15
; CHECK-NEXT: 3: 15
; CHECK-NEXT: 4: 14
; CHECK-NEXT: 5: 1
; CHECK-NEXT: 6: 15
[CSSPGO] Report zero-count probe in profile instead of dangling probes. Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits: 1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode. 2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader. 3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes. 4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples. 5. Better readability. 6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler. Note that the current patch does include any work for #3. There will be follow-up changes. For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%. For #4, I have seen general counts quality for SPEC2017 is improved by 10%. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D104129
2021-06-16 06:59:06 +08:00
; CHECK-NEXT: 7: 0
2021-01-12 01:08:39 +08:00
; CHECK-NEXT: 8: 14 bar:14
[CSSPGO] Report zero-count probe in profile instead of dangling probes. Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits: 1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode. 2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader. 3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes. 4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples. 5. Better readability. 6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler. Note that the current patch does include any work for #3. There will be follow-up changes. For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%. For #4, I have seen general counts quality for SPEC2017 is improved by 10%. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D104129
2021-06-16 06:59:06 +08:00
; CHECK-NEXT: 9: 0
; CHECK-NEXT: !CFGChecksum: 563088904013236
; CHECK:[main:2 @ foo:8 @ bar]:28:14
2021-01-12 01:08:39 +08:00
; CHECK-NEXT: 1: 14
; CHECK-NEXT: 4: 14
; CHECK-NEXT: !CFGChecksum: 72617220756
; CHECK-UNWINDER: 3
; CHECK-UNWINDER-NEXT: 201800-201858:1
; CHECK-UNWINDER-NEXT: 20180e-20182b:1
; CHECK-UNWINDER-NEXT: 20180e-201858:13
; CHECK-UNWINDER-NEXT: 2
; CHECK-UNWINDER-NEXT: 20182b->201800:1
; CHECK-UNWINDER-NEXT: 201858->20180e:15
; CHECK-UNWINDER-OFFSET: 3
; CHECK-UNWINDER-OFFSET-NEXT: 800-858:1
; CHECK-UNWINDER-OFFSET-NEXT: 80e-82b:1
; CHECK-UNWINDER-OFFSET-NEXT: 80e-858:13
; CHECK-UNWINDER-OFFSET-NEXT: 2
; CHECK-UNWINDER-OFFSET-NEXT: 82b->800:1
; CHECK-UNWINDER-OFFSET-NEXT: 858->80e:15
; CHECK-UNWINDER-OFFSET2: 3
; CHECK-UNWINDER-OFFSET2-NEXT: 1800-1858:1
; CHECK-UNWINDER-OFFSET2-NEXT: 180e-182b:1
; CHECK-UNWINDER-OFFSET2-NEXT: 180e-1858:13
; CHECK-UNWINDER-OFFSET2-NEXT: 2
; CHECK-UNWINDER-OFFSET2-NEXT: 182b->1800:1
; CHECK-UNWINDER-OFFSET2-NEXT: 1858->180e:15
; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling
; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
; -g test.c -o a.out
#include <stdio.h>
int bar(int x, int y) {
if (x % 3) {
return x - y;
}
return x + y;
}
void foo() {
int s, i = 0;
while (i++ < 4000 * 4000)
if (i % 91) s = bar(i, s); else s += 30;
printf("sum is %d\n", s);
}
int main() {
foo();
return 0;
}