perf tools: Document --children option in more detail
As the --children option changes the output of perf report (and perf
top) it sometimes confuses users. Add more words and examples to help
understanding of the option's behavior - and how to disable it ;-).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1429684425-14987-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent c4fa0d9c1e
commit dd3092075c
@ -0,0 +1,108 @@
Overhead calculation
--------------------
The overhead can be shown in two columns as 'Children' and 'Self' when
perf collects callchains. The 'self' overhead is simply calculated by
adding all period values of the entry - usually a function (symbol).
This is the value that perf shows traditionally, and the sum of all the
'self' overhead values should be 100%.

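The 'self' calculation can be sketched in a few lines of Python. This is only an illustration, not perf's implementation: the `samples` list, its `(symbol, period)` layout and the `self_overhead` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict


def self_overhead(samples):
    """Return {symbol: percent} from (symbol, period) pairs.

    Each pair stands for one sample: the symbol that was executing and
    the period value attributed to that sample (hypothetical layout).
    """
    totals = defaultdict(int)
    for symbol, period in samples:
        totals[symbol] += period
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {sym: 100.0 * p / grand_total for sym, p in totals.items()}


# Toy data: six samples in foo and four in bar, all with period 1.
samples = [("foo", 1)] * 6 + [("bar", 1)] * 4
overhead = self_overhead(samples)
for sym, pct in sorted(overhead.items(), key=lambda kv: -kv[1]):
    print(f"{pct:7.2f}%  {sym}")
```

Because every sample is credited to exactly one symbol, the percentages produced this way always add up to 100%.
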
The 'children' overhead is calculated by adding all period values of
the child functions so that it can show the total overhead of the
higher level functions even if they don't directly execute much.
'Children' here means functions that are called from another (parent)
function.

It might be confusing that the sum of all the 'children' overhead
values exceeds 100% since each of them is already an accumulation of
'self' overhead of its child functions. But with this enabled, users
can find which function has the most overhead even if samples are
spread over the children.

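A minimal Python sketch of the 'children' accumulation described above, assuming each sample carries its callchain as a list ordered from the sampled function up to the outermost caller. The `children_overhead` name, the sample layout and the function names in the toy data are hypothetical. Counting a function only once per sample, even if it appears several times in a recursive callchain, is an assumption of this sketch and is not stated in the text.

```python
from collections import defaultdict


def children_overhead(samples):
    """Return {symbol: percent} from (callchain, period) pairs.

    callchain[0] is the function that was executing; later entries are
    its callers. A sample's period is credited to every function that
    appears in its callchain.
    """
    total = sum(period for _, period in samples)
    if total == 0:
        return {}
    acc = defaultdict(int)
    for callchain, period in samples:
        for symbol in set(callchain):  # once per sample (assumption)
            acc[symbol] += period
    return {sym: 100.0 * p / total for sym, p in acc.items()}


# Toy data: two leaf functions reached through the same callers.
samples = [
    (["leaf_a", "mid", "top"], 3),
    (["leaf_b", "mid", "top"], 1),
]
children = children_overhead(samples)
for sym in ("top", "mid", "leaf_a", "leaf_b"):
    print(f"{children[sym]:7.2f}%  {sym}")
print(f"sum of 'children' values: {sum(children.values()):.2f}%")
```

In this toy data neither `top` nor `mid` is ever sampled directly, yet both reach 100%, and the column as a whole sums to well over 100% because each sample is counted once for every function in its callchain.
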
Consider the following example; there are three functions like below.

-----------------------
void foo(void) {
    /* do something */
}

void bar(void) {
    /* do something */
    foo();
}

int main(void) {
    bar();
    return 0;
}
-----------------------

In this case 'foo' is a child of 'bar', and 'bar' is an immediate
child of 'main' so 'foo' also is a child of 'main'. In other words,
'main' is a parent of 'foo' and 'bar', and 'bar' is a parent of 'foo'.

Suppose all samples are recorded in 'foo' and 'bar' only. When it's
recorded with callchains the output will show something like below
in the usual (self-overhead-only) output of perf report:

----------------------------------
Overhead  Symbol
........  .....................
  60.00%  foo
          |
          --- foo
              bar
              main
              __libc_start_main

  40.00%  bar
          |
          --- bar
              main
              __libc_start_main
----------------------------------

When the --children option is enabled, the 'self' overhead values of
child functions (i.e. 'foo' and 'bar') are added to the parents to
calculate the 'children' overhead. In this case the report could be
displayed as:

-------------------------------------------
Children      Self  Symbol
........  ........  ....................
 100.00%     0.00%  __libc_start_main
                    |
                    --- __libc_start_main

 100.00%     0.00%  main
                    |
                    --- main
                        __libc_start_main

 100.00%    40.00%  bar
                    |
                    --- bar
                        main
                        __libc_start_main

  60.00%    60.00%  foo
                    |
                    --- foo
                        bar
                        main
                        __libc_start_main
-------------------------------------------

In the above output, the 'self' overhead of 'foo' (60%) was added to the
'children' overhead of 'bar', 'main' and '__libc_start_main'.
Likewise, the 'self' overhead of 'bar' (40%) was added to the
'children' overhead of 'main' and '__libc_start_main'.

So '__libc_start_main' and 'main' are shown first since they have the
same (100%) 'children' overhead (even though they have zero 'self'
overhead) and they are the parents of 'foo' and 'bar'.

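The foo/bar/main example can be reproduced with a short Python sketch. It is a hedged illustration rather than perf's code: the `report` helper and the `(callchain, period)` sample layout are hypothetical, and the tie-break among equal 'children' values (lowest 'self' first, then outermost caller first) is an arbitrary choice made here so that the rows come out in the order discussed above.

```python
def report(samples):
    """Return (children%, self%, symbol) rows sorted by 'children'."""
    total = sum(period for _, period in samples)
    self_p, child_p = {}, {}
    for callchain, period in samples:
        # callchain[0] is the sampled function; the rest are its callers.
        leaf = callchain[0]
        self_p[leaf] = self_p.get(leaf, 0) + period
        # Walk from the outermost caller so that ties keep that order;
        # dict.fromkeys() drops repeated symbols within one callchain.
        for symbol in dict.fromkeys(reversed(callchain)):
            child_p[symbol] = child_p.get(symbol, 0) + period
    rows = [
        (100.0 * child_p[s] / total, 100.0 * self_p.get(s, 0) / total, s)
        for s in child_p
    ]
    # Highest 'children' first; among equals, lowest 'self' first.
    return sorted(rows, key=lambda r: (-r[0], r[1]))


samples = [
    (["foo", "bar", "main", "__libc_start_main"], 6),  # 60% sampled in foo
    (["bar", "main", "__libc_start_main"], 4),         # 40% sampled in bar
]
rows = report(samples)
print("Children      Self  Symbol")
for children, self_, symbol in rows:
    print(f"{children:7.2f}%  {self_:7.2f}%  {symbol}")
```

The 60% recorded in 'foo' is credited to all four functions in its callchain and the 40% recorded in 'bar' to three of them, which is why 'bar', 'main' and '__libc_start_main' all reach 100% in the 'children' column.
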
Since v3.16 the 'children' overhead is shown by default and the output
is sorted by its values. The 'children' overhead can be disabled by
specifying the --no-children option on the command line or by adding
'report.children = false' or 'top.children = false' to the perf config
file.
@ -193,6 +193,7 @@ OPTIONS
Accumulate callchain of children to parent entry so that they can
show up in the output. The output will have a new "Children" column
and will be sorted on the data. It requires that callchains are recorded.
See the `overhead calculation' section for more details.

--max-stack::
Set the stack depth limit when parsing the callchain, anything
@ -323,6 +324,9 @@ OPTIONS
--header-only::
Show only perf.data header (forces --stdio).


include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1]
@ -168,7 +168,7 @@ Default is to monitor all CPUS.
Accumulate callchain of children to parent entry so that they can
show up in the output. The output will have a new "Children" column
and will be sorted on the data. It requires -g/--call-graph option
-enabled.
+enabled. See the `overhead calculation' section for more details.

--max-stack::
Set the stack depth limit when parsing the callchain, anything
@ -234,6 +234,7 @@ INTERACTIVE PROMPTING KEYS
Pressing any unmapped key displays a menu, and prompts for input.

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------