OpenCloudOS-Kernel/tools/perf
Andi Kleen 8b7bad58ef perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can only be shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting the branch stack
in front of the normal callgraph and then using the normal callgraph
histogram infrastructure to unify them.
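
As a rough illustration of the idea (not the code in machine.c), the
branch stack can be flattened into addresses and placed in front of the
ordinary callchain, so that the combined sequence goes through the same
histogram code. The struct, the function name and the to-then-from
ordering within each branch are assumptions made for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one sampled branch record. */
struct branch_entry_sketch {
	uint64_t from;	/* address of the branch instruction */
	uint64_t to;	/* address of the branch target */
};

/*
 * Build one flat chain: branch entries first (in the order given),
 * followed by the ordinary callchain addresses. Each branch contributes
 * its target and then its source. Returns the number of entries written,
 * or 0 if the output buffer is too small.
 */
size_t build_unified_chain(const struct branch_entry_sketch *br, size_t nr_br,
			   const uint64_t *callchain, size_t nr_cc,
			   uint64_t *out, size_t out_cap)
{
	size_t n = 0, i;

	assert(out != NULL);
	if (2 * nr_br + nr_cc > out_cap)
		return 0;

	for (i = 0; i < nr_br; i++) {
		out[n++] = br[i].to;
		out[n++] = br[i].from;
	}
	for (i = 0; i < nr_cc; i++)
		out[n++] = callchain[i];

	return n;
}
```

Because the result is just a longer chain of addresses, the consumer
does not need to know which entries came from the branch stack.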

This way, in complex functions we can understand the control flow that
led to a particular sample, and may even see some control flow in the
caller for short functions.

Example (simplified; code this simple would not normally need it).
Please run this only after the whole patchkit is in: at this point in
the patch order there is no --branch-history yet, it is added by a
later patch:

tcall.c:

volatile int a = 10000, b = 100000, c;

__attribute__((noinline)) void f2(void)
{
	c = a / b;
}

__attribute__((noinline)) void f1(void)
{
	f2();
	f2();
}
int main(void)
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
    54.91%  tcall.c:6  [.] f2                      tcall
            |
            |--65.53%-- f2 tcall.c:5
            |          |
            |          |--70.83%-- f1 tcall.c:11
            |          |          f1 tcall.c:10
            |          |          main tcall.c:18
            |          |          main tcall.c:18
            |          |          main tcall.c:17
            |          |          main tcall.c:17
            |          |          f1 tcall.c:13
            |          |          f1 tcall.c:13
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:12
            |          |          f1 tcall.c:12
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:11
            |          |
            |           --29.17%-- f1 tcall.c:12
            |                     f1 tcall.c:12
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:11
            |                     f1 tcall.c:10
            |                     main tcall.c:18
            |                     main tcall.c:18
            |                     main tcall.c:17
            |                     main tcall.c:17
            |                     f1 tcall.c:13
            |                     f1 tcall.c:13
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record or anywhere
else.

This adds the basic code to report:

- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.

The rest of the history code is unchanged and doesn't know the
difference between an LBR entry and a normal call entry.

- detect overlaps with the callchain
- remove small loop duplicates in the LBR
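
The last two items can be sketched roughly as follows. This is a
simplified illustration, not the code added to machine.c: the struct,
both function names, the quadratic search and the caller-supplied
window size are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one sampled branch record. */
struct branch_sketch {
	uint64_t from;	/* address of the branch instruction */
	uint64_t to;	/* address of the branch target */
};

/*
 * Overlap check: a callchain return address points just past the call
 * instruction, while the branch record holds the call instruction
 * itself. Treat both as the same call when the branch source lies at
 * most max_insn_len bytes before the return address.
 */
bool branch_matches_return(uint64_t branch_from, uint64_t ret_addr,
			   uint64_t max_insn_len)
{
	return branch_from < ret_addr && ret_addr - branch_from <= max_insn_len;
}

/*
 * Loop removal: collapse immediately repeated runs of branches, e.g.
 * A B A B C becomes A B C. Only the 'from' addresses are compared.
 * Returns the new number of entries.
 */
int remove_loops_sketch(struct branch_sketch *l, int nr)
{
	int i = 1;

	assert(nr == 0 || l != NULL);
	while (i < nr) {
		bool removed = false;
		int j;

		/* Look backwards for an earlier entry with the same source. */
		for (j = i - 1; j >= 0 && !removed; j--) {
			int len = i - j;	/* length of the candidate loop body */
			int k;

			if (l[j].from != l[i].from || i + len > nr)
				continue;
			/* Does l[i .. i+len) repeat l[j .. i) entry for entry? */
			for (k = 0; k < len; k++)
				if (l[j + k].from != l[i + k].from)
					break;
			if (k == len) {
				/* Drop the repeated copy and close the gap. */
				memmove(l + i, l + i + len,
					(size_t)(nr - i - len) * sizeof(*l));
				nr -= len;
				removed = true;
			}
		}
		if (!removed)
			i++;	/* nothing dropped here, move to the next entry */
	}
	return nr;
}
```

Collapsing repeated iterations keeps a tight loop from filling the
limited number of LBR entries with copies of the same path.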

Current limitations:

- The LBR flags (mispredict etc.) are not shown in the history
  and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
  (e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
    -b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-01 20:00:31 -03:00
Documentation perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
arch perf tools: A thread's machine can be found via thread->mg->machine 2014-10-29 10:32:46 -02:00
bench perf bench futex: Sanitize -q option in requeue 2014-09-29 15:43:26 -03:00
config perf tools: Clean up libelf feature support code 2014-11-19 12:33:46 -03:00
python perf python: Remove duplicate TID bit from mask 2013-08-07 17:35:25 -03:00
scripts perf tools: Add call information to Python export 2014-11-03 18:10:06 -03:00
tests perf test: fix typo in python test 2014-11-19 12:33:47 -03:00
ui perf tools: Collapse first level callchain entry if it has sibling 2014-11-24 11:34:33 -03:00
util perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
.gitignore perf tools: Add perf-read-vdso32 and perf-read-vdsox32 to .gitignore 2014-11-19 12:34:24 -03:00
CREDITS perf_counter tools: Add CREDITS file for Git contributors 2009-06-24 19:54:29 +02:00
MANIFEST perf kvm: Add stat support on s390 2014-07-16 17:57:33 -03:00
Makefile perf tools: Add 'build-test' make target 2014-01-16 16:26:26 -03:00
Makefile.perf perf tools: Clean up libelf feature support code 2014-11-19 12:33:46 -03:00
builtin-annotate.c perf tools: Remove hists from evsel 2014-10-14 17:32:52 -03:00
builtin-bench.c perf bench: Add --repeat option 2014-06-19 16:13:15 -03:00
builtin-buildid-cache.c perf buildid-cache: Use strerror_r instead of strerror 2014-08-15 13:07:59 -03:00
builtin-buildid-list.c perf session: Separating data file properties from session 2013-10-21 17:33:25 -03:00
builtin-diff.c perf diff: Add missing handler for PERF_RECORD_MMAP2 events 2014-11-19 12:33:48 -03:00
builtin-evlist.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-help.c perf help: Use strerror_r instead of strerror 2014-08-15 13:08:26 -03:00
builtin-inject.c perf tools: Add id index 2014-10-29 11:24:47 -02:00
builtin-kmem.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-kvm.c perf kvm: Print kvm specific --help output 2014-10-29 10:32:47 -02:00
builtin-list.c perf list: Add usage 2013-11-05 14:26:41 -03:00
builtin-lock.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-mem.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-probe.c perf probe: Add --quiet option to suppress output result message 2014-10-29 10:32:49 -02:00
builtin-record.c perf record: Add new -I option to sample interrupted machine state 2014-11-16 11:42:02 +01:00
builtin-report.c perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
builtin-sched.c perf sched: Stop updating hists stats, not used 2014-10-09 11:46:35 -03:00
builtin-script.c perf tools: Export usage string and option table of perf record 2014-10-29 10:32:47 -02:00
builtin-stat.c perf stat: Add support for snapshot counters 2014-12-01 20:00:31 -03:00
builtin-timechart.c perf tools: Export usage string and option table of perf record 2014-10-29 10:32:47 -02:00
builtin-top.c perf tools: Remove hists from evsel 2014-10-14 17:32:52 -03:00
builtin-trace.c perf tools: A thread's machine can be found via thread->mg->machine 2014-10-29 10:32:46 -02:00
builtin.h perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
command-list.txt perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
design.txt perf tools: Update some code references in design.txt 2014-03-18 18:17:06 -03:00
perf-archive.sh perf archive: Make 'f' the last parameter for tar 2012-09-17 13:10:42 -03:00
perf-completion.sh perf sched: Introduce --list-cmds for use by scripts 2014-04-16 17:16:05 +02:00
perf-read-vdso.c perf tools: Build programs to copy 32-bit compatibility 2014-10-29 10:32:48 -02:00
perf-sys.h perf tools: Make CPUINFO_PROC an array to support different kernel versions 2014-10-29 10:27:36 -02:00
perf-with-kcore.sh perf tools: Add perf-with-kcore script 2014-09-17 17:08:08 -03:00
perf.c perf: Use strerror_r instead of strerror 2014-08-15 10:54:29 -03:00
perf.h perf tools: Add core support for sampling intr machine state regs 2014-11-16 11:41:59 +01:00