Commit Graph

280 Commits

Author SHA1 Message Date
German Gomez ea15483e7c perf report: Add 'simd' sort field
Add 'simd' sort field to visualize SIMD ops in 'perf report'.

Rows are labeled with the SIMD ISA, and the type of predicate (if any):

  - [p] partial predicate
  - [e] empty predicate (no elements in the vector being used)

Example with Arm SPE and SVE (Scalable Vector Extension):

  #include <arm_sve.h>

  double src[1025], dst[1025];

  int main(void) {
    svfloat64_t vc = svdup_f64(1);
    for(;;)
      for(int i = 0; i < 1025; i += svcntd())
      {
        svbool_t pg = svwhilelt_b64(i, 1025);
        svfloat64_t vsrc = svld1(pg, &src[i]);
        svfloat64_t vdst = svadd_x(pg, vsrc, vc);
        svst1(pg, &dst[i], vdst);
      }
    return 0;
  }

  ... compiled using "gcc-11 -march=armv8-a+sve -O3"

Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:

  $ perf record -e arm_spe_0// -- ./a.out
  $ perf report --itrace=i1i -s overhead,pid,simd,sym

  Overhead      Pid:Command   Simd     Symbol
  ........  ................  .......  ......................

    53.76%    10758:program            [.] main
    46.14%    10758:program   [.] SVE  [.] main
     0.09%    10758:program   [p] SVE  [.] main

The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.

Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20 19:28:21 -03:00
Leo Yan ebf39d29b9 perf hist: Add 'kvm_info' field in histograms entry
__hists__add_entry() creates a temporary entry and compare it with
existed histograms entries, if any existed entry equals to the
temporary entry it skips to allocation to avoid duplication.

The problem for support KVM event in histograms is it doesn't contain
any info to identify KVM event and can be used for comparison entries.

This patch adds 'kvm_info' field in the histograms entry which contains
the KVM event's key, this identifier will be used for comparison
histograms entries in later change.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-15 16:47:20 -03:00
Namhyung Kim cb6e92c764 perf hist: Add perf_hpp_fmt->init() callback
In __hists__insert_output_entry(), it calls fmt->sort() for dynamic
entries with NULL to update column width for tracepoint fields.
But it's a hacky abuse of the sort callback, better to have a proper
callback for that.  I'll add more use cases later.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221215192817.2734573-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-21 14:52:40 -03:00
Namhyung Kim 762461f1a5 perf tools: Add 'addr' sort key
Sometimes users want to see actual (virtual) address of sampled instructions.
Add a new 'addr' sort key to display the raw addresses.

  $ perf record -o- true | perf report -i- -s addr
  # To display the perf.data header info, please use --header/--header-only options.
  #
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
  #
  # Total Lost Samples: 0
  #
  # Samples: 12  of event 'cycles:u'
  # Event count (approx.): 252512
  #
  # Overhead  Address
  # ........  ..................
  #
      42.96%  0x7f96f08443d7
      29.55%  0x7f96f0859b50
      14.76%  0x7f96f0852e02
       8.30%  0x7f96f0855028
       4.43%  0xffffffff8de01087

Note that it just compares and displays the sample ip.  Each process can
have a different memory layout and the ip will be different even if they run
the same binary.  So this sort key is mostly meaningful for per-process
profile data.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220923173142.805896-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:22 -03:00
Namhyung Kim 75b37db096 perf hist: Add nr_lost_samples to hist_stats
This is a preparation to display accurate lost sample counts for
each evsel.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220901195739.668604-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:20 -03:00
Ian Rogers 8e03bb88ab perf hist: Update use of pthread mutex
Switch to the use of mutex wrappers that provide better error checking.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dario Petrillo <dario.pk1@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fangrui Song <maskray@google.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Weiguo Li <liwg06@foxmail.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Zechuan Chen <chenzechuan1@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Cc: yaowenbin <yaowenbin1@huawei.com>
Link: https://lore.kernel.org/r/20220826164242.43412-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:19 -03:00
Stephane Eranian 052747700e perf report: Add "addr_from" and "addr_to" sort dimensions
With the existing symbol_from/symbol_to, branches captured in the same
function would be collapsed into a single function if the latencies
associated with the each branch (cycles) were all the same.  That is the
case on Intel Broadwell, for instance. Since Intel Skylake, the latency
is captured by hardware and therefore is used to disambiguate branches.

Add addr_from/addr_to sort dimensions to sort branches based on their
addresses and not the function there are in. The output is still the
function name but the offset within the function is provided to uniquely
identify each branch.  These new sort dimensions also help with annotate
because they create different entries in the histogram which, in turn,
generates proper branch annotations.

Here is an example using AMD's branch sampling:

  $ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg

  $ perf report
  Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
  Overhead  Command          Source Shared Object  Source Symbol                                   Target Symbol                                   Basic Block Cycle
    99.65%  test_prg	   test_prg              [.] test_thread                                 [.] test_thread                                 -
     0.02%  test_prg         [kernel.vmlinux]      [k] asm_sysvec_apic_timer_interrupt             [k] error_entry                                 -

  $ perf report -F overhead,comm,dso,addr_from,addr_to
  Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
  Overhead  Command          Shared Object     Source Address          Target Address
     4.22%  test_prg         test_prg          [.] test_thread+0x3c    [.] test_thread+0x4
     4.13%  test_prg         test_prg          [.] test_thread+0x4     [.] test_thread+0x3a
     4.09%  test_prg         test_prg          [.] test_thread+0x3a    [.] test_thread+0x6
     4.08%  test_prg         test_prg          [.] test_thread+0x2     [.] test_thread+0x3c
     4.06%  test_prg         test_prg          [.] test_thread+0x3e    [.] test_thread+0x2
     3.87%  test_prg         test_prg          [.] test_thread+0x6     [.] test_thread+0x38
     3.84%  test_prg         test_prg          [.] test_thread         [.] test_thread+0x3e
     3.76%  test_prg         test_prg          [.] test_thread+0x1e    [.] test_thread
     3.76%  test_prg         test_prg          [.] test_thread+0x38    [.] test_thread+0x8
     3.56%  test_prg         test_prg          [.] test_thread+0x22    [.] test_thread+0x1e
     3.54%  test_prg         test_prg          [.] test_thread+0x8     [.] test_thread+0x36
     3.47%  test_prg         test_prg          [.] test_thread+0x1c    [.] test_thread+0x22
     3.45%  test_prg         test_prg          [.] test_thread+0x36    [.] test_thread+0xa
     3.28%  test_prg         test_prg          [.] test_thread+0x24    [.] test_thread+0x1c
     3.25%  test_prg         test_prg          [.] test_thread+0xa     [.] test_thread+0x34
     3.24%  test_prg         test_prg          [.] test_thread+0x1a    [.] test_thread+0x24
     3.20%  test_prg         test_prg          [.] test_thread+0x34    [.] test_thread+0xc
     3.04%  test_prg         test_prg          [.] test_thread+0x26    [.] test_thread+0x1a
     3.01%  test_prg         test_prg          [.] test_thread+0xc     [.] test_thread+0x32
     2.98%  test_prg         test_prg          [.] test_thread+0x18    [.] test_thread+0x26
     2.94%  test_prg         test_prg          [.] test_thread+0x32    [.] test_thread+0xe
     2.76%  test_prg         test_prg          [.] test_thread+0x28    [.] test_thread+0x18
     2.73%  test_prg         test_prg          [.] test_thread+0xe     [.] test_thread+0x30
     2.67%  test_prg         test_prg          [.] test_thread+0x30    [.] test_thread+0x10
     2.67%  test_prg         test_prg          [.] test_thread+0x16    [.] test_thread+0x28
     2.46%  test_prg         test_prg          [.] test_thread+0x10    [.] test_thread+0x2e
     2.44%  test_prg         test_prg          [.] test_thread+0x2a    [.] test_thread+0x16
     2.38%  test_prg         test_prg          [.] test_thread+0x14    [.] test_thread+0x2a
     2.32%  test_prg         test_prg          [.] test_thread+0x2e    [.] test_thread+0x12
     2.28%  test_prg         test_prg          [.] test_thread+0x12    [.] test_thread+0x2c
     2.16%  test_prg         test_prg          [.] test_thread+0x2c    [.] test_thread+0x14
     0.02%  test_prg         [kernel.vmlinux]  [k] asm_sysvec_apic_ti+0x5  [k] error_entry

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20220208211637.2221872-13-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16 11:21:22 -03:00
Athira Rajeev e3304c2135 perf sort: Include global and local variants for p_stage_cyc sort key
Sort key 'p_stage_cyc' is used to present the latency cycles spent in
pipeline stages.

perf has local 'p_stage_cyc' sort key to display this info. There is no
global variant available for this sort key. The local variant shows
latency in a single sample, whereas the global value will be useful to
present the total latency (sum of latencies) in the hist entry. It
represents the latency number multiplied by the number of samples.

Add global ('p_stage_cyc') and local variant ('local_p_stage_cyc') for
this sort key. Use 'local_p_stage_cyc' as default option for "mem" sort
mode.

Also add this to the list of dynamic sort keys and made the
"dynamic_headers" and "arch_specific_sort_keys" as static.

Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211203022038.48240-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-10 15:39:00 -03:00
Ian Rogers 0ca1f534a7 perf hist: Fix memory leak of a perf_hpp_fmt
perf_hpp__column_unregister() removes an entry from a list but doesn't
free the memory causing a memory leak spotted by leak sanitizer.

Add the free while at the same time reducing the scope of the function
to static.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118071247.2140392-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:16:56 -03:00
Namhyung Kim 2775de0b11 perf report: Add --skip-empty option to suppress 0 event stat
To make the output more readable, I think it's better to remove 0's in
the output.  Also the dummy event has no event stats so it just wasts
the space.  Let's use the --skip-empty option to suppress it.

  $ perf report --stat --skip-empty

  Aggregated stats:
             TOTAL events:      16530
              MMAP events:        226
              COMM events:       1596
              EXIT events:          2
          THROTTLE events:        121
        UNTHROTTLE events:        117
              FORK events:       1595
            SAMPLE events:        719
             MMAP2 events:      12147
            CGROUP events:          2
    FINISHED_ROUND events:          2
        THREAD_MAP events:          1
           CPU_MAP events:          1
         TIME_CONV events:          1
  cycles stats:
            SAMPLE events:        719

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427013717.1651674-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-29 10:30:59 -03:00
Namhyung Kim 0f0abbace3 perf hists: Split hists_stats from events_stats
Each struct hists have events_stats but most of the fields were not
used.  It's to count number of samples and periods whether filtered or
not.  And other fields are used only by evlist.

So it'd be better to split hists_stats and events_stats to reduce
wasted memory in the struct hists.  This makes the output of event
statistics in the perf report compact by skipping 0 events in each
evsel/hists.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427013717.1651674-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-29 10:30:58 -03:00
Athira Rajeev 06e5ca746c perf tools: Support pipeline stage cycles for powerpc
The pipeline stage cycles details can be recorded on powerpc from the
contents of Performance Monitor Unit (PMU) registers. On ISA v3.1
platform, sampling registers exposes the cycles spent in different
pipeline stages. Patch adds perf tools support to present two of the
cycle counter information along with memory latency (weight).

Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
This is stored in 'var2_w' field of 'perf_sample_weight'.

Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
which is stored in 'var3_w' field of perf_sample_weight.

Add new sort function 'Pipeline Stage Cycle' and include this in
default_mem_sort_order[]. This new sort function may be used to denote
some other pipeline stage in another architecture. So add this to list
of sort entries that can have dynamic header string.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-26 08:49:54 -03:00
Kan Liang 590db42de0 perf report: Support instruction latency
The instruction latency information can be recorded on some platforms,
e.g., the Intel Sapphire Rapids server. With both memory latency
(weight) and the new instruction latency information, users can easily
locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.

The 'weight' field is shared among different architectures. Reusing the
'weight' field may impacts other architectures. Add a new field to store
the instruction latency.

Like the 'weight' support, introduce a 'ins_lat' for the global
instruction latency, and a 'local_ins_lat' for the local instruction
latency version.

Add new sort functions, INSTR Latency and Local INSTR Latency,
accordingly.

Add local_ins_lat to the default_mem_sort_order[].

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang a054c2989f perf tools: Support data block and addr block
Two new data source fields, to indicate the block reasons of a load
instruction, are introduced on the Intel Sapphire Rapids server. The
fields can be used by the memory profiling.

Add a new sort function, SORT_MEM_BLOCKED, for the two fields.

For the previous platforms or the block reason is unknown, print "N/A"
for the block reason.

Add blocked as a default mem sort key for perf report and perf mem
report.

Committer testing:

So in machines without this capability we get a "N/A" filling the new "Blocked"
column:

  $ perf mem record ls
  arch     certs	 CREDITS  Documentation  include  ipc     Kconfig  lib       MAINTAINERS  mm   samples  security  usr    block
  COPYING	 crypto	 drivers  fs             init     Kbuild  kernel   LICENSES  Makefile     net  README   scripts   sound  tools
  virt
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
  $
  $ perf mem report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  # Total Lost Samples: 0
  #
  # Samples: 6  of event 'cpu/mem-loads,ldlat=30/Pu'
  # Total weight : 1381
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
  #
  # Overhead  Samples  Local Weight  Memory access         Symbol                   Shared Object  Data Symbol             Data Object   Snoop  TLB access    Locked  Blocked
  # ........  .......  ............  ....................  .......................  .............  ......................  ............  .....  ............  ......  .......
  #
      32.87%        1  454           Local RAM or RAM hit  [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91cef3078  libc-2.31.so  Hit    L1 or L2 hit  No       N/A
      25.56%        1  353           LFB or LFB hit        [.] strcmp               ld-2.31.so     [.] 0x00005586973855ca  ls            None   L1 or L2 hit  No       N/A
      22.59%        1  312           LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0e3b18  ld.so.cache   None   L1 or L2 hit  No       N/A
       8.47%        1  117           LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceee570  libc-2.31.so  None   L1 or L2 hit  No       N/A
       6.88%        1  95            LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceed490  libc-2.31.so  None   L1 or L2 hit  No       N/A
       3.62%        1  50            LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0ebe60  ld.so.cache   None   L1 or L2 hit  No       N/A

  # Samples: 11  of event 'cpu/mem-stores/Pu'
  # Total weight : 11
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
  #
  # Overhead  Samples  Local Weight  Memory access  Symbol                   Shared Object  Data Symbol             Data Object  Snoop  TLB access  Locked  Blocked
  # ........  .......  ............  .............  .......................  .............  ......................  ...........  .....  ..........  ......  .......
  #
       9.09%        1  0             L1 hit         [.] __strcoll_l          libc-2.31.so   [.] 0x00007fffe5648fc8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_lookup_symbol_x  ld-2.31.so     [.] 0x00007fffe56490b8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_name_match_p     ld-2.31.so     [.] 0x00007fffe56487d8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_start            ld-2.31.so     [.] start_time+0x0      ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_sysdep_start     ld-2.31.so     [.] 0x00007fffe56494b8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5648ff8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649064  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649130  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xaf8  ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xc28  ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] 0x00007fffe56495b8  [stack]      N/A    N/A         N/A      N/A

  # (Tip: Show user configuration overrides: perf config --user --list)
  $

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Stephane Eranian 9fd74f209c perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Add a new sort dimension "code_page_size" for common sort.
With this option applied, perf can sort and report by sample's code page
size.

For example:

  # perf report --stdio --sort=comm,symbol,code_page_size
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 3K of event 'mem-loads:uP'
  # Event count (approx.): 1470769
  #
  # Overhead  Command  Symbol                        Code Page Size IPC [IPC Coverage]
  # ........  .......  ............................  .............. ....................
  #
      69.56%  dtlb     [.] GetTickCount              4K              -   -
      17.93%  dtlb     [.] Calibrate                 4K              -   -
      11.40%  dtlb     [.] __gettimeofday            4K              -   -
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Kan Liang a50d03e3b8 perf sort: Add sort option for data page size
Add a new sort option "data_page_size" for --mem-mode sort.  With this
option applied, perf can sort and report by sample's data page size.

Here is an example:

perf report --stdio --mem-mode
--sort=comm,symbol,phys_daddr,data_page_size

 # To display the perf.data header info, please use
 # --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 9K of event 'mem-loads:uP'
 # Total weight : 9028
 # Sort order   : comm,symbol,phys_daddr,data_page_size
 #
 # Overhead  Command  Symbol                        Data Physical
 # Address
 # Data Page Size
 # ........  .......  ............................
 # ......................  ......................
 #
    11.19%  dtlb     [.] touch_buffer              [.] 0x00000003fec82ea8  4K
     8.61%  dtlb     [.] GetTickCount              [.] 0x00000003c4f2c8a8  4K
     4.52%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f58  4K
     4.33%  dtlb     [.] __gettimeofday            [.] 0x00000003fec82f48  4K
     4.32%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f78  4K
     4.28%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f50  4K
     4.23%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f70  4K
     4.11%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f68  4K
     4.00%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f98  4K
     3.91%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f90  4K
     3.43%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e98  4K
     3.42%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e90  4K
     0.09%  dtlb     [.] DoDependentLoads          [.] 0x000000036ea084c0  2M
     0.08%  dtlb     [.] DoDependentLoads          [.] 0x000000032b010b80  2M

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19 17:52:24 -03:00
Arnaldo Carvalho de Melo 7127372419 perf evlist: Use the right prefix for 'struct evlist' print methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:55:12 -03:00
Arnaldo Carvalho de Melo f4bd0b4a9b perf evlist: Use the right prefix for 'struct evlist' browser methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:23:35 -03:00
Arnaldo Carvalho de Melo 10c513f798 perf evsel: Rename perf_evsel__resort*() to evsel__resort*()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-28 10:03:24 -03:00
Namhyung Kim b629f3e9d0 perf report: Add 'cgroup' sort key
The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO.  Otherwise root
cgroup in different namespaces will all show same name - "/".

The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches with
the given substring.

For example it will look like following.  Note that record patch adding
--all-cgroups patch will come later.

  $ perf record -a --namespace --all-cgroups  cgtest
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]

  $ perf report -s cgroup_id,cgroup,pid
  ...
  # Overhead  cgroup id (dev/inode)  Cgroup          Pid:Command
  # ........  .....................  ..........  ...............
  #
      93.96%  0/0x0                  /                 0:swapper
       1.25%  3/0xeffffffb           /               278:looper0
       0.86%  3/0xf000015f           /sub/cgrp1      280:cgtest
       0.37%  3/0xf0000160           /sub/cgrp2      281:cgtest
       0.34%  3/0xf0000163           /sub/cgrp3      282:cgtest
       0.22%  3/0xeffffffb           /sub            278:looper0
       0.20%  3/0xeffffffb           /               280:cgtest
       0.15%  3/0xf0000163           /sub/cgrp3      285:looper3

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03 09:37:55 -03:00
Jin Yao 5e3b810aac perf report: Support a new key to reload the browser
Sometimes we may need to reload the browser to update the output since
some options are changed.

This patch creates a new key K_RELOAD. Once the __cmd_report() returns
K_RELOAD, it would repeat the whole process, such as, read samples from
data file, sort the data and display in the browser.

 v5:
 ---
 1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
 2. Skip setup_sorting() in repeat path if last key is K_RELOAD.

 v4:
 ---
 Need to quit in perf_evsel_menu__run if key is K_RELOAD.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24 09:37:27 -03:00
Yuya Fujita 55347ec340 perf hists: Fix variable name's inconsistency in hists__for_each() macro
Variable names are inconsistent in hists__for_each macro().

Due to this inconsistency, the macro replaces its second argument with
"fmt" regardless of its original name.

So far it works because only "fmt" is passed to the second argument.
However, this behavior is not expected and should be fixed.

Fixes: f0786af536 ("perf hists: Introduce hists__for_each_format macro")
Fixes: aa6f50af82 ("perf hists: Introduce hists__for_each_sort_list macro")
Signed-off-by: Yuya Fujita <fujita.yuya@fujitsu.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/OSAPR01MB1588E1C47AC22043175DE1B2E8520@OSAPR01MB1588.jpnprd01.prod.outlook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-12-20 18:58:13 -03:00
Jin Yao 848a5e507e perf report: Jump to symbol source view from total cycles view
This patch supports jumping from tui total cycles view to symbol source
view.

For example,

  perf record -b ./div
  perf report --total-cycles

In total cycles view, we can select one entry and press 'a' or press
ENTER key to jump to symbol source view.

This patch also sets sort_order to NULL in cmd_report() which will use
the default branch sort order. The percent value in new annotate view
will be consistent with the percent in annotate view switched from perf
report (we observed the original percent gap with previous patches).

 v2:
 ---
 Fix the 'make NO_SLANG=1' error. (set __maybe_unused to
 annotation_opts in block_hists_tui_browse()).

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-19 19:37:04 -03:00
Jin Yao 5cb456af99 perf util: Move block TUI function to ui browsers
It would be nice if we could jump to the assembler/source view (like the
normal perf report) from total cycles view.

This patch moves the block_hists_tui_browse from block-info.c to
ui/browsers/hists.c in order to reuse some browser codes (i.e
do_annotate) for implementing new annotation view.

 v2:
 ---
 Fix the 'make NO_SLANG=1' error. (Change 'int block_hists_tui_browse()'
 to 'static inline int block_hists_tui_browse()')

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191118140849.20714-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-19 19:33:40 -03:00
Jin Yao 7841f40aed perf hist: Count the total cycles of all samples
We can get the per sample cycles by hist__account_cycles(). It's also
useful to know the total cycles of all samples in order to get the
cycles coverage for a single program block in further. For example:

  coverage = per block sampled cycles / total sampled cycles

This patch creates a new argument 'total_cycles' in hist__account_cycles(),
which will be added with the cycles of each sample.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07 09:14:15 -03:00
Arnaldo Carvalho de Melo 3793d4de06 perf hist: Add missing 'struct branch_stack' forward declaration
Its needed, was being obtained indirectly, fix it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-srzphk0ehptfn3zqmpkgsi65@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-20 09:19:22 -03:00
Arnaldo Carvalho de Melo 4772925885 perf tools: Move 'struct events_stats' and prototypes to separate header
This will allow us to untangle the header dependency a bit more, as some
places will not need event.h anymore.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-enqncj29ovzaat3cd9203rwl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:24:10 -03:00
Arnaldo Carvalho de Melo 171f7474b6 perf hist: Remove needless ui/progress.h from hist.h
We only need a forward declaration, add it and fixup all the files that
need ui_progress definitions but were wrongly getting it from hist.h.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-84a90o9jdxybffxo9jmouokw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:24:10 -03:00
Namhyung Kim be5863b7d9 perf top: Fix event group with more than two events
The event group feature links relevant hist entries among events so that
they can be displayed together.  During the link process, each hist
entry in non-leader events is connected to a hist entry in the leader
event.  This is done in order of events specified in the command line so
it assumes that events are linked in the order.

But 'perf top' can break the assumption since it does the link process
multiple times.  For example, a hist entry can be in the third event
only at first so it's linked after the leader.  Some time later, second
event has a hist entry for it and it'll be linked after the entry of the
third event.

This makes the code compilicated to deal with such unordered entries.
This patch simply unlink all the entries after it's printed so that they
can assume the correct order after the repeated link process.  Also it'd
be easy to deal with decaying old entries IMHO.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190827231555.121411-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28 18:15:03 -03:00
Jiri Olsa 63503dba87 perf evlist: Rename struct perf_evlist to struct evlist
Rename struct perf_evlist to struct evlist, so we don't have a name
clash when we add struct perf_evlist in libperf.

Committer notes:

Added fixes to build on arm64, from Jiri and from me
(tools/perf/util/cs-etm.c)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:42 -03:00
Jiri Olsa 32dcd021d0 perf evsel: Rename struct perf_evsel to struct evsel
Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.

Committer notes:

Added fixes for arm64, provided by Jiri.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:42 -03:00
Jin Yao b10c78c509 perf diff: Print the basic block cycles diff
$ perf record -b ./div
 $ perf record -b ./div

Following is the default perf diff output

 $ perf diff

 # Event 'cycles'
 #
 # Baseline  Delta Abs  Shared Object     Symbol
 # ........  .........  ................  ..................................
 #
     48.75%     +0.33%  div               [.] main
      8.21%     -0.20%  div               [.] compute_flag
     19.02%     -0.12%  libc-2.23.so      [.] __random_r
     16.17%     -0.09%  libc-2.23.so      [.] __random
      2.27%     -0.03%  div               [.] rand@plt
                +0.02%  [i915]            [k] gen8_irq_handler
      5.52%     +0.02%  libc-2.23.so      [.] rand

This patch creates a new computation selection 'cycles'.

 $ perf diff -c cycles

 # Event 'cycles'
 #
 # Baseline       [Program Block Range] Cycles Diff Shared Object Symbol
 # ........ ....................................... .........................................
 #
     48.75%             [div.c:42 -> div.c:45]  147 div           [.] main
     48.75%             [div.c:31 -> div.c:40]    4 div           [.] main
     48.75%             [div.c:40 -> div.c:40]    0 div           [.] main
     48.75%             [div.c:42 -> div.c:42]    0 div           [.] main
     48.75%             [div.c:42 -> div.c:44]    0 div           [.] main
     19.02% [random_r.c:357 -> random_r.c:360]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:373]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:376]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:380]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:392]    0 libc-2.23.so  [.] __random_r
     16.17%     [random.c:288 -> random.c:291]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:288 -> random.c:291]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:288 -> random.c:295]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:288 -> random.c:297]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:291 -> random.c:291]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:293 -> random.c:293]    0 libc-2.23.so  [.] __random
      8.21%             [div.c:22 -> div.c:22]  148 div           [.] compute_flag
      8.21%             [div.c:22 -> div.c:25]    0 div           [.] compute_flag
      8.21%             [div.c:27 -> div.c:28]    0 div           [.] compute_flag
      5.52%           [rand.c:26 -> rand.c:27]    0 libc-2.23.so  [.] rand
      5.52%           [rand.c:26 -> rand.c:28]    0 libc-2.23.so  [.] rand
      2.27%         [rand@plt+0 -> rand@plt+0]    0 div           [.] rand@plt
      0.01% [entry_64.S:694 -> entry_64.S:694]   16 [vmlinux]     [k] native_irq_return_iret
      0.00%       [fair.c:7676 -> fair.c:7665]  162 [vmlinux]     [k] update_blocked_averages

"[Program Block Range]" indicates the range of program basic block
(start -> end). If we can find the source line it prints the source line
otherwise it prints the symbol+offset instead.

 v4:
 ---
 Use source lines or symbol+offset to indicate the basic block. It should
 be easier to understand.

 v3:
 ---
 Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf.
 Use symbol_conf.report_block to check if executing hist_entry__block_fprintf.

 v2:
 ---
 Keep standard perf diff format and display the 'Baseline' and
 'Shared Object'.

The output is sorted by "Baseline" and the basic blocks in the same
function are sorted by cycles diff.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-02 13:20:51 -03:00
Jin Yao fe96245c7f perf hists: Add block_info in hist_entry
The block_info contains the program basic block information, i.e,
contains the start address and the end address of this basic block and
how much cycles it takes.

We need to compare, sort and even print out the basic block by some
orders, i.e. sort by cycles.

For this purpose, we add block_info field to hist_entry. In order not to
impact current interface, we creates a new function
hists__add_entry_block.

 v6:
 ---
 Remove the 'ops' argument in hists__add_entry_block

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-02 12:45:23 -03:00
Andi Kleen 4968ac8fb7 perf report: Implement browsing of individual samples
Now 'perf report' can show whole time periods with 'perf script', but
the user still has to find individual samples of interest manually.

It would be expensive and complicated to search for the right samples in
the whole perf file. Typically users only need to look at a small number
of samples for useful analysis.

Also the full scripts tend to show samples of all CPUs and all threads
mixed up, which can be very confusing on larger systems.

Add a new --samples option to save a small random number of samples per
hist entry.

Use a reservoir sample technique to select a representatve number of
samples.

Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or cpu of the sample is displayed. Then we use less' search
functionality to directly jump the to the time stamp of the selected
sample.

It uses different menus for assembler and source display.  Assembler
needs xed installed and source needs debuginfo.

Currently it only supports as many samples as fit on the screen due to
some limitations in the slang ui code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 16:33:19 -03:00
Andi Kleen 6f3da20e15 perf report: Support builtin perf script in scripts menu
The scripts menu traditionally only showed custom perf scripts.

Allow to run standard perf script with useful default options too.

- Normal perf script
- perf script with assembler (needs xed installed)
- perf script with source code output (needs debuginfo)
- perf script with custom arguments

Then we automatically select the right options to display the
information in the perf.data file.

For example with -b display branch contexts.

It's not easily possible to check for xed's existence in advance.  perf
script usually gives sensible error messages when it's not available.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-7-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 16:33:19 -03:00
Andi Kleen 3723908d05 perf report: Support time sort key
Add a time sort key to perf report to display samples for different time
quantums separately. This allows easier analysis of workloads that
change over time, and also will allow looking at the context of samples.

% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
     0.67%  277061.87300  [.] _dl_start
     0.50%  277061.87300  [.] f1
     0.50%  277061.87300  [.] f2
     0.33%  277061.87300  [.] main
     0.29%  277061.87300  [.] _dl_lookup_symbol_x
     0.29%  277061.87300  [.] dl_main
     0.29%  277061.87300  [.] do_lookup_x
     0.17%  277061.87300  [.] _dl_debug_initialize
     0.17%  277061.87300  [.] _dl_init_paths
     0.08%  277061.87300  [.] check_match
     0.04%  277061.87300  [.] _dl_count_modids
     1.33%  277061.87400  [.] f1
     1.33%  277061.87400  [.] f2
     1.33%  277061.87400  [.] main
     1.17%  277061.87500  [.] main
     1.08%  277061.87500  [.] f1
     1.08%  277061.87500  [.] f2
     1.00%  277061.87600  [.] main
     0.83%  277061.87600  [.] f1
     0.83%  277061.87600  [.] f2
     1.00%  277061.87700  [.] main

Committer notes:

Rename 'time' argument to hist_time() to htime to overcome this in older
distros:

  cc1: warnings being treated as errors
  util/hist.c: In function 'hist_time':
  util/hist.c:251: error: declaration of 'time' shadows a global declaration
  /usr/include/time.h:186: error: shadowed declaration is here

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 16:32:31 -03:00
Jiri Olsa 5749618764 perf evsel: Add output_resort_cb method
Add perf_evsel__output_resort_cb() so we have an interface with a
callback for each hist entry. It will be used in the following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190204141808.23031-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:40 -03:00
Jiri Olsa e4c38fd4a0 perf hists: Add argument to hists__resort_cb_t callback
Add argument to hists__resort_cb_t so that we can pass data from upper
layers to the callback function. It will be used in the following
patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190204141808.23031-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:39 -03:00
Arnaldo Carvalho de Melo 71551288d2 perf hist: Remove the needless callchain.h include from hist.h
Nothing that is provided by callchain.h is used there, just things that
should've be directly included in hist.h, such as rbtree.h and a
map_symbol forward declaration.

Remove it so that we reduce the headers dependency tree.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-zivvqfx93w5zzur7hr7h0nlh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:38 -03:00
Arnaldo Carvalho de Melo 7cadca8e1b perf hist: Remove symbol.h from hist.h, just fwd decls are needed
To reduce the includes dependencies.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-cmvg5ght75mmfg1efeyna9rn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:38 -03:00
Arnaldo Carvalho de Melo 9f4e8ff27a perf symbols: Introduce map_symbol.h
To allow headers just wanting this definition to be able to get it
without all the things in symbol.h, to reduce the include dep tree.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-l32z2qyhs6fe8unf4gk2ead2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:38 -03:00
Davidlohr Bueso 2eb3d6894a perf hist: Use cached rbtrees
At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something heavily required for histograms. Specifically, the following
are converted to rb_root_cached, and users accordingly:

hist::entries_in_array
hist::entries_in
hist::entries
hist::entries_collapsed
hist_entry::hroot_in
hist_entry::hroot_out

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
[ Added some missing conversions to rb_first_cached() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:10 +01:00
Jin Yao ec6ae74fe8 perf report: Display average IPC and IPC coverage per symbol
Support displaying the average IPC and IPC coverage per symbol in 'perf
report' --tui and --stdio modes.

For example,

 $ perf record -b ...
 $ perf report -s symbol

 Overhead  Symbol                           IPC   [IPC Coverage]
   39.60%  [.] __random                     2.30  [ 54.8%]
   18.02%  [.] main                         0.43  [ 54.3%]
   14.21%  [.] compute_flag                 2.29  [100.0%]
   14.16%  [.] rand                         0.36  [100.0%]
    7.06%  [.] __random_r                   2.57  [ 70.5%]
    6.85%  [.] rand@plt                     0.00  [  0.0%]

Jiri Olsa <jolsa@redhat.com> provided the patch to support the --stdio
mode. I merged Jiri's code in this patch.

  $ perf report -s symbol --stdio

    # Overhead  Symbol                       IPC   [IPC Coverage]
    # ........  ...........................  ....................
    #
      39.60%  [.] __random                   2.30  [ 54.8%]
      18.02%  [.] main                       0.43  [ 54.3%]
      14.21%  [.] compute_flag               2.29  [100.0%]
      14.16%  [.] rand                       0.36  [100.0%]
       7.06%  [.] __random_r                 2.57  [ 70.5%]
       6.85%  [.] rand@plt                   0.00  [  0.0%]
       0.02%  [k] run_timer_softirq          1.60  [ 57.2%]

The columns "IPC" and "[IPC Coverage]" are automatically enabled when
the sort-key "symbol" is specified. If the perf.data file doesn't
contain timed LBR information, columns are filled with "-".

For example,

  # Overhead  Symbol                       IPC   [IPC Coverage]
  # ........  ...........................  ....................
  #
      46.57%  [.] main                     -      -
      17.60%  [.] rand                     -      -
      15.84%  [.] __random_r               -      -
      11.90%  [.] __random                 -      -
       6.50%  [.] compute_flag             -      -
       1.59%  [.] rand@plt                 -      -
       0.00%  [.] _dl_relocate_object      -      -
       0.00%  [k] tlb_flush_mmu            -      -
       0.00%  [k] perf_event_mmap          -      -
       0.00%  [k] native_sched_clock       -      -
       0.00%  [k] intel_pmu_handle_irq_v4  -      -
       0.00%  [k] native_write_msr         -      -

 v3:
 ---
 Removed the sortkey 'ipc' from command-line. The columns "IPC"
 and "[IPC Coverage]" are automatically enabled when "symbol"
 is specified.

 v2:
 ---
 Merge in Jiri's patch to support stdio mode

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:44 -03:00
Arnaldo Carvalho de Melo e9de7e2f7e perf hists: Clarify callchain disabling when available
We want to allow having mixed events with/without callchains, not
using a global flag to show callchains, but allowing supressing
callchains when they are present.

So invert the logic of the last parameter to hists__fprint() to
that effect.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ohqyisr6qge79qa95ojslptx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-24 14:37:33 -03:00
Arnaldo Carvalho de Melo c9d3662870 perf hists: Reimplement hists__has_callchains()
There are places where we have only access to struct hists and need to
know if any of its hist_entries has callchains, like when drawing
headers for the various output modes (stdio, TUI, etc), so, when adding
a new hist_entry, check if it has callchains, storing this info for
later use by hists__has_callchains().

This reimplementation is necessary because not always a 'struct hists'
is allocated together with a 'struct perf evsel', so we can't go from
'hists' to 'perf_event_attr.sample_type & PERF_SAMPLE_CALLCHAIN'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hg5g7yddjio3ljwyqnnaj5dt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-07 14:42:27 -03:00
Arnaldo Carvalho de Melo fabd37b837 perf hists: Check if a hist_entry has callchains before using them
So far if we use 'perf record -g' this will make
symbol_conf.use_callchain 'true' and logic will assume that all events
have callchains enabled, but ever since we added the possibility of
setting up callchains for some events (e.g.: -e
cycles/call-graph=dwarf/) while not for others, we limit usage scenarios
by looking at that symbol_conf.use_callchain global boolean, we better
look at each event attributes.

On the road to that we need to look if a hist_entry has callchains, that
is, to go from hist_entry->hists to the evsel that contains it, to then
look at evsel->sample_type for PERF_SAMPLE_CALLCHAIN.

The next step is to add a symbol_conf.ignore_callchains global, to use
in the places where what we really want to know is if callchains should
be ignored, even if present.

Then -g will mean just to select a callchain mode to be applied to all
events not explicitely setting some other callchain mode, i.e. a default
callchain mode, and --no-call-graph will set
symbol_conf.ignore_callchains with that clear intention.

That too will at some point become a per evsel thing, that tools can set
for all or just a few of its evsels.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0sas5cm4dsw2obn75g7ruz69@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06 12:52:03 -03:00
Arnaldo Carvalho de Melo cd0cccbae9 perf hists browser: Pass annotation_options from tool to browser
So that things changed in the command line may percolate to the browser
code without using globals.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-5daawc40zhl6gcs600com1ua@git.kernel.org
[ Merged fix for NO_SLANG=1 build provided by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:53 -03:00
Arnaldo Carvalho de Melo 967a464a7e perf hists: Introduce hists__scnprint_title()
That is not use any struct hists_browser internals, so that it can be
shared with the other UIs and tools.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935
Link: https://lkml.kernel.org/n/tip-w8mczjnqnbcj9yzfkv9ja6ro@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 10:23:18 -03:00
Kim Phillips b74d12d598 perf tools: Add a "dso_size" sort order
Add DSO size to perf report/top sort output list.

This includes adding a map__size fn to map.h, which is
approximately equal to the DSO data file_size:

  DSO				file size	map (end-start)	file / (end-start)
  libwebkit2gtk-4.0.so.37.24.9	43260072	41295872	95%
  libglib-2.0.so.0.5400.1		 1125680	 1118208	99%
  libc-2.26.so			 1960656 	 1925120	101%
  libdbus-1.so.3.14.13		  309456 	  303104	102%

Sample output:

  $ ./perf report -s dso_size,dso
  Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
  Overhead  DSO size  Shared Object
    90.62%   unknown  [unknown]
     2.87%   1118208  libglib-2.0.so.0.5400.1
     1.92%    303104  libdbus-1.so.3.14.13
     1.42%   1925120  libc-2.26.so
     0.77%  41295872  libwebkit2gtk-4.0.so.37.24.9
     0.61%    335872  libgobject-2.0.so.0.5400.1
     0.41%   1052672  libgdk-3.so.0.2200.25
     0.36%    106496  libpthread-2.26.so
     0.29%    221184  dbus-daemon
     0.17%    159744  ld-2.26.so
     0.13%     49152  libwayland-client.so.0.3.0
     0.12%   1642496  libgio-2.0.so.0.5400.1
     0.09%   7327744  libgtk-3.so.0.2200.25
     0.09%  12324864  libmozjs-52.so.0.0.0
     0.05%   4796416  perf
     0.04%    843776  libgjs.so.0.0.0
     0.03%   1409024  libmutter-clutter-1.so

Committer testing:

To sort by DSO size, use:

  # perf report -F dso_size,dso,overhead -s dso_size
  <SNIP>
     3465216  libdns-export.so.174.0.1   0.00%
     3522560  libgc.so.1.0.3             0.00%
     3538944  libbfd-2.29-13.fc27.so     0.59%
     3670016  libunistring.so.2.1.0      0.00%
     3723264  libguile-2.0.so.22.8.1     0.00%
     3776512  libgio-2.0.so.0.5400.3     0.00%
     3891200  libc-2.26.so               0.96%
     3944448  libmozjs-17.0.so           0.00%
     4218880  libperl.so.5.26.1          0.18%
     4452352  libpython2.7.so.1.0        0.02%
     4472832  perf                       0.02%
     4603904  git                        0.01%
     4751360  libcrypto.so.1.1.0g        0.00%
     5005312  libslang.so.2.3.1          0.00%
     7315456  libgtk-3.so.0.2200.26      0.09%
     8818688  i965_dri.so                2.46%
     8818688  i965_dri.so (deleted)      1.26%
    12414976  libmozjs-52.so.0.0.0       0.03%
    23642112  cc1                        2.02%
    27889664  [kernel.kallsyms]         25.41%
    80834560  libxul.so (deleted)       15.68%
    98078720  chrome                    32.03%
  1056964608  [kernel.kallsyms]          1.59%
  #

Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-02 07:57:37 -03:00
Jiri Olsa e3ebaa4651 perf report: Fix memory corruption in --branch-history mode --branch-history
Jin Yao reported memory corrupton in perf report with
branch info used for stack trace:

  > Following command lines will cause perf crash.

  > perf record -j call -g -a <application>
  > perf report --branch-history
  >
  > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 ***
  > ======= Backtrace: =========
  > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725]
  > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a]
  > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc]
  > perf[0x51b914]
  > perf(hist_entry_iter__add+0x1e5)[0x51f305]
  > perf[0x43cf01]
  > perf[0x4fa3bf]
  > perf[0x4fa923]
  > perf[0x4fd396]
  > perf[0x4f9614]
  > perf(perf_session__process_events+0x89e)[0x4fc38e]
  > perf(cmd_report+0x15d2)[0x43f202]
  > perf[0x4a059f]
  > perf(main+0x631)[0x427b71]
  > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830]
  > perf(_start+0x29)[0x427d89]

For the cumulative output, we allocate the he_cache array based on the
--max-stack option value and populate it with data from 'callchain_cursor'.

The --max-stack option value does not ensure now the limit for number of
callchain_cursor nodes, so the cumulative iter code will allocate smaller array
than it's actually needed and cause above corruption.

I think the --max-stack limit does not apply here anyway, because we add
callchain data as normal hist entries, while the --max-stack control the limit
of single entry callchain depth.

Using the callchain_cursor.nr as he_cache array count to fix this. Also
removing struct hist_entry_iter::max_stack, because there's no longer any use
for it.

We need more fixes to ensure that the branch stack code follows properly the
logic of --max-stack, which is not the case at the moment.

Original-patch-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 14:55:47 -03:00