Commit Graph

12156 Commits

Author SHA1 Message Date
Namhyung Kim d6332a176b perf callchain: Fix double mapping al->addr for children without self period
Milian Wolff found a problem he described in [1] and that for him would
get fixed:

"Note how most of the large offset values are now gone. Most notably, we
get proper srcline resolution for the random.h and complex headers."

Then Namhyung found the root cause:

"I looked into it and found a bug handling cumulative (children)
entries.  For children entries that have no self period, the al->addr (so
he->ip) ends up having an doubly-mapped address.

It seems to be there from the beginning but only affects entries that
have no srclines - finding srcline itself is done using a different
address but it will show the invalid address if no srcline was found.  I
think we should fix the commit c7405d85d7 ("perf tools: Update cpumode
for each cumulative entry")."

[1] https://lkml.kernel.org/r/20171018185350.14893-7-milian.wolff@kdab.com

Reported-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: kernel-team@lge.com
Fixes: c7405d85d7 ("perf tools: Update cpumode for each cumulative entry")
Link: https://lkml.kernel.org/r/20171020051533.GA2746@sejong
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-31 16:14:50 -03:00
Jiri Olsa 021b462a51 perf stat: Make --per-thread update shadow stats to show metrics
We should support this because it would allow easily to collect metrics
for different threads in applications.

Original patch from posted by Jin Yao in here [1].

1. Current output, for example:

root@skl:/tmp# perf stat --per-thread -p 21623
^C
 Performance counter stats for process id '21623':

          vmstat-21623              0.517479      task-clock (msec)         #    0.000 CPUs utilized
          vmstat-21623                     1      context-switches
          vmstat-21623                     0      cpu-migrations
          vmstat-21623                     0      page-faults
          vmstat-21623               461,306      cycles
          vmstat-21623               630,724      instructions
          vmstat-21623               136,265      branches
          vmstat-21623                 2,520      branch-misses

       1.444020756 seconds time elapsed

root@skl:/tmp# perf stat --per-thread --metrics ipc -p 21623
^C
 Performance counter stats for process id '21623':

          vmstat-21623               631,185      inst_retired.any
          vmstat-21623               605,893      cpu_clk_unhalted.thread

       1.415679293 seconds time elapsed

2. With this patch, the result would be:

root@skl:/tmp# perf stat --per-thread -p 21623
^C
 Performance counter stats for process id '21623':

          vmstat-21623              0.533759      task-clock (msec)         #    0.000 CPUs utilized
          vmstat-21623                     1      context-switches          #    0.002 M/sec
          vmstat-21623                     0      cpu-migrations            #    0.000 K/sec
          vmstat-21623                     0      page-faults               #    0.000 K/sec
          vmstat-21623               473,896      cycles                    #    0.888 GHz
          vmstat-21623               631,072      instructions              #    1.33  insn per cycle
          vmstat-21623               136,307      branches                  #  255.372 M/sec
          vmstat-21623                 2,524      branch-misses             #    1.85% of all branches

       1.544862861 seconds time elapsed

root@skl:/tmp# perf stat --per-thread --metrics ipc -p 21623
^C
 Performance counter stats for process id '21623':

          vmstat-21623             1,259,104      inst_retired.any          #      1.2 IPC
          vmstat-21623             1,056,756      cpu_clk_unhalted.thread

       2.040954502 seconds time elapsed

[1] https://marc.info/?l=linux-kernel&m=150777054620511&w=2

Originally-from: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-tr8ntktxmy4qc5769ajg5u6c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30 13:41:48 -03:00
Jiri Olsa 54830dd0c3 perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats
Move the shadow stats scale computation to the
perf_stat__update_shadow_stats() function, so it's centralized and we
don't forget to do it. It also saves few lines of code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30 13:40:33 -03:00
Jiri Olsa e268687bfb perf tools: Add perf_data_file__write function
Adding perf_data_file__write function to provide single file write
operation.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30 13:38:50 -03:00
Jiri Olsa eae8ad8042 perf tools: Add struct perf_data_file
Add struct perf_data_file to represent a single file within a perf_data
struct.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
[ Fixup recent changes in 'perf script --per-event-dump' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30 13:37:37 -03:00
Jiri Olsa 8ceb41d7e3 perf tools: Rename struct perf_data_file to perf_data
Rename struct perf_data_file to perf_data, because we will add the
possibility to have multiple files under perf.data, so the 'perf_data'
name fits better.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org
[ Fixup recent changes in 'perf script --per-event-dump' ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30 13:36:09 -03:00
Arnaldo Carvalho de Melo 642ee1c6df perf script: Print information about per-event-dump files
For a file generated by "perf sched record sleep 50":

  # perf script --per-event-dump
  [ perf script: Wrote 23.121 MB perf.data.sched:sched_switch.dump (206015 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_stat_wait.dump (0 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_stat_sleep.dump (0 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_stat_iowait.dump (0 samples) ]
  [ perf script: Wrote 17.680 MB perf.data.sched:sched_stat_runtime.dump (129342 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_process_fork.dump (24 samples) ]
  [ perf script: Wrote 11.328 MB perf.data.sched:sched_wakeup.dump (106770 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_wakeup_new.dump (24 samples) ]
  [ perf script: Wrote 2.477 MB perf.data.sched:sched_migrate_task.dump (20434 samples) ]
  #

Similar to what is generated by 'perf record'.

Based-on-a-patch-by: yuzhoujian <yuzhoujian@didichuxing.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1508921599-10832-3-git-send-email-yuzhoujian@didichuxing.com
Link: http://lkml.kernel.org/n/tip-xuketkkjuk2c0qz546ypd1u7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30 13:11:15 -03:00
Arnaldo Carvalho de Melo d688d0376c perf trace beauty prctl: Generate 'option' string table from kernel headers
This is one more case where the way that syscall parameter values are
defined in kernel headers are easy to parse using a shell script that
will then generate the string table that gets used by the prctl 'option'
argument beautifier.

This way as soon as the header syncronization mechanism in perf's build
system detects a change in a copy of a kernel ABI header and that file
is syncronized, we get 'perf trace' updated automagically.

Further work needed for the PR_SET_ values, as well for using eBPF to
copy the non-integer arguments to/from the kernel.

E.g.: System wide prctl tracing:

  # perf trace -e prctl
  1668.028 ( 0.025 ms): TaskSchedulerR/10649 prctl(option: SET_NAME, arg2: 0x2b61d5db15d0) = 0
  3365.663 ( 0.018 ms): chrome/10650 prctl(option: SET_SECCOMP, arg2: 2, arg4: 8         ) = -1 EFAULT Bad address
  3366.585 ( 0.010 ms): chrome/10650 prctl(option: SET_NO_NEW_PRIVS, arg2: 1             ) = 0
  3367.173 ( 0.009 ms): TaskSchedulerR/10652 prctl(option: SET_NAME, arg2: 0x2b61d2aaa300) = 0
  3367.222 ( 0.003 ms): TaskSchedulerR/10653 prctl(option: SET_NAME, arg2: 0x2b61d2aaa1e0) = 0
  3367.244 ( 0.002 ms): TaskSchedulerR/10654 prctl(option: SET_NAME, arg2: 0x2b61d2aaa0c0) = 0
  3367.265 ( 0.002 ms): TaskSchedulerR/10655 prctl(option: SET_NAME, arg2: 0x2b61d2ac7f90) = 0
  3367.281 ( 0.002 ms): Chrome_ChildIO/10656 prctl(option: SET_NAME, arg2: 0x7efbe406bb11) = 0
  3367.220 ( 0.004 ms): TaskSchedulerS/10651 prctl(option: SET_NAME, arg2: 0x2b61d2ac1be0) = 0
  3370.906 ( 0.010 ms): GpuMemoryThrea/10657 prctl(option: SET_NAME, arg2: 0x7efbe386ab11) = 0
  3370.983 ( 0.003 ms): File/10658 prctl(option: SET_NAME, arg2: 0x7efbe3069b11          ) = 0
  3384.272 ( 0.020 ms): Compositor/10659 prctl(option: SET_NAME, arg2: 0x7efbe2868b11    ) = 0
  3612.091 ( 0.012 ms): DOM Worker/11489 prctl(option: SET_NAME, arg2: 0x7f49ab97ebf2    ) = 0
<SNIP>
  4512.437 ( 0.004 ms): (sa1)/11490 prctl(option: SET_NAME, arg2: 0x7ffca15af844         ) = 0
  4512.468 ( 0.002 ms): (sa1)/11490 prctl(option: SET_MM, arg2: ARG_START, arg3: 0x7f5cb7c81000) = 0
  4512.472 ( 0.001 ms): (sa1)/11490 prctl(option: SET_MM, arg2: ARG_END, arg3: 0x7f5cb7c81006) = 0
  4514.667 ( 0.002 ms): (sa1)/11490 prctl(option: GET_SECUREBITS                         ) = 0

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-q0s2uw579o5ei6xlh2zjirgz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:10 -03:00
Arnaldo Carvalho de Melo 4337279489 tools include uapi: Grab a copy of linux/prctl.h
We will use it to generate tables for beautifying prctl's 'option' arg
and some of the others eventually.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-cg8mpmz4hk9nfih685emnbk9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:10 -03:00
Arnaldo Carvalho de Melo a14390fde6 perf script: Allow creating per-event dump files
Introduce a new option to dump trace output to files named by the
monitored events and update perf-script documentation accordingly.

Shown below is output of perf script command with the newly introduced
option.

         $ perf record -e cycles -e cs -ag -- sleep 1
         $ perf script --per-event-dump
         $ ls
         perf.data.cycles.dump perf.data.cs.dump

Without per-event-dump support, drawing flamegraphs for different events
would require post processing to separate events. You can monitor only
one event at a time if you want to get flamegraphs for different events.
Using this option, you can get the trace output files named by the
monitored events, and could draw flamegraphs according to the event's
name.

Based-on-a-patch-by: yuzhoujian <yuzhoujian@didichuxing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1508921599-10832-3-git-send-email-yuzhoujian@didichuxing.com
Link: http://lkml.kernel.org/n/tip-8ngzsjdhgiovkupl3r5yy570@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:10 -03:00
Arnaldo Carvalho de Melo e669e833da perf evsel: Restore evsel->priv as a tool private area
When we started using it for stats and did it not just in
builtin-stat.c, but also for builtin-script.c, then it stopped being a
tool private area, so introduce a new pointer for these stats and leave
->priv to its original purpose.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Fixes: cfc8874a48 ("perf script: Process cpu/threads maps")
Link: http://lkml.kernel.org/n/tip-jtpzx3rjqo78snmmsdzwb2eb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:10 -03:00
Arnaldo Carvalho de Melo 894f3f1732 perf script: Use event_format__fprintf()
Another case where we a1a587073c ("perf script: Use fprintf like
printing uniformly") forgot to redirect output to the FILE descriptor,
fix this too.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-jmwx4pgfezw98ezfoj9t957s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:09 -03:00
Arnaldo Carvalho de Melo 5ce2c5b4e4 perf script: Use pr_debug where appropriate
We have facilities for reporting unexpected, unlikely errors, use them.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-c7j22xfjf1j773g7ufp607q0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:09 -03:00
Arnaldo Carvalho de Melo 69c7125229 perf script: Add a few missing conversions to fprintf style
In a1a587073c ("perf script: Use fprintf like printing uniformly")
there were a few cases that were missed, fix it.

Reported-by: yuzhoujian <yuzhoujian@didichuxing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-sq9hvfk5mkjdqzlpyiq7jkos@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27 09:10:09 -03:00
Ingo Molnar 6856b8e536 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 10:31:44 +02:00
Milian Wolff d8a88dd243 perf util: Enable handling of inlined frames by default
Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable this
useful option by default.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:47 -03:00
Milian Wolff 1fb7d06a50 perf report: Use srcline from callchain for hist entries
This also removes the symbol name from the srcline column, more on this
below.

This ensures we use the correct srcline, which could originate from a
potentially inlined function. The hist entries used to query for the
srcline based purely on the IP, which leads to wrong results for inlined
entries.

Before:

~~~~~
  perf report --inline -s srcline -g none --stdio
  ...
  # Children      Self  Source:Line
  # ........  ........  ..................................................................................................................................
  #
      94.23%     0.00%  __libc_start_main+18446603487898210537
      94.23%     0.00%  _start+41
      44.58%     0.00%  main+100
      44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
      44.58%     0.00%  std::__complex_abs+100
      44.58%     0.00%  std::abs<double>+100
      44.58%     0.00%  std::norm<double>+100
      36.01%     0.00%  hypot+18446603487892193300
      25.81%     0.00%  main+41
      25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
      25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
      25.75%    25.75%  random.h:143
      18.39%     0.00%  main+57
      18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
      18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
      13.80%    13.80%  random.tcc:3330
       5.64%     0.00%  ??:0
       4.13%     4.13%  __hypot_finite+163
       4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
  perf report --inline -s srcline -g none --stdio
  ...
  # Children      Self  Source:Line
  # ........  ........  ...........................................
  #
      94.30%     1.19%  main.cpp:39
      94.23%     0.00%  __libc_start_main+18446603487898210537
      94.23%     0.00%  _start+41
      48.44%     1.70%  random.h:1823
      48.44%     0.00%  random.h:1814
      46.74%     2.53%  random.h:185
      44.68%     0.10%  complex:589
      44.68%     0.00%  complex:597
      44.68%     0.00%  complex:654
      44.68%     0.00%  complex:664
      40.61%    13.80%  random.tcc:3330
      36.01%     0.00%  hypot+18446603487892193300
      26.81%     0.00%  random.h:151
      26.81%     0.00%  random.h:332
      25.75%    25.75%  random.h:143
       5.64%     0.00%  ??:0
       4.13%     4.13%  __hypot_finite+163
       4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line hist
column. If this information is desired, users should explicitly query
for it if needed. I.e. run this command instead:

~~~~~
  perf report --inline -s sym,srcline -g none --stdio
  ...
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:uppp'
  # Event count (approx.): 1381229476
  #
  # Children      Self  Symbol                                                                                                                               Source:Line
  # ........  ........  ...................................................................................................................................  ...........................................
  #
      94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
      94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
      94.23%     0.00%  [.] _start                                                                                                                           _start+41
      48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
      48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
      46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
      44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
      44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
      44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
      44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
      39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
      36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
      26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
      26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
      25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
      25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
       4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
       4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Before we used to print the symbol name in the srcline column even
when the sym column was explicitly requested. I.e. the output was:

~~~~~
  perf report --inline -s sym,srcline -g none --stdio
  ...
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:uppp'
  # Event count (approx.): 1381229476
  #
  # Children      Self  Symbol                                                                                                                               Source:Line
  # ........  ........  ...................................................................................................................................  ..................................................................................................................................
  #
      94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
      94.23%     0.00%  [.] _start                                                                                                                           _start+41
      44.58%     0.00%  [.] main                                                                                                                             main+100
      44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
      44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
      44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
      44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
      36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
      25.81%     0.00%  [.] main                                                                                                                             main+41
      25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
      25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
      25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
      18.39%     0.00%  [.] main                                                                                                                             main+57
      18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
      18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
      13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
       4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
       4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:46 -03:00
Milian Wolff 21ac9d547f perf report: Cache srclines for callchain nodes
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files e.g.:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. In such situations, we would use the inliner cache and thus do
not run this code path that often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level
- i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:46 -03:00
Milian Wolff b38775cf76 perf report: Cache failed lookups of inlined frames
When no inlined frames could be found for a given address, we did not
store this information anywhere. That means we potentially do the costly
inliner lookup repeatedly for cases where we know it can never succeed.

This patch makes dso__parse_addr_inlines always return a valid
inline_node. It will be empty when no inliners are found. This enables
us to cache the empty list in the DSO, thereby improving the performance
when many addresses fail to find the inliners.

For my trivial example, the performance impact is already quite
significant:

Before:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
             5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
     2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
     4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
       939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
        11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )

       0.596246531 seconds time elapsed                                          ( +-  0.07% )
~~~~~

After:

~~~~~
 Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

        113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                 0      cpu-migrations            #    0.000 K/sec
             5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
       432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
       670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
       141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
         2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )

       0.114222393 seconds time elapsed                                          ( +-  1.19% )
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:45 -03:00
Milian Wolff bf36eb5c4b perf report: Properly handle branch count in match_chain()
Some of the code paths I introduced before returned too early without
running the code to handle a node's branch count.  By refactoring
match_chain to only have one exit point, this can be remedied.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:37 -03:00
Milian Wolff aa441895f7 perf report: Compare symbol name for inlined frames when sorting
Similar to the callstack frame matching, we also have to compare the
symbol name when sorting hist entries. The reason is twofold: On one
hand, multiple inlined functions will use the same symbol start/end
values of the parent, non-inlined symbol.

As such, all of these symbols often end up missing from top-level
report, as they get merged with the non-inlined frame. On the other
hand, multiple different functions may end up inlining the same
function, and we need to aggregate these values properly.

Before:

~~~~~
  perf report --stdio --inline -g none
  # Children     Self  Command       Shared Object Symbol
  # ........ ........  ............  ............. ...................................
  #
     100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
     100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
     100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
      97.03%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
      59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
      55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
       0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
~~~~~

After:

~~~~~
  perf report --stdio --inline -g none
  # Children     Self  Command       Shared Object Symbol
  # ........ ........  ............  ............. ...................................................................................................................................
  #
     100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
     100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
     100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::__complex_abs (inlined)
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::abs<double> (inlined)
      62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
      59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
      55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
      34.46%    0.00%  cpp-inlining  cpp-inlining  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
      32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
      32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
      12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
      12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
      12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
       0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:56 -03:00
Milian Wolff 9856240ad3 perf callchain: Compare symbol name for inlined frames when matching
The fake symbols we create for inlined frames will represent different
functions but can use the symbol start address. This leads to issues
when different inline branches all lead to the same function.

Before:
~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
             --38.86%--_start
                       __libc_start_main
                       main
                       |
                        --37.57%--std::norm<double> (inlined)
                                  std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                  |
                                   --36.36%--std::abs<double> (inlined)
                                             std::__complex_abs (inlined)
                                             |
                                              --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                        std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                        std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
~~~~~

Note that this backtrace representation is completely bogus.
Complex abs does not call the linear congruential engine! It
is just a side-effect of a longer inlined stack being appended
to a shorter, different inlined stack, both of which originate
in the same function (main).

This patch fixes the issue:

~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
             --38.86%--_start
                       __libc_start_main
                       main
                       |
                       |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |          |
                       |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                       |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                       |                     |
                       |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                       |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                       |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                       |
                        --1.99%--std::norm<double> (inlined)
                                  std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                  std::abs<double> (inlined)
                                  std::__complex_abs (inlined)
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
[ Fix up conflict with c1fbc0cf81 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:56 -03:00
Milian Wolff 9628b56dc1 perf script: Mark inlined frames and do not print DSO for them
Instead of showing the (repeated) DSO name of the non-inlined frame, we
now show the "(inlined)" suffix instead.

Before:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)

After:
                   214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                    ace3 hypot (/usr/lib/libm-2.25.so)
                     a4a std::__complex_abs (inlined)
                     a4a std::abs<double> (inlined)
                     a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                     a4a std::norm<double> (inlined)
                     a4a main (/home/milian/projects/src/perf-tests/inlining)
                   20510 __libc_start_main (/usr/lib/libc-2.25.so)
                     bd9 _start (/home/milian/projects/src/perf-tests/inlining)

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:56 -03:00
Milian Wolff 8932f8071c perf callchain: Mark inlined frames in output by " (inlined)" suffix
The original patch that introduced inline frame output in the various
browsers used this suffix already. The new centralized approach that
uses fake symbols for inlined frames was missing this approach so far.

Instead of changing the symbol name itself, we only print the suffix
where needed. This allows us to efficiently lookup the symbol for a
given name without first having to append the suffix before the lookup.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:56 -03:00
Milian Wolff cbe50f6172 perf report: Fall-back to function name comparison for -g srcline
When a callchain entry has no srcline available, we ended up comparing
the instruction pointer. I consider this to be not too useful. Rather, I
think we should group the entries by function name, which this patch
adds. For people who want to split the data on the IP boundary, using
`-g address` is the correct choice.

Before:

~~~~~
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654
            |          std::abs<double> complex:597
            |          std::__complex_abs complex:589
            |          |
            |          |--56.03%--hypot
            |          |          |
            |          |          |--8.45%--__hypot_finite
            |          |          |
            |          |          |--7.62%--__hypot_finite
            |          |          |
            |          |          |--2.29%--__hypot_finite
            |          |          |
            |          |          |--2.24%--__hypot_finite
            |          |          |
            |          |          |--2.06%--__hypot_finite
            |          |          |
            |          |          |--1.81%--__hypot_finite
...
~~~~~

After:

~~~~~
   100.00%    38.86%  [.] main
            |
            |--61.14%--main inlining.cpp:14
            |          std::norm<double> complex:664
            |          std::_Norm_helper<true>::_S_do_it<double> complex:654
            |          std::abs<double> complex:597
            |          std::__complex_abs complex:589
            |          |
            |          |--60.29%--hypot
            |          |          |
            |          |           --56.03%--__hypot_finite
            |          |
            |           --0.85%--cabs
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:55 -03:00
Milian Wolff 11ea2515f3 perf callchain: Create real callchain entries for inlined frames
The inline_node structs are maintained by the new dso->inlines tree.
This in turn keeps ownership of the fake symbols and srcline string
representing an inline frame.

This tree is sorted by address to allow quick lookups. All other entries
of the symbol beside the function name are unused for inline frames. The
advantage of this approach is that all existing users of the callchain
API can now transparently display inlined frames without having to patch
their code.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:55 -03:00
Milian Wolff 2be8832f3c perf callchain: Refactor inline_list to store srcline string directly
This is a preparation for the creation of real callchain entries for
inlined frames. The rest of the perf code uses the srcline string. As
such, using that also for the srcline API allows us to simplify some of
the upcoming code. Most notably, it will allow us to cache the srcline
for a given inline node and reuse it for different callchain entries.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:55 -03:00
Milian Wolff fea0cf842c perf callchain: Refactor inline_list to operate on symbols
This is a requirement to create real callchain entries for inlined
frames.

Since the list of inlines usually contains the target symbol too, i.e.
the location where the frames get inlined to, we alias that symbol and
reuse it as-is is. This ensures that other dependent functionality keeps
working, most notably annotation of the target frames.

For all other entries in the inline_list, a fake symbol is created.
These are marked by new 'inlined' member which is set to true. Only
those symbols are managed by the inline_list and get freed when the
inline_list is deleted from within inline_node__delete.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:55 -03:00
Milian Wolff 40a342cda2 perf callchain: Store srcline in callchain_cursor_node
This is mostly a preparation to enable the creation of full callchain
nodes for inline frames. Such frames will reference the IP of the
non-inlined frame, but hold the symbol and srcline for an inlined
location. As such, we won't be able to query the srcline on-demand based
on the IP alone. Instead, we will leverage the functionality provided by
this patch here, and store the srcline for the inlined nodes in the new
srcline member of callchain_cursor_node.

Note that this patch on its own leaks the srcline, as there is no
free_callchain_cursor_node or similar. A future patch will add caching
of the srcline and handle deletion properly.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:55 -03:00
Milian Wolff 2a704fc8db perf report: Remove code to handle inline frames from browsers
The follow-up commits will make inline frames first-class citizens in
the callchain, thereby obsoleting all of this special code.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24 09:59:55 -03:00
Kan Liang 65db92e096 perf vendor events: Add Goldmont Plus V1 event file
Add a Intel event file for perf.

Signed-off-by: Kan Liang <Kan.liang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1508331907-395162-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:54 -03:00
Arnaldo Carvalho de Melo b1f03ca4ee perf namespaces: Add more appropriate set of headers
We don't need perf.h, that is a kitchen sink, all we need is
perf_events.h for perf_ns_link_info, sys/types.h for pid_t and
linux/types.h for u64, list_head.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-f2uxyaj4s2hmntkrezpa6dqz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:54 -03:00
Christophe JAILLET 79f56ebe2a perf kmem: Perform some cleanup if '--time' is given an invalid value
If the string passed in '--time' is invalid, we must do some cleanup
before leaving. As in the other error handling paths of this function.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Fixes: 2a865bd8dd ("perf kmem: Add option to specify time window of interest")
Link: http://lkml.kernel.org/r/20170916060936.28199-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:53 -03:00
Christophe JAILLET db49bc155a perf script: Fix error handling path
If the string passed in '--time' is invalid, or if failed to set
libtraceevent function resolver, we must do some cleanup before leaving.
As in the other error handling paths of this function.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20170916062537.28921-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:53 -03:00
Arnaldo Carvalho de Melo a1a587073c perf script: Use fprintf like printing uniformly
We've been mixing print() with fprintf() style printing for a while, but
now we need to use fprintf() like syntax uniformly as a preparatory
patch for supporting printing to different files, one per event.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-kv5z3v8ptfghbarv3a9usvin@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:52 -03:00
Arnaldo Carvalho de Melo 923d0c9ae5 perf tools: Introduce binary__fprintf()
Out of print_binary() but receiving a fp pointer and expecting that the
printer be a fprintf like function, i.e. receive a FILE pointer and
return the number of characters printed.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-6oqnxr6lmgqe6q6p3iugnscx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:52 -03:00
Andi Kleen 7958e54149 perf vendor events: Fix incorrect cmask syntax for some Intel metrics
Some of the metrics use an incorrect syntax for specifying the cmask for
an event. Convert to perf syntax so that they can be resolved.

Fixes metrics on Broadwell, SandyBridge.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-3k3fkfj8obek9dkmryyrqzhu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:51 -03:00
Arnaldo Carvalho de Melo d7e05ceaa9 perf tools: Do not check ABI headers in a detached tarball build
When we use one of:

  [acme@jouet linux]$ make help | grep perf
    perf-tar-src-pkg    - Build perf-4.14.0-rc3.tar source tarball
    perf-targz-src-pkg  - Build perf-4.14.0-rc3.tar.gz source tarball
    perf-tarbz2-src-pkg - Build perf-4.14.0-rc3.tar.bz2 source tarball
    perf-tarxz-src-pkg  - Build perf-4.14.0-rc3.tar.xz source tarball
  [acme@jouet linux]$

I.e. when we create a detached tarball to build perf outside outside the
enveloping kernel sources (from a kernel tarball or a checked out
linux.git directory) we by definition can't check for differences among
the tools/{include,arch}, etc files we originally copied from the
kernel, so bail out in that case, to avoid warnings when doing the
detached builds.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vbrga0mhplv7niwxr3ghjyxv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 16:30:50 -03:00
Jiri Olsa 696e2457e9 perf annotate: Remove arch::cpuid_parse callback
There's no need for extra cpuid_parse arch callback, it can be handled
directly in init callback.

Adding the init function to x86 to cover the cpuid initialization.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171011150158.11895-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:54 -03:00
Andi Kleen 98ad761bd3 perf list: Fix group description in the man page
Fix an incorrect description in the 'perf list' manpage. When a group
does not fit into the hardware it is partially scheduled, but does not
error out.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171010224322.15861-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:54 -03:00
Jiri Olsa 692f5a22cd perf tests attr: Make hw events optional
Otherwise we fail on virtual machines with no support for specific HW
events.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171009130712.14747-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:54 -03:00
Arnaldo Carvalho de Melo 73c17d8150 perf mmap: Adopt push method from builtin-record.c
The previous prep patch was just to show exactly what changed in that
function, now its time to move that method and things only it uses to
the right place, mmap.[ch]

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-aaxywfgw3d44x6xlu8zm1avu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:54 -03:00
Arnaldo Carvalho de Melo d37f1586d0 perf record: Make record__mmap_read generic
It becomes a perf_mmap method, "push", that build reads from a mmap and
"pushes" it to a consumer, that in the initial case, for 'perf record',
just writes it to the perf.data file descriptor, but may be used by
'top', etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-u4l1qjbi6l76r2k0nv99220n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:54 -03:00
Arnaldo Carvalho de Melo 1695849735 perf mmap: Move perf_mmap and methods to separate mmap.[ch] files
To better organize the sources, and we may end up even using it
directly, without evlists and evsels.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-oiqrm7grflurnnzo2ovfnslg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:53 -03:00
Andi Kleen ead81ee4f8 perf vendor events: Update JSON metrics for Skylake Server
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:53 -03:00
Andi Kleen e3f2dadf76 perf vendor events: Update JSON metrics for Skylake
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:53 -03:00
Andi Kleen 41a13b74a0 perf vendor events: Update JSON metrics for Sandy Bridge
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:53 -03:00
Andi Kleen 984d91f4c6 perf vendor events: Update JSON metrics for JakeTown
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:53 -03:00
Andi Kleen 7347bba555 perf vendor events: Update JSON metrics for IvyTown
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:52 -03:00
Andi Kleen 1de3152494 perf vendor events: Update JSON metrics for IvyBridge
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23 11:20:52 -03:00