Commit Graph

5542 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo f1a397f337 perf tools: Move branch structs to branch.h
We already have it, move those there from events.h so that we untangle
the header dependencies a bit more.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-pnbkqo8jxbi49d4f3yd3b5w3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:08 +01:00
Arnaldo Carvalho de Melo 8a249c73a5 perf annotate: Remove lots of headers from annotate.h
To reduce the chances changes trigger tons of rebuilds, more to come.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ytbykaku63862guk7muflcy4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:08 +01:00
Arnaldo Carvalho de Melo 19ea1b6f63 perf symbols: Move symbol_conf to separate file
So that we don't drag all the headers included in symbol.h when needing
to access symbol_conf in another header, such as annotate.h.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-rvo9dzflkneqmprb0dgbfybx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:08 +01:00
Arnaldo Carvalho de Melo b2251c327a perf color: Add missing stdarg.g to color.h
It was getting the va_list definition by luck.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-4mavb7pgt2nw9lsew1xuez09@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:08 +01:00
Arnaldo Carvalho de Melo 32e9136e37 perf utils: Move perf_config using routines from color.c to separate object
To untangle objects a bit more, avoiding rebuilding the color_fprintf
routines when changes are made to the perf config headers.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lkml.kernel.org/n/tip-8qvu2ek26antm3a8jyl4ocbq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:38:56 -03:00
Arnaldo Carvalho de Melo a5dcc4ca91 perf python: Remove -fstack-clash-protection when building with some clang versions
These options are not present in some (all?) clang versions, so when we
build for a distro that has a gcc new enough to have these options and
that the distro python build config settings use them but clang doesn't
support, b00m.

This is the case with fedora rawhide (now gearing towards f30), so check
if clang has the  and remove the missing ones from CFLAGS.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-5q50q9w458yawgxf9ez54jbp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:38:56 -03:00
Song Liu 7b612e291a perf tools: Synthesize PERF_RECORD_* for loaded BPF programs
This patch synthesize PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT for
BPF programs loaded before perf-record. This is achieved by gathering
information about all BPF programs via sys_bpf.

Committer notes:

Fix the build on some older systems such as amazonlinux:1 where it was
breaking with:

  util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
  util/bpf-event.c:52:9: error: missing initializer for field 'type' of 'struct bpf_prog_info' [-Werror=missing-field-initializers]
    struct bpf_prog_info info = {};
           ^
  In file included from /git/linux/tools/lib/bpf/bpf.h:26:0,
                   from util/bpf-event.c:3:
  /git/linux/tools/include/uapi/linux/bpf.h:2699:8: note: 'type' declared here
    __u32 type;
          ^
  cc1: all warnings being treated as errors

Further fix on a centos:6 system:

  cc1: warnings being treated as errors
  util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
  util/bpf-event.c:50: error: 'func_info_rec_size' may be used uninitialized in this function

The compiler is wrong, but to silence it, initialize that variable to
zero.

One more fix, this time for debian:experimental-x-mips, x-mips64 and
x-mipsel:

  util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
  util/bpf-event.c:93:16: error: implicit declaration of function 'calloc' [-Werror=implicit-function-declaration]
     func_infos = calloc(sub_prog_cnt, func_info_rec_size);
                  ^~~~~~
  util/bpf-event.c:93:16: error: incompatible implicit declaration of built-in function 'calloc' [-Werror]
  util/bpf-event.c:93:16: note: include '<stdlib.h>' or provide a declaration of 'calloc'

Add the missing header.

Committer testing:

  # perf record --bpf-event sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ]
  # perf report -D | grep PERF_RECORD_BPF_EVENT | nl
     1	0 0x4b10 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
     2	0 0x4c60 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
     3	0 0x4db0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
     4	0 0x4f00 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
     5	0 0x5050 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
     6	0 0x51a0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
     7	0 0x52f0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
     8	0 0x5440 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
  # bpftool prog
  13: cgroup_skb  tag 7be49e3934a125ba  gpl
	loaded_at 2019-01-19T09:09:43-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 13,14
  14: cgroup_skb  tag 2a142ef67aaad174  gpl
	loaded_at 2019-01-19T09:09:43-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 13,14
  15: cgroup_skb  tag 7be49e3934a125ba  gpl
	loaded_at 2019-01-19T09:09:43-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 15,16
  16: cgroup_skb  tag 2a142ef67aaad174  gpl
	loaded_at 2019-01-19T09:09:43-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 15,16
  17: cgroup_skb  tag 7be49e3934a125ba  gpl
	loaded_at 2019-01-19T09:09:44-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 17,18
  18: cgroup_skb  tag 2a142ef67aaad174  gpl
	loaded_at 2019-01-19T09:09:44-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 17,18
  21: cgroup_skb  tag 7be49e3934a125ba  gpl
	loaded_at 2019-01-19T09:09:45-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 21,22
  22: cgroup_skb  tag 2a142ef67aaad174  gpl
	loaded_at 2019-01-19T09:09:45-0300  uid 0
	xlated 296B  jited 229B  memlock 4096B  map_ids 21,22
  #

  # perf report -D | grep -B22 PERF_RECORD_KSYMBOL
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 ff 44 06 c0 ff ff ff ff  ......8..D......
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62  _7be49e3934a125b
  .  0030:  61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  a...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00  {..94.%.........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x49d8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc00644ff len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 48 6d 06 c0 ff ff ff ff  ......8.Hm......
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37  _2a142ef67aaad17
  .  0030:  34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  4...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00  *...z..t........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x4b28 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0066d48 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 04 cf 03 c0 ff ff ff ff  ......8.........
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62  _7be49e3934a125b
  .  0030:  61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  a...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00  {..94.%.........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x4c78 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc003cf04 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 96 28 04 c0 ff ff ff ff  ......8..(......
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37  _2a142ef67aaad17
  .  0030:  34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  4...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00  *...z..t........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x4dc8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0042896 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 05 13 17 c0 ff ff ff ff  ......8.........
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62  _7be49e3934a125b
  .  0030:  61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  a...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00  {..94.%.........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x4f18 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0171305 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 0a 8c 23 c0 ff ff ff ff  ......8...#.....
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37  _2a142ef67aaad17
  .  0030:  34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  4...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00  *...z..t........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x5068 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0238c0a len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 2a a5 a4 c0 ff ff ff ff  ......8.*.......
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62  _7be49e3934a125b
  .  0030:  61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  a...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00  {..94.%.........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x51b8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4a52a len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba
  --
  . ... raw event: size 312 bytes
  .  0000:  11 00 00 00 00 00 38 01 9b c9 a4 c0 ff ff ff ff  ......8.........
  .  0010:  e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67  ........bpf_prog
  .  0020:  5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37  _2a142ef67aaad17
  .  0030:  34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  4...............
   <SNIP zeroes>
  .  0110:  00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00  ........!.......
  .  0120:  2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00  *...z..t........
  .  0130:  00 00 00 00 00 00 00 00                          ........

  0 0x5308 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4c99b len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:36:39 -03:00
Song Liu 45178a928a perf tools: Handle PERF_RECORD_BPF_EVENT
This patch adds basic handling of PERF_RECORD_BPF_EVENT.  Tracking of
PERF_RECORD_BPF_EVENT is OFF by default. Option --bpf-event is added to
turn it on.

Committer notes:

Add dummy machine__process_bpf_event() variant that returns zero for
systems without HAVE_LIBBPF_SUPPORT, such as Alpine Linux, unbreaking
the build in such systems.

Remove the needless include <machine.h> from bpf->event.h, provide just
forward declarations for the structs and unions in the parameters, to
reduce compilation time and needless rebuilds when machine.h gets
changed.

Committer testing:

When running with:

 # perf record --bpf-event

On an older kernel where PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL
is not present, we fallback to removing those two bits from
perf_event_attr, making the tool to continue to work on older kernels:

  perf_event_attr:
    size                             112
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD
    read_format                      ID
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid 5779  cpu 0  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -22
  switching off bpf_event
  ------------------------------------------------------------
  perf_event_attr:
    size                             112
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD
    read_format                      ID
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
  ------------------------------------------------------------
  sys_perf_event_open: pid 5779  cpu 0  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -22
  switching off ksymbol
  ------------------------------------------------------------
  perf_event_attr:
    size                             112
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD
    read_format                      ID
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    precise_ip                       3
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
  ------------------------------------------------------------

And then proceeds to work without those two features.

As passing --bpf-event is an explicit action performed by the user, perhaps we
should emit a warning telling that the kernel has no such feature, but this can
be done on top of this patch.

Now with a kernel that supports these events, start the 'record --bpf-event -a'
and then run 'perf trace sleep 10000' that will use the BPF
augmented_raw_syscalls.o prebuilt (for another kernel version even) and thus
should generate PERF_RECORD_BPF_EVENT events:

  [root@quaco ~]# perf record -e dummy -a --bpf-event
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.713 MB perf.data ]

  [root@quaco ~]# bpftool prog
  13: cgroup_skb  tag 7be49e3934a125ba  gpl
  	loaded_at 2019-01-19T09:09:43-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 13,14
  14: cgroup_skb  tag 2a142ef67aaad174  gpl
  	loaded_at 2019-01-19T09:09:43-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 13,14
  15: cgroup_skb  tag 7be49e3934a125ba  gpl
  	loaded_at 2019-01-19T09:09:43-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 15,16
  16: cgroup_skb  tag 2a142ef67aaad174  gpl
  	loaded_at 2019-01-19T09:09:43-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 15,16
  17: cgroup_skb  tag 7be49e3934a125ba  gpl
  	loaded_at 2019-01-19T09:09:44-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 17,18
  18: cgroup_skb  tag 2a142ef67aaad174  gpl
  	loaded_at 2019-01-19T09:09:44-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 17,18
  21: cgroup_skb  tag 7be49e3934a125ba  gpl
  	loaded_at 2019-01-19T09:09:45-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 21,22
  22: cgroup_skb  tag 2a142ef67aaad174  gpl
  	loaded_at 2019-01-19T09:09:45-0300  uid 0
  	xlated 296B  jited 229B  memlock 4096B  map_ids 21,22
  31: tracepoint  name sys_enter  tag 12504ba9402f952f  gpl
  	loaded_at 2019-01-19T09:19:56-0300  uid 0
  	xlated 512B  jited 374B  memlock 4096B  map_ids 30,29,28
  32: tracepoint  name sys_exit  tag c1bd85c092d6e4aa  gpl
  	loaded_at 2019-01-19T09:19:56-0300  uid 0
  	xlated 256B  jited 191B  memlock 4096B  map_ids 30,29
  # perf report -D | grep PERF_RECORD_BPF_EVENT | nl
     1	0 55834574849 0x4fc8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
     2	0 60129542145 0x5118 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
     3	0 64424509441 0x5268 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
     4	0 68719476737 0x53b8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
     5	0 73014444033 0x5508 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
     6	0 77309411329 0x5658 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
     7	0 90194313217 0x57a8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
     8	0 94489280513 0x58f8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
     9	7 620922484360 0xb6390 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 29
    10	7 620922486018 0xb6410 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 29
    11	7 620922579199 0xb6490 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 30
    12	7 620922580240 0xb6510 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 30
    13	7 620922765207 0xb6598 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 31
    14	7 620922874543 0xb6620 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 32
  #

There, the 31 and 32 tracepoint BPF programs put in place by 'perf trace'.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-7-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:00:57 -03:00
Song Liu 9aa0bfa370 perf tools: Handle PERF_RECORD_KSYMBOL
This patch handles PERF_RECORD_KSYMBOL in perf record/report.
Specifically, map and symbol are created for ksymbol register, and
removed for ksymbol unregister.

This patch also sets perf_event_attr.ksymbol properly. The flag is ON by
default.

Committer notes:

Use proper inttypes.h for u64, fixing the build in some environments
like in the android NDK r15c targetting ARM 32-bit.

I.e. fixing this build error:

  util/event.c: In function 'perf_event__fprintf_ksymbol':
  util/event.c:1489:10: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=]
            event->ksymbol_event.flags, event->ksymbol_event.name);
            ^
  cc1: all warnings being treated as errors

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-6-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:00:57 -03:00
Thomas Richter 8dabe9c43a perf report: Dump s390 counter set data to file
Add support for the new s390 PMU device cpum_cf_diag to extract the
counter set diagnostic data. This data is available as event raw data
and can be created with this command:

  [root@s35lp76 perf]# ./perf record -R -e '{rbd000,rbc000}' --
                                 ~/mytests/facultaet 2500
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.009 MB perf.data ]
  [root@s35lp76 perf]#

The new event 0xbc000 generated this counter set diagnostic trace data.
The data can be extracted using command:

  [root@s35lp76 perf]# ./perf report --stdio --itrace=d
  #
  # Total Lost Samples: 0
  #
  # Samples: 21  of events 'anon group { rbd000, rbc000 }'
  # Event count (approx.): 21
  #
  #         Overhead  Command    Shared Object      Symbol
  # ................  .........  .................  ........................
  #
    80.95%   0.00%  facultaet  facultaet          [.] facultaet
     4.76%   0.00%  facultaet  [kernel.kallsyms]  [k] check_chain_key
     4.76%   0.00%  facultaet  [kernel.kallsyms]  [k] ftrace_likely_update
     4.76%   0.00%  facultaet  [kernel.kallsyms]  [k] lock_release
     4.76%   0.00%  facultaet  libc-2.26.so       [.] _dl_addr
  [root@s35lp76 perf]# ll aux*
  -rw-r--r-- 1 root root 3408 Oct 16 12:40 aux.ctr.02
  -rw-r--r-- 1 root root 4096 Oct 16 12:40 aux.smp.02
  [root@s35lp76 perf]#

The files named aux.ctr.## contain the counter set diagnostic data and
the files named aux.smp.## contain the sampling diagnostic data. ##
stand for the CPU number the data was taken from.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20190117093003.96287-4-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:00:57 -03:00
Thomas Richter 3e4a1c536b perf report: Display names in s390 diagnostic counter sets
On s390 the CPU Measurement Facility diagnostic counter sets are
displayed by counter number and value. Add the logical counter name in
the output (if it is available). Otherwise "unknown" is shown.

Output before:

 [root@s35lp76 perf]# ./perf report -D --stdio
 [00000000] Counterset:0 Counters:6
   Counter:000 Value:0x000000000085ec36 Counter:001 Value:0x0000000000796c94
   Counter:002 Value:0x0000000000005ada Counter:003 Value:0x0000000000092460
   Counter:004 Value:0x0000000000006073 Counter:005 Value:0x00000000001a9a73
 [0x000038] Counterset:1 Counters:2
   Counter:000 Value:0x000000000007c59f Counter:001 Value:0x000000000002fad6
 [0x000050] Counterset:2 Counters:16
   Counter:000 Value:000000000000000000 Counter:001 Value:000000000000000000

Output after:

    [root@s35lp76 perf]# ./perf report -D --stdio

 [00000000] Counterset:0 Counters:6
     Counter:000 cpu_cycles Value:0x000000000085ec36
     Counter:001 instructions Value:0x0000000000796c94
     Counter:002 l1i_dir_writes Value:0x0000000000005ada
     Counter:003 l1i_penalty_cycles Value:0x0000000000092460
     Counter:004 l1d_dir_writes Value:0x0000000000006073
     Counter:005 l1d_penalty_cycles Value:0x00000000001a9a73
 [0x000038] Counterset:1 Counters:2
     Counter:000 problem_state_cpu_cycles Value:0x000000000007c59f
     Counter:001 problem_state_instructions Value:0x000000000002fad6
 [0x000050] Counterset:2 Counters:16
     Counter:000 prng_functions Value:000000000000000000

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20190117093003.96287-3-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:00:56 -03:00
Thomas Richter 93115d32e8 perf report: Display arch specific diagnostic counter sets, starting with s390
On s390 the event bc000 (also named CF_DIAG) extracts the CPU
Measurement Facility diagnostic counter sets and displays them as
counter number and counter value pairs sorted by counter set number.

Output:
 [root@s35lp76 perf]# ./perf report -D --stdio

 [00000000] Counterset:0 Counters:6
   Counter:000 Value:0x000000000085ec36 Counter:001 Value:0x0000000000796c94
   Counter:002 Value:0x0000000000005ada Counter:003 Value:0x0000000000092460
   Counter:004 Value:0x0000000000006073 Counter:005 Value:0x00000000001a9a73
 [0x000038] Counterset:1 Counters:2
   Counter:000 Value:0x000000000007c59f Counter:001 Value:0x000000000002fad6
 [0x000050] Counterset:2 Counters:16
   Counter:000 Value:000000000000000000 Counter:001 Value:000000000000000000
   Counter:002 Value:000000000000000000 Counter:003 Value:000000000000000000
   Counter:004 Value:000000000000000000 Counter:005 Value:000000000000000000
   Counter:006 Value:000000000000000000 Counter:007 Value:000000000000000000
   Counter:008 Value:000000000000000000 Counter:009 Value:000000000000000000
   Counter:010 Value:000000000000000000 Counter:011 Value:000000000000000000
   Counter:012 Value:000000000000000000 Counter:013 Value:000000000000000000
   Counter:014 Value:000000000000000000 Counter:015 Value:000000000000000000
 [0x0000d8] Counterset:3 Counters:128
   Counter:000 Value:0x000000000000020f Counter:001 Value:0x00000000000001d8
   Counter:002 Value:0x000000000000d7fa Counter:003 Value:0x000000000000008b
   ...

The number in brackets is the offset into the raw data field of the
sample.

New functions trace_event_sample_raw__init() and s390_sample_raw() are
introduced in the code path to enable interpretation on non s390
platforms. This event bc000 attached raw data is generated only on s390
platform. Correct display on other platforms requires correct endianness
handling.

Committer notes:

Added a init function that sets up a evlist function pointer to avoid
repeated tests on evlist->env and calls to perf_env__name() that
involves normalizing, etc, for each PERF_RECORD_SAMPLE.

Removed needless __maybe_unused from the trace_event_raw()
prototype in session.h, move it to be an static function in evlist.

The 'offset' variable is a size_t, not an u64, fix it to avoid this on
some arches:

    CC       /tmp/build/perf/util/s390-sample-raw.o
  util/s390-sample-raw.c: In function 's390_cpumcfdg_testctr':
  util/s390-sample-raw.c:77:4: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'size_t' [-Werror=format=]
      pr_err("Invalid counter set entry at %#"  PRIx64 "\n",
      ^
  cc1: all warnings being treated as errors

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Link: https://lkml.kernel.org/r/9c856ac0-ef23-72b5-901d-a1f815508976@linux.ibm.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: https://lkml.kernel.org/n/tip-s3jhif06et9ug78qhclw41z1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:00:48 -03:00
Brajeswar Ghosh 3eb03a5208 perf tools: Remove duplicate headers
Remove duplicate headers which are included more than once in the same
file.

Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com>
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Colin King <colin.king@canonical.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Link: http://lkml.kernel.org/r/20190115135916.GA3629@hp-pavilion-15-notebook-pc-brajeswar
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Jiri Olsa 3c7b67b23e perf session: Add reader__process_events function
The reader object is defined by file's fd, data offset and data size.

Now we can simply define a reader object for an arbitrary file data
portion and pass it to reader__process_events().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190110101301.6196-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Jiri Olsa 71002bd214 perf session: Add 'data_offset' member to reader object
Add 'data_offset' member to reader object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190110101301.6196-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Jiri Olsa f66f095052 perf session: Add 'data_size' member to reader object
Add a  'data_size' member to the reader object. Keep the 'data_size'
variable instead of replacing it with rd.data_size as it will be used in
the following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190110101301.6196-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Jiri Olsa 82715eb184 perf session: Add reader object
Add a session private reader object to encapsulate the reading of the
event data block. Starting with a 'fd' field.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190110101301.6196-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Jiri Olsa 4f5a473d79 perf session: Get rid of file_size variable
It's not needed and removing it makes the code a little simpler for the
upcoming changes.

It's safe to replace file_size with data_size, because the
perf_data__size() value is never smaller than data_offset + data_size.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190110101301.6196-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Jiri Olsa 7ba4da1002 perf session: Rearrange perf_session__process_events function
To reduce function arguments and the code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190110101301.6196-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 15:15:57 -03:00
Stephane Eranian 1497e804d1 perf tools: Handle TOPOLOGY headers with no CPU
This patch fixes an issue in cpumap.c when used with the TOPOLOGY
header. In some configurations, some NUMA nodes may have no CPU (empty
cpulist). Yet a cpumap map must be created otherwise perf abort with an
error. This patch handles this case by creating a dummy map.

  Before:

  $ perf record -o - -e cycles noploop 2 | perf script -i -
  0x6e8 [0x6c]: failed to process type: 80

  After:

  $ perf record -o - -e cycles noploop 2 | perf script -i -
  noploop for 2 seconds

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1547885559-1657-1-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 11:28:56 -03:00
Arnaldo Carvalho de Melo 94ec1eb711 perf python: Remove -fstack-clash-protection when building with some clang versions
These options are not present in some (all?) clang versions, so when we
build for a distro that has a gcc new enough to have these options and
that the distro python build config settings use them but clang doesn't
support, b00m.

This is the case with fedora rawhide (now gearing towards f30), so check
if clang has the  and remove the missing ones from CFLAGS.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Thiago Macieira <thiago.macieira@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-5q50q9w458yawgxf9ez54jbp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-18 11:38:09 -03:00
Jiri Olsa 99d86c8b88 perf ordered_events: Fix crash in ordered_events__free
Song Liu reported crash in 'perf record':

  > #0  0x0000000000500055 in ordered_events(float, long double,...)(...) ()
  > #1  0x0000000000500196 in ordered_events.reinit ()
  > #2  0x00000000004fe413 in perf_session.process_events ()
  > #3  0x0000000000440431 in cmd_record ()
  > #4  0x00000000004a439f in run_builtin ()
  > #5  0x000000000042b3e5 in main ()"

This can happen when we get out of buffers during event processing.

The subsequent ordered_events__free() call assumes oe->buffer != NULL
and crashes. Add a check to prevent that.

Reported-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Song Liu <liu.song.a23@gmail.com>
Tested-by: Song Liu <liu.song.a23@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20190117113017.12977-1-jolsa@kernel.org
Fixes: d5ceb62b36 ("perf ordered_events: Add 'struct ordered_events_buffer' layer")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-17 11:07:00 -03:00
Arnaldo Carvalho de Melo 549aff770c perf symbols: Add 'arch_cpu_idle' to the list of kernel idle symbols
When testing 'perf top' on a armhf system (32-bit, Orange Pi Zero), I
noticed that 'arch_cpu_idle' dominated, add it to the list of idle
symbols, so that we can see what is that being done when not idle.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-4q2b5g4p2hrstrhp9t2mrlho@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-09 16:21:15 -03:00
Florian Fainelli 011532379b perf tools: Make find_vdso_map() more modular
In preparation for checking that the vectors page on the ARM
architecture, refactor the find_vdso_map() function to accept finding an
arbitrary string and create a dedicated helper function for that under
util/find-map.c and update the filename to find-map.c and all references
to it: perf-read-vdso.c and util/vdso.c.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lkml.kernel.org/r/20181221034337.26663-2-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-08 13:28:13 -03:00
Ingo Molnar 64598e8b6f perf/core improvements and fixes:
perf annotate:
 
   Ivan Krylov:
 
   - Pass filename to objdump via execl, fixing usage with filenames
     with special characters.
 
 perf report:
 
   Jin Yao:
 
      Fix wrong iteration count in --branch-history
 
 perf stat:
 
   Jin Yao:
 
   - Fix endless wait for child process
 
 perf test:
 
   Arnaldo Carvalho de Melo:
 
   - Use a fallback to get the pathname in vfs_getname in
 
 tools build:
 
   Jiri Olsa:
 
   - Allow overriding CFLAGS assignments.
 
 Misc:
 
   Arnaldo Carvalho de Melo:
 
   - Syncronize UAPI headers
 
   Mattias Jacobsson:
 
   - Remove redundant va_end() in strbuf_addv()
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXC+kmQAKCRCyPKLppCJ+
 J4VVAPwK4rGYiuHZnYyDDICkL4TenIj/a2AQTIeLPifwCL06lQD+LOsMdIpD/SQW
 PAZu/R0j0uFuuehYg2ikW1zdXLykDAg=
 =2j5l
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.21-20190104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf annotate:

  Ivan Krylov:

  - Pass filename to objdump via execl, fixing usage with filenames
    with special characters.

perf report:

  Jin Yao:

     Fix wrong iteration count in --branch-history

perf stat:

  Jin Yao:

  - Fix endless wait for child process

perf test:

  Arnaldo Carvalho de Melo:

  - Use a fallback to get the pathname in vfs_getname in

tools build:

  Jiri Olsa:

  - Allow overriding CFLAGS assignments.

Misc:

  Arnaldo Carvalho de Melo:

  - Syncronize UAPI headers

  Mattias Jacobsson:

  - Remove redundant va_end() in strbuf_addv()

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-08 16:31:19 +01:00
Linus Torvalds ac5eed2b41 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling updates form Ingo Molnar:
 "A final batch of perf tooling changes: mostly fixes and small
  improvements"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  perf session: Add comment for perf_session__register_idle_thread()
  perf thread-stack: Fix thread stack processing for the idle task
  perf thread-stack: Allocate an array of thread stacks
  perf thread-stack: Factor out thread_stack__init()
  perf thread-stack: Allow for a thread stack array
  perf thread-stack: Avoid direct reference to the thread's stack
  perf thread-stack: Tidy thread_stack__bottom() usage
  perf thread-stack: Simplify some code in thread_stack__process()
  tools gpio: Allow overriding CFLAGS
  tools power turbostat: Override CFLAGS assignments and add LDFLAGS to build command
  tools thermal tmon: Allow overriding CFLAGS assignments
  tools power x86_energy_perf_policy: Override CFLAGS assignments and add LDFLAGS to build command
  perf c2c: Increase the HITM ratio limit for displayed cachelines
  perf c2c: Change the default coalesce setup
  perf trace beauty ioctl: Beautify USBDEVFS_ commands
  perf trace beauty: Export function to get the files for a thread
  perf trace: Wire up ioctl's USBDEBFS_ cmd table generator
  perf beauty ioctl: Add generator for USBDEVFS_ ioctl commands
  tools headers uapi: Grab a copy of usbdevice_fs.h
  perf trace: Store the major number for a file when storing its pathname
  ...
2019-01-06 16:30:14 -08:00
Mattias Jacobsson 099be74886 perf strbuf: Remove redundant va_end() in strbuf_addv()
Each call to va_copy() should have one, and only one, corresponding call
to va_end(). In strbuf_addv() some code paths result in va_end() getting
called multiple times. Remove the superfluous va_end().

Signed-off-by: Mattias Jacobsson <2pi@mok.nu>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sanskriti Sharma <sansharm@redhat.com>
Link: http://lkml.kernel.org/r/20181229141750.16945-1-2pi@mok.nu
Fixes: ce49d8436c ("perf strbuf: Match va_{add,copy} with va_end")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-04 12:54:49 -03:00
Ivan Krylov 442b4eb3af perf annotate: Pass filename to objdump via execl
The symbol__disassemble() function uses shell to launch objdump and
filter its output via grep. Passing filenames by interpolating them into
the command line via "%s" may lead to problems if said filenames contain
special characters.

Instead, pass the filename as a command line argument where it is not
subject to any kind of interpretation, then use quoted shell
interpolation to build the strings we need safely.

Signed-off-by: Ivan Krylov <krylov.r00t@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181014111803.5d83b806@Tarkus
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-04 12:54:49 -03:00
Jin Yao a3366db06b perf report: Fix wrong iteration count in --branch-history
By calculating the removed loops, we can get the iteration count.

But the iteration count could be reported incorrectly, reporting
impossibly high counts.

That's because previous code uses the number of removed LBR entries for
the iteration count. That's not good. Fix this by increasing the
iteration count when a loop is detected.

When matching the chain, the iteration count would be added up, finally we need
to compute the average value when printing out.

For example,

  $ perf report --branch-history --stdio --no-children

Before:

  ---f2 +0
     |
     |--33.62%--f1 +9 (cycles:1)
     |          f1 +0
     |          main +22 (cycles:1)
     |          main +17
     |          main +38 (cycles:1)
     |          main +27
     |          f1 +26 (cycles:1)
     |          f1 +24
     |          f2 +27 (cycles:7)
     |          f2 +0
     |          f1 +19 (cycles:1)
     |          f1 +14
     |          f2 +27 (cycles:11)
     |          f2 +0
     |          f1 +9 (cycles:1 iter:2968 avg_cycles:3)
     |          f1 +0
     |          main +22 (cycles:1 iter:2968 avg_cycles:3)
     |          main +17
     |          main +38 (cycles:1 iter:2968 avg_cycles:3)

2968 is an impossible high iteration count and avg_cycles is too small.

After:

  ---f2 +0
     |
     |--33.62%--f1 +9 (cycles:1)
     |          f1 +0
     |          main +22 (cycles:1)
     |          main +17
     |          main +38 (cycles:1)
     |          main +27
     |          f1 +26 (cycles:1)
     |          f1 +24
     |          f2 +27 (cycles:7)
     |          f2 +0
     |          f1 +19 (cycles:1)
     |          f1 +14
     |          f2 +27 (cycles:11)
     |          f2 +0
     |          f1 +9 (cycles:1 iter:1 avg_cycles:23)
     |          f1 +0
     |          main +22 (cycles:1 iter:1 avg_cycles:23)
     |          main +17
     |          main +38 (cycles:1 iter:1 avg_cycles:23)

avg_cycles:23 is the average cycles of this iteration.

Fixes: c4ee06251d ("perf report: Calculate the average cycles of iterations")

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1546582230-17507-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-04 12:54:49 -03:00
Linus Torvalds 96d4f267e4 Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03 18:57:57 -08:00
Adrian Hunter b25756df5b perf session: Add comment for perf_session__register_idle_thread()
Add a comment to perf_session__register_idle_thread() to bring attention to
a pitfall with the idle task thread structure. The pitfall is that there
should really be a 'struct thread' for the idle task of each cpu, but there
is only one that can have pid == tid == 0.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 11:05:06 -03:00
Adrian Hunter 256d92bc93 perf thread-stack: Fix thread stack processing for the idle task
perf creates a single 'struct thread' to represent the idle task. That
is because threads are identified by PID and TID, and the idle task
always has PID == TID == 0.

However, there are actually separate idle tasks for each CPU. That
creates a problem for thread stack processing which assumes that each
thread has a single stack, not one stack per CPU.

Fix that by passing through the CPU number, and in the case of the idle
"thread", pick the thread stack from an array based on the CPU number.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 11:03:17 -03:00
Adrian Hunter 139f42f3b3 perf thread-stack: Allocate an array of thread stacks
In preparation for fixing thread stack processing for the idle task,
allocate an array of thread stacks.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-7-adrian.hunter@intel.com
[ No need to check for NULL when calling zfree(), noticed by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 10:55:55 -03:00
Adrian Hunter 2e9e868876 perf thread-stack: Factor out thread_stack__init()
In preparation for fixing thread stack processing for the idle task,
factor out thread_stack__init().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 10:53:41 -03:00
Adrian Hunter f6060ac601 perf thread-stack: Allow for a thread stack array
In preparation for fixing thread stack processing for the idle task,
allow for a thread stack array.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 10:49:51 -03:00
Adrian Hunter bd8e68ace1 perf thread-stack: Avoid direct reference to the thread's stack
In preparation for fixing thread stack processing for the idle task,
avoid direct reference to the thread's stack. The thread stack will
change to an array of thread stacks, at which point the meaning of the
direct reference will change.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-4-adrian.hunter@intel.com
[ Rename thread_stack__ts() to thread__stack() since this operates on a 'thread' struct ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 10:48:18 -03:00
Adrian Hunter e0b8951190 perf thread-stack: Tidy thread_stack__bottom() usage
In preparation for fixing thread stack processing for the idle task,
tidy thread_stack__bottom() usage. Specifically, the parameter 'thread'
is not needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 10:45:26 -03:00
Adrian Hunter 03b32cb281 perf thread-stack: Simplify some code in thread_stack__process()
In preparation for fixing thread stack processing for the idle task,
simplify some code in thread_stack__process().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181221120620.9659-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02 10:42:45 -03:00
Andi Kleen 61f611593f perf script: Fix LBR skid dump problems in brstackinsn
This is a fix for another instance of the skid problem Milian recently
found [1]

The LBRs don't freeze at the exact same time as the PMI is triggered.
The perf script brstackinsn code that dumps LBR assembler assumes that
the last branch in the LBR leads to the sample point.  But with skid
it's possible that the CPU executes one or more branches before the
sample, but which do not appear in the LBR.

What happens then is either that the sample point is before the last LBR
branch. In this case the dumper sees a negative length and ignores it.
Or it the sample point is long after the last branch. Then the dumper
sees a very long block and dumps it upto its block limit (16k bytes),
which is noise in the output.

On typical sample session this can happen regularly.

This patch tries to detect and handle the situation. On the last block
that is dumped by the LBR dumper we always stop on the first branch. If
the block length is negative just scan forward to the first branch.
Otherwise scan until a branch is found.

The PT decoder already has a function that uses the instruction decoder
to detect branches, so we can just reuse it here.

Then when a terminating branch is found print an indication and stop
dumping. This might miss a few instructions, but at least shows no
runaway blocks.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Link: http://lkml.kernel.org/r/20181120050617.4119-1-andi@firstfloor.org
[ Resolved conflict with dd2e18e9ac ("perf tools: Support 'srccode' output") ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-28 16:33:02 -03:00
Jiri Olsa a389aece97 perf python: Do not force closing original perf descriptor in evlist.get_pollfd()
Ondřej reported that when compiled with python3, the python extension
regresses in evlist.get_pollfd function behaviour.

The evlist.get_pollfd function creates file objects from evlist's fds
and returns them in a list. The python3 version also sets them to 'close
the original descriptor' when the object dies (is closed), by passing
True via the 'closefd' arg in the PyFile_FromFd call.

The python's closefd doc says:

  If closefd is False, the underlying file descriptor will be kept open
  when the file is closed.

That's why the following line in python3 closes all evlist fds:

  evlist.get_pollfd()

the returned list is immediately destroyed and that takes down the
original events fds.

Passing closefd as False to PyFile_FromFd to fix this.

Reported-by: Ondřej Lysoněk <olysonek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jaroslav Škarvada <jskarvad@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 66dfdff03d ("perf tools: Add Python 3 support")
Link: http://lkml.kernel.org/r/20181226112121.5285-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-28 16:33:02 -03:00
Arnaldo Carvalho de Melo bc055c54b8 perf symbols: Relax checks on perf-PID.map ownership
Those are simple enough, and usually not produced by root, instead by
whatever user is running java, rust, Node.js JIT code that end up
generating those /tmp/perf-PID.map for resolution of symbols in the
anonymous executable maps.

Having to use --force to resolve symbols in 'perf top' is a distraction,
as recently I experienced when node.js symbols were not being resolved
by 'perf top'.

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hítalo Silva <hitalos@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-tk2jgo2v4v2yjuj28axbpppo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 16:17:41 -03:00
Leo Yan 7100b12cf4 perf cs-etm: Generate branch sample for exception packet
The exception packet appears as one element with 'elem_type' ==
OCSD_GEN_TRC_ELEM_EXCEPTION or OCSD_GEN_TRC_ELEM_EXCEPTION_RET, which is
present for exception entry and exit respectively.  The decoder sets the
packet fields 'packet->exc' and 'packet->exc_ret' to indicate the
exception packets; but exception packets don't have a dedicated sample
type and shares the same sample type CS_ETM_RANGE with normal
instruction packets.

As a result, the exception packets are taken as normal instruction
packets and this introduces confusion in mixing different packet types.
Furthermore, these instruction range packets will be processed for
branch samples only when 'packet->last_instr_taken_branch' is true,
otherwise they will be omitted, this can introduce a mess for exception
and exception returning due to not having the complete address range
info for context switching.

To process exception packets properly, this patch introduces two new
sample types: CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET; these two types
of packets will be handled by cs_etm__exception().  The function
cs_etm__exception() forces setting the previous CS_ETM_RANGE packet flag
'prev_packet->last_instr_taken_branch' to true, this matches well with
the program flow when the exception is trapped from user space to kernel
space, no matter if the most recent flow has branch taken or not; this
is also safe for returning to user space after exception handling.

After exception packets have their own sample type, the packet fields
'packet->exc' and 'packet->exc_ret' aren't needed anymore, so remove
them.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-9-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:24:00 -03:00
Leo Yan 02e7e2509e perf cs-etm: Treat EO_TRACE element as trace discontinuity
If the decoder outputs an EO_TRACE element, it means the end of the
trace buffer; this is a discontinuity and in this case the end of trace
data needs to be saved.

This patch generates a CS_ETM_DISCONTINUITY packet for the EO_TRACE
element hereby flushing the end of trace data in cs-etm.c.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-8-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:59 -03:00
Leo Yan 37bb37168d perf cs-etm: Treat NO_SYNC element as trace discontinuity
The CoreSight tracer driver might insert barrier packets between
different buffers, thus the decoder can spot the boundaries based on the
barrier packet; it is possible for the decoder to hit a barrier packet
and emit a NO_SYNC element, then the decoder will find a periodic
synchronisation point inside that next trace block that starts the trace
again but does not have the TRACE_ON element as indicator - usually
because this trace block has wrapped the buffer so we have lost the
original point when the trace was enabled.

In the first case it causes the insertion of a OCSD_GEN_TRC_ELEM_NO_SYNC
in the middle of the tracing stream, but as we were not handling the
NO_SYNC element properly this ends up making users miss the
discontinuity indications.

Though OCSD_GEN_TRC_ELEM_NO_SYNC is different from CS_ETM_TRACE_ON when
output from the decoder, both indicate that the trace data is
discontinuous; this patch treats OCSD_GEN_TRC_ELEM_NO_SYNC as a trace
discontinuity and generates a CS_ETM_DISCONTINUITY packet for it, so
cs-etm can handle the discontinuity for this case, finally it saves the
last trace data for the previous trace block and restart samples for the
new block.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-7-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:59 -03:00
Leo Yan 49ccf87bfb perf cs-etm: Rename CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITY
TRACE_ON element is used at the beginning of trace, it also can be
appeared in the middle of trace data to indicate discontinuity; for
example, it's possible to see multiple TRACE_ON elements in the trace
stream if the trace is being limited by address range filtering.

Furthermore, except TRACE_ON element is for discontinuity, NO_SYNC and
EO_TRACE also can be used to indicate discontinuity, though they are
used for different scenarios for which the trace is interrupted.

This patch renames sample type CS_ETM_TRACE_ON to CS_ETM_DISCONTINUITY,
firstly the new name describes more closely the purpose of the packet;
secondly this is a preparation for other output elements which also
cause the trace discontinuity thus they can share the same one packet
type.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-6-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:59 -03:00
Leo Yan cfc1d4276b perf cs-etm: Refactor enumeration cs_etm_sample_type
The values in enumeration cs_etm_sample_type are defined with setting
bit N for each packet type, this is not suggested in the usual case.

This patch refactor cs_etm_sample_type by converting from bit shifting
values to continuous numbers.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-5-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:59 -03:00
Leo Yan cee7a6a212 perf cs-etm: Remove unused 'trace_on' in cs_etm_decoder
cs_etm_decoder::trace_on is being assigned when TRACE_ON or NO_SYNC
element is coming, but it is never used hence it is redundant and can
be removed.

So let's remove 'trace_on' field from cs_etm_decoder struct.

Suggested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-4-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:59 -03:00
Leo Yan 24fff5eb2b perf cs-etm: Avoid stale branch samples when flush packet
At the end of trace buffer handling, function cs_etm__flush() is invoked
to flush any remaining branch stack entries.  As a side effect, it also
generates branch sample, because the 'etmq->packet' doesn't contains any
new coming packet but point to one stale packet after packets swapping,
so it wrongly makes synthesize branch samples with stale packet info.

We could review below detailed flow which causes issue:

  Packet1: start_addr=0xffff000008b1fbf0 end_addr=0xffff000008b1fbfc
  Packet2: start_addr=0xffff000008b1fb5c end_addr=0xffff000008b1fb6c

  step 1: cs_etm__sample():
	sample: ip=(0xffff000008b1fbfc-4) addr=0xffff000008b1fb5c

  step 2: flush packet in cs_etm__run_decoder():
	cs_etm__run_decoder()
	  `-> err = cs_etm__flush(etmq, false);
	sample: ip=(0xffff000008b1fb6c-4) addr=0xffff000008b1fbf0

Packet1 and packet2 are two continuous packets, when packet2 is the new
coming packet, cs_etm__sample() generates branch sample for these two
packets and use [packet1::end_addr - 4 => packet2::start_addr] as branch
jump flow, thus we can see the first generated branch sample in step 1.
At the end of cs_etm__sample() it swaps packets so 'etm->prev_packet'=
packet2 and 'etm->packet'=packet1, so far it's okay for branch sample.

If packet2 is the last one packet in trace buffer, even there have no
any new coming packet, cs_etm__run_decoder() invokes cs_etm__flush() to
flush branch stack entries as expected, but it also generates branch
samples by taking 'etm->packet' as a new coming packet, thus the branch
jump flow is as [packet2::end_addr - 4 =>  packet1::start_addr]; this
is the second sample which is generated in step 2.  So actually the
second sample is a stale sample and we should not generate it.

This patch introduces a new function cs_etm__end_block(), at the end of
trace block this function is invoked to only flush branch stack entries
and thus can avoid to generate branch sample for stale packet.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-3-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:59 -03:00
Leo Yan 43fd56669c perf cs-etm: Correct packets swapping in cs_etm__flush()
The structure cs_etm_queue uses 'prev_packet' to point to previous
packet, this can be used to combine with new coming packet to generate
samples.

In function cs_etm__flush() it swaps packets only when the flag
'etm->synth_opts.last_branch' is true, this means that it will not swap
packets if without option '--itrace=il' to generate last branch entries;
thus for this case the 'prev_packet' doesn't point to the correct
previous packet and the stale packet still will be used to generate
sequential sample.  Thus if dump trace with 'perf script' command we can
see the incorrect flow with the stale packet's address info.

This patch corrects packets swapping in cs_etm__flush(); except using
the flag 'etm->synth_opts.last_branch' it also checks the another flag
'etm->sample_branches', if any flag is true then it swaps packets so can
save correct content to 'prev_packet'.  Finally this can fix the wrong
program flow dumping issue.

The patch has a minor refactoring to use 'etm->synth_opts.last_branch'
instead of 'etmq->etm->synth_opts.last_branch' for condition checking,
this is consistent with that is done in cs_etm__sample().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1544513908-16805-2-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:58 -03:00
Arnaldo Carvalho de Melo 866053bb64 perf tools: Cast off_t to s64 to avoid warning on bionic libc
To avoid this warning:

    CC       /tmp/build/perf/util/s390-cpumsf.o
  util/s390-cpumsf.c: In function 's390_cpumsf_samples':
  util/s390-cpumsf.c:508:3: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'off_t' [-Wformat=]
     pr_err("[%#08" PRIx64 "] Invalid AUX trailer entry TOD clock base\n",
     ^

Now the various Android cross toolchains used in the perf tools
container test builds are all clean and we can remove this:

  export EXTRA_MAKE_ARGS="WERROR=0"

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lkml.kernel.org/n/tip-5rav4ccyb0sjciysz2i4p3sx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:57 -03:00
Arnaldo Carvalho de Melo 0afcf29bab perf header: Fix up argument to ctime()
Reducing this noise when cross building to the Android NDK:

  util/header.c: In function 'perf_header__fprintf_info':
  util/header.c:2710:45: warning: pointer targets in passing argument 1 of 'ctime' differ in signedness [-Wpointer-sign]
    fprintf(fp, "# captured on    : %s", ctime(&st.st_ctime));
                                               ^
  In file included from util/../perf.h:5:0,
                   from util/evlist.h:11,
                   from util/header.c:22:
  /opt/android-ndk-r15c/platforms/android-26/arch-arm/usr/include/time.h:81:14: note: expected 'const time_t *' but argument is of type 'long unsigned int *'
   extern char* ctime(const time_t*) __LIBC_ABI_PUBLIC__;
                ^

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-6bz74zp080yhmtiwb36enso9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:56 -03:00
Arnaldo Carvalho de Melo 748fe0889c perf tools: Add missing sigqueue() prototype for systems lacking it
There are systems such as the Android NDK API level 24 has the
sigqueue() function but doesn't provide a prototype, adding noise to the
build:

  util/evlist.c: In function 'perf_evlist__prepare_workload':
  util/evlist.c:1494:4: warning: implicit declaration of function 'sigqueue' [-Wimplicit-function-declaration]
      if (sigqueue(getppid(), SIGUSR1, val))
      ^
  util/evlist.c:1494:4: warning: nested extern declaration of 'sigqueue' [-Wnested-externs]

Define a LACKS_SIGQUEUE_PROTOTYPE define so that code needing that can
get a prototype.

Checked in the bionic git repo to be available since level 23:

https://android.googlesource.com/platform/bionic/+/master/libc/include/signal.h#123

  int sigqueue(pid_t __pid, int __signal, const union sigval __value) __INTRODUCED_IN(23);

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lmhpev1uni9kdrv7j29glyov@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:23:56 -03:00
Adrian Hunter 571766010e perf auxtrace: Alter addr_filter__entire_dso() to work if there are no symbols
addr_filter__entire_dso() uses the first and last symbols from a dso,
and so does not work when there are no symbols.  Alter it to filter the
whole file instead.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Fixes: 1b36c03e35 ("perf record: Add support for using symbols in address filters")
Link: http://lkml.kernel.org/r/20181127084634.12469-1-adrian.hunter@intel.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:21:44 -03:00
Adrian Hunter b5c2161cc4 perf dso: Export data_file_size() method there are no symbols
Will be used outside dso.c in a followup patch, so rename it and make it
non-static.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20181127084634.12469-1-adrian.hunter@intel.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-18 12:21:44 -03:00
Jiri Olsa 83356b3d12 perf ordered_events: Add first_time() method
To get the timestamp in the first event in the queue.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Levin <ldv@altlinux.org>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Luis Cláudio Gonçalves <lclaudio@uudg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/n/tip-appp27jw1ul8kgg872j43r5o@git.kernel.org
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 15:02:17 -03:00
Jiri Olsa 68ca5d07de perf ordered_events: Add ordered_events__flush_time interface
Add OE_FLUSH__TIME flush type, to be able to flush only certain amount
of the queue based on the provided timestamp. It will be used in the
following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Levin <ldv@altlinux.org>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Luis Cláudio Gonçalves <lclaudio@uudg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181205160509.1168-7-jolsa@kernel.org
[ Fix the build on older systems such as centos 5 and 6 where 'time' shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 15:02:12 -03:00
Eugeniy Paltsev 6d99a79cb4 perf annotate: Introduce basic support for ARC
Introduce basic 'perf annotate' support for ARC to be able to use
anotation via stdio interface.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Brodkin <alexey.brodkin@synopsys.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vineet Gupta <vineet.gupta1@synopsys.com>
Link: http://lkml.kernel.org/r/20181204175118.25232-1-Eugeniy.Paltsev@synopsys.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:42 -03:00
Sihyeon Jang 75c375c0ae perf config: Modify size factor of snprintf
According to definition of snprintf, it gets size factor including
null('\0') byte.  So '-1' is not neccessary. Also it will be helpful
unfied style with other cases. (eg. builtin-script.c)

Signed-off-by: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181201154603.10093-1-uneedsihyeon@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:40 -03:00
Alexey Budankov c8dd6ee51a perf record: Fix memory leak on AIO objects deallocation
Sending a part which was missed between v12 and v13 of the patch set
introducing AIO trace streaming for perf record mode.

The part is essential to avoid memory leakage during deallocation of AIO
related trace data buffers.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/e5d3154e-1583-83bb-9527-28ddbc6dbf9d@linux.intel.com
[ No need to test for NULL before calling zfree() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:34 -03:00
Arnaldo Carvalho de Melo bd8d57fb7e perf parse-events: Fix unchecked usage of strncpy()
The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  util/parse-events.c: In function 'print_symbol_events':
  util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
      strncpy(name, syms->symbol, MAX_NAME_LEN);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In function 'print_symbol_events.constprop',
      inlined from 'print_events' at util/parse-events.c:2508:2:
  util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
      strncpy(name, syms->symbol, MAX_NAME_LEN);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In function 'print_symbol_events.constprop',
      inlined from 'print_events' at util/parse-events.c:2511:2:
  util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
      strncpy(name, syms->symbol, MAX_NAME_LEN);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 947b4ad1d1 ("perf list: Fix max event string size")
Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:30 -03:00
Arnaldo Carvalho de Melo bef0b8970f perf probe: Fix unchecked usage of strncpy()
The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

In this case the 'target' buffer is coming from a list of build-ids that
are expected to have a len of at most (SBUILD_ID_SIZE - 1) chars, so
probably we're safe, but since we're using strncpy() here, use strlcpy()
instead to provide the intended safety checking without the using the
problematic strncpy() function.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  util/probe-file.c: In function 'probe_cache__open.isra.5':
  util/probe-file.c:427:3: error: 'strncpy' specified bound 41 equals destination size [-Werror=stringop-truncation]
     strncpy(sbuildid, target, SBUILD_ID_SIZE);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 1f3736c9c8 ("perf probe: Show all cached probes")
Link: https://lkml.kernel.org/n/tip-l7n8ggc9kl38qtdlouke5yp5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:28 -03:00
Arnaldo Carvalho de Melo 2f5302533f perf svghelper: Fix unchecked usage of strncpy()
The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

In this specific case this would only happen if fgets() was buggy, as
its man page states that it should read one less byte than the size of
the destination buffer, so that it can put the nul byte at the end of
it, so it would never copy 255 non-nul chars, as fgets reads into the
orig buffer at most 254 non-nul chars and terminates it. But lets just
switch to strlcpy to keep the original intent and silence the gcc 8.2
warning.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  In function 'cpu_model',
      inlined from 'svg_cpu_box' at util/svghelper.c:378:2:
  util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation]
       strncpy(cpu_m, &buf[13], 255);
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Fixes: f48d55ce78 ("perf: Add a SVG helper library file")
Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:20 -03:00
Arnaldo Carvalho de Melo 5192bde7d9 perf header: Fix unchecked usage of strncpy()
The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  util/header.c: In function 'perf_event__synthesize_event_update_name':
  util/header.c:3625:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
    strncpy(ev->data, evsel->name, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  util/header.c:3618:15: note: length computed here
    size_t len = strlen(evsel->name);
                 ^~~~~~~~~~~~~~~~~~~

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a6e5281780 ("perf tools: Add event_update event unit type")
Link: https://lkml.kernel.org/n/tip-wycz66iy8dl2z3yifgqf894p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:08 -03:00
Arnaldo Carvalho de Melo 7572588085 perf header: Fix unchecked usage of strncpy()
The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  util/header.c: In function 'perf_event__synthesize_event_update_unit':
  util/header.c:3586:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
    strncpy(ev->data, evsel->unit, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  util/header.c:3579:16: note: length computed here
    size_t size = strlen(evsel->unit);
                  ^~~~~~~~~~~~~~~~~~~

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a6e5281780 ("perf tools: Add event_update event unit type")
Link: https://lkml.kernel.org/n/tip-fiikh5nay70bv4zskw2aa858@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:06 -03:00
Arnaldo Carvalho de Melo fca5085c15 perf dso: Fix unchecked usage of strncpy()
The strncpy() function may leave the destination string buffer
unterminated, better use strlcpy() that we have a __weak fallback
implementation for systems without it.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

  In function 'decompress_kmodule',
      inlined from 'dso__decompress_kmodule_fd' at util/dso.c:305:9:
  util/dso.c:298:3: error: 'strncpy' destination unchanged after copying no bytes [-Werror=stringop-truncation]
     strncpy(pathname, tmpbuf, len);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CC       /tmp/build/perf/util/values.o
    CC       /tmp/build/perf/util/debug.o
  cc1: all warnings being treated as errors

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: c9a8a6131f ("perf tools: Move the temp file processing into decompress_kmodule")
Link: https://lkml.kernel.org/n/tip-tl2hdxj64tt4k8btbi6a0ugw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:03 -03:00
Mathieu Poirier 15a5cd1962 perf cs-etm: Add support for PTMv1.1 decoding
This patch is re-using the mechanic set forth by ETMv3 to add support
for PTM decoding.  Configuration for both encoding protocol is similar
but the generated stream itself is very different, hence requiring
special handling.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1543955944-10042-4-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:01 -03:00
Mathieu Poirier 7d0f4fefc4 perf cs-etm: Add support for ETMv3 trace decoding
Add support for the creation of packet printer and decoder for the ETMv3
trace architecture.  That way traces generated by tracers adhering to
that trace protocol can be handled properly by the perf infrastructure.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1543955944-10042-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:58:59 -03:00
Mathieu Poirier 78688342c5 perf cs-etm: Add configuration for ETMv3 trace protocol
This patch deals with the proper initialisation of configuration
parameters for the ETMv3 trace protocol in order to properly handle
packets generated by tracers following this specification.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1543955944-10042-2-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:58:53 -03:00
Jiri Olsa 8aa5c8eddc perf top: Move perf_top__reset_sample_counters() to after counts display
Move the perf_top__reset_sample_counters() call to right after we
display the counters so we can see the updated numbers for longer.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-o72pyiwt05f3p2juprwmz2jo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:58:47 -03:00
Jiri Olsa 97f7e0b33d perf top: Save and display the drop count stats
Add drop count to 'perf top' headers:

  # perf top --stdio
   PerfTop:    3549 irqs/sec  kernel:51.8%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles:ppp],  (all, 8 CPUs)

  # perf top
  Samples: 0  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 0 lost: 0/0 drop: 0/0

The format is: <current period drop>/<total drop>

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-2lj87zz8tq9ye1ntax3ulw0n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:58:33 -03:00
Jiri Olsa 94ad6e7e36 perf top: Use cond variable instead of a lock
Use conditional variable logic to synchronize between the reading and
processing threads. Currently it's done by having mutex around rotation
code.

Using a POSIX cond variable to sync both threads after queues rotation:

  Process thread:

    - Detects data
    - Switches queues
    - Sets rotate variable
    - Waits in pthread_cond_wait()

  Read thread:

    - Detects rotate is set
    - Kicks the process thread with a pthread_cond_signal()

After this rotation is safely completed and both threads can continue
with the new queue.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-3rdeg23rv3brvy1pwt3igvyw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:58:03 -03:00
Jiri Olsa 16c66bc167 perf top: Add processing thread
Add a new thread that takes care of the hist creating to alleviate the
main reader thread so it can keep perf mmaps served in time so that we
reduce the possibility of losing events.

The 'perf top' command now spawns 2 extra threads, the data processing
is the following:

  1) The main thread reads the data from mmaps and queues them to
     ordered events object;

  2) The processing threads takes the data from the ordered events
     object and create initial histogram;

  3) The GUI thread periodically sorts the initial histogram and
     presents it.

Passing the data between threads 1 and 2 is done by having 2 ordered
events queues. One is always being stored by thread 1 while the other is
flushed out in thread 2.

Passing the data between threads 2 and 3 stays the same as was initially
for threads 1 and 3.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-hhf4hllgkmle9wl1aly1jli0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:57:52 -03:00
Jiri Olsa d24e3c98ac perf top: Save and display the lost count stats
Add a 'lost count' to 'perf top' headers:

  # perf top --stdio
   PerfTop:    3850 irqs/sec  kernel:49.0%  exact: 100.0% lost: 0/0 [4000Hz cycles:ppp],  (all, 8 CPUs)

  # perf top
  Samples: 0  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 0 lost: 0/0

The format is: <current period lost>/<total lost>

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-zo11rn270gij5jtp8fknpf8u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:57:36 -03:00
Jiri Olsa a4a6668a62 perf ordered_events: Add private data member
We will need it in following patch, where we can't use the
container_of() trick to get the higher level object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-vgs9aoek21v14o3obza586yy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:57:30 -03:00
Jiri Olsa b8494f1df8 perf ordered_events: Rework show_progress for __ordered_events__flush
Decide to use the progress bar one level higher, we will need this in
following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ocjdukp2a8ujikkmafd0j5zv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:57:12 -03:00
Andi Kleen dd2e18e9ac perf tools: Support 'srccode' output
When looking at PT or brstackinsn traces with 'perf script' it can be
very useful to see the source code. This adds a simple facility to print
them with 'perf script', if the information is available through dwarf

  % perf record ...
  % perf script -F insn,ip,sym,srccode
  ...

            4004c6 main
  5               for (i = 0; i < 10000000; i++)
             4004cd main
  5               for (i = 0; i < 10000000; i++)
             4004c6 main
  5               for (i = 0; i < 10000000; i++)
             4004cd main
  5               for (i = 0; i < 10000000; i++)
             4004cd main
  5               for (i = 0; i < 10000000; i++)
             4004cd main
  5               for (i = 0; i < 10000000; i++)
             4004cd main
  5               for (i = 0; i < 10000000; i++)
             4004cd main
  5               for (i = 0; i < 10000000; i++)
             4004b3 main
  6                       v++;

  % perf record -b ...
  % perf script -F insn,ip,sym,srccode,brstackinsn

  ...
         main+22:
          0000000000400543        insn: e8 ca ff ff ff            # PRED
  |18                     f1();
          f1:
          0000000000400512        insn: 55
  |10       {
          0000000000400513        insn: 48 89 e5
          0000000000400516        insn: b8 00 00 00 00
  |11             f2();
          000000000040051b        insn: e8 d6 ff ff ff            # PRED
          f2:
          00000000004004f6        insn: 55
  |5        {
          00000000004004f7        insn: 48 89 e5
          00000000004004fa        insn: 8b 05 2c 0b 20 00
  |6              c = a / b;
          0000000000400500        insn: 8b 0d 2a 0b 20 00
          0000000000400506        insn: 99
          0000000000400507        insn: f7 f9
          0000000000400509        insn: 89 05 29 0b 20 00
          000000000040050f        insn: 90
  |7        }
          0000000000400510        insn: 5d
          0000000000400511        insn: c3                        # PRED
          f1+14:
          0000000000400520        insn: b8 00 00 00 00
  |12             f2();
          0000000000400525        insn: e8 cc ff ff ff            # PRED
          f2:
          00000000004004f6        insn: 55
  |5        {
          00000000004004f7        insn: 48 89 e5
          00000000004004fa        insn: 8b 05 2c 0b 20 00
  |6              c = a / b;

Not supported for callchains currently, would need some layout changes
there.

Committer notes:

Fixed the build on Alpine Linux (3.4 .. 3.8) by addressing this
warning:

  In file included from util/srccode.c:19:0:
  /usr/include/sys/fcntl.h:1:2: error: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Werror=cpp]
   #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h>
    ^~~~~~~
  cc1: all warnings being treated as errors

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181204001848.24769-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:57:07 -03:00
Mark Drayton 3fcb10e496 perf tools: Allow specifying proc-map-timeout in config file
The default timeout of 500ms for parsing /proc/<pid>/maps files is too
short for profiling many of our services.

This can be overridden by passing --proc-map-timeout to the relevant
command but it'd be nice to globally increase our default value.

This patch permits setting a different default with the
core.proc-map-timeout config file parameter.

Signed-off-by: Mark Drayton <mbd@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181204203420.1683114-1-mbd@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:57 -03:00
Ingo Molnar adba163441 perf tools: Fix diverse comment typos
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.

No change in functionality intended.

Committer notes:

This was split from a larger patch as there are code that is,
additionally, maintained outside the kernel tree, so to ease
cherry-picking and/or backporting, split this into multiple patches.

Just typos in comments, no need to backport, reducing the possibility of
possible backporting artifacts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:47 -03:00
Ingo Molnar e4a8b0af51 perf bpf-loader: Fix debugging message typo
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.

No change in functionality intended.

Committer notes:

This was split from a larger patch as there are code that is,
additionally, maintained outside the kernel tree, so to ease cherry
picking and/or backporting, split this into multiple patches.

This one has information that is presented to the user, albeit in debug
mode.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:39 -03:00
Robert Walker a7ee4d625e perf cs-etm: Support for ARM A32/T32 instruction sets in CoreSight trace
This patch adds support for generating instruction samples from trace of
AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between
addresses and instruction counts requires extra information from the
trace decoder, requiring version 0.10.0 of OpenCSD.  A check for the
OpenCSD library version has been added to the feature check for OpenCSD.

Signed-off-by: Robert Walker <robert.walker@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1543839526-30348-1-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:18 -03:00
Tzvetomir Stoyanov f0bba09ce3 perf tools: traceevent API cleanup, remove __tep_data2host*()
In order to make libtraceevent into a proper library, its API should be
straightforward. The __tep_data2host*() functions are going to no longer
be available as a libtraceevent API, tep_read_number() should be used
instead. This patch replaces __tep_data2host*() usage with
tep_read_number() in perf.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181130154647.743979275@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:08 -03:00
Tzvetomir Stoyanov 97fbf3f0e0 tools lib traceevent, perf tools: Rename 'struct tep_event_format' to 'struct tep_event'
In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts.

This renames 'struct tep_event_format' to 'struct tep_event', which
describes more closely the purpose of the struct.

Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181130154647.436403995@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[ Fixup conflict with 6e33c250a88f ("tools lib traceevent: Fix compile warnings in tools/lib/traceevent/event-parse.c") ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:02 -03:00
Jin Yao ec6ae74fe8 perf report: Display average IPC and IPC coverage per symbol
Support displaying the average IPC and IPC coverage per symbol in 'perf
report' --tui and --stdio modes.

For example,

 $ perf record -b ...
 $ perf report -s symbol

 Overhead  Symbol                           IPC   [IPC Coverage]
   39.60%  [.] __random                     2.30  [ 54.8%]
   18.02%  [.] main                         0.43  [ 54.3%]
   14.21%  [.] compute_flag                 2.29  [100.0%]
   14.16%  [.] rand                         0.36  [100.0%]
    7.06%  [.] __random_r                   2.57  [ 70.5%]
    6.85%  [.] rand@plt                     0.00  [  0.0%]

Jiri Olsa <jolsa@redhat.com> provided the patch to support the --stdio
mode. I merged Jiri's code in this patch.

  $ perf report -s symbol --stdio

    # Overhead  Symbol                       IPC   [IPC Coverage]
    # ........  ...........................  ....................
    #
      39.60%  [.] __random                   2.30  [ 54.8%]
      18.02%  [.] main                       0.43  [ 54.3%]
      14.21%  [.] compute_flag               2.29  [100.0%]
      14.16%  [.] rand                       0.36  [100.0%]
       7.06%  [.] __random_r                 2.57  [ 70.5%]
       6.85%  [.] rand@plt                   0.00  [  0.0%]
       0.02%  [k] run_timer_softirq          1.60  [ 57.2%]

The columns "IPC" and "[IPC Coverage]" are automatically enabled when
the sort-key "symbol" is specified. If the perf.data file doesn't
contain timed LBR information, columns are filled with "-".

For example,

  # Overhead  Symbol                       IPC   [IPC Coverage]
  # ........  ...........................  ....................
  #
      46.57%  [.] main                     -      -
      17.60%  [.] rand                     -      -
      15.84%  [.] __random_r               -      -
      11.90%  [.] __random                 -      -
       6.50%  [.] compute_flag             -      -
       1.59%  [.] rand@plt                 -      -
       0.00%  [.] _dl_relocate_object      -      -
       0.00%  [k] tlb_flush_mmu            -      -
       0.00%  [k] perf_event_mmap          -      -
       0.00%  [k] native_sched_clock       -      -
       0.00%  [k] intel_pmu_handle_irq_v4  -      -
       0.00%  [k] native_write_msr         -      -

 v3:
 ---
 Removed the sortkey 'ipc' from command-line. The columns "IPC"
 and "[IPC Coverage]" are automatically enabled when "symbol"
 is specified.

 v2:
 ---
 Merge in Jiri's patch to support stdio mode

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:44 -03:00
Jin Yao 246fda09c1 perf annotate: Create a annotate2 flag in struct symbol
We often use the symbol__annotate2() to annotate a specified symbol.
While annotating may take some time, so in order to avoid annotating the
same symbol repeatedly, the patch creates a new flag to indicate the
symbol has been annotated.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:40 -03:00
Jin Yao ace4f8faea perf annotate: Compute average IPC and IPC coverage per symbol
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example:

  $ perf annotate --stdio2

  Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

                          Disassembly of section .text:

                          000000000003aac0 <random@@GLIBC_2.2.5>:
    8.32  3.28              sub    $0x18,%rsp
          3.28              mov    $0x1,%esi
          3.28              xor    %eax,%eax
          3.28              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
   11.57  3.28     1      ↓ je     20
                            lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    29
                          ↓ jmp    43
   11.57  1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
    0.00  1.10     1      ↓ je     43
                      29:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_lock_wait_private
                            add    $0x80,%rsp
    0.00  3.00        43:   lea    __ctype_b@GLIBC_2.2.5+0x38,%rdi
          3.00              lea    0xc(%rsp),%rsi
    8.49  3.00     1      → callq  __random_r
    7.91  1.94              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
    0.00  1.94     1      ↓ je     68
                            lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    70
                          ↓ jmp    8a
    0.00  2.00        68:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
   21.56  2.00     1      ↓ je     8a
                      70:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_unlock_wake_private
                            add    $0x80,%rsp
   21.56  2.90        8a:   movslq 0xc(%rsp),%rax
          2.90              add    $0x18,%rsp
    9.03  2.90     1      ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:32 -03:00
Alexey Budankov 93f20c0fe3 perf record: Extend trace writing to multi AIO
Multi AIO trace writing allows caching more kernel data into userspace
memory postponing trace writing for the sake of overall profiling data
thruput increase. It could be seen as kernel data buffer extension into
userspace memory.

With an --aio option value different from 0 (default value is 1) the
tool has capability to cache more and more data into user space along
with delegating spill to AIO.

That allows avoiding to suspend at record__aio_sync() between calls of
record__mmap_read_evlist() and increases profiling data thruput at the
cost of userspace memory.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/050bb053-e7f3-aa83-fde7-f27ff90be7f6@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:11 -03:00
Alexey Budankov d3d1af6f01 perf record: Enable asynchronous trace writing
The trace file offset is read once before mmaps iterating loop and
written back after all performance data is enqueued for aio writing.

The trace file offset is incremented linearly after every successful aio
write operation.

record__aio_sync() blocks till completion of the started AIO operation
and then proceeds.

record__aio_mmap_read_sync() implements a barrier for all incomplete
aio write requests.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ce2d45e9-d236-871c-7c8f-1bed2d37e8ac@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:08 -03:00
Alexey Budankov 0b77383134 perf mmap: Map data buffer for preserving collected data
The map->data buffer is used to preserve map->base profiling data for
writing to disk. AIO map->cblock is used to queue corresponding
map->data buffer for asynchronous writing.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5fcda10c-6c63-68df-383a-c6d9e5d1f918@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:01 -03:00
Wen Yang 19702894cd perf bpf: Use ERR_CAST instead of ERR_PTR(PTR_ERR())
Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)).  This
makes it more readable and also fix this warning detected by
err_cast.cocci:

  tools/perf/util/bpf-loader.c:1606:11-18: WARNING: ERR_CAST can be used with op

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wen Yang <yellowriver2010@hotmail.com>
Cc: zhong.weidong@zte.com.cn
Link: http://lkml.kernel.org/r/20181127090610.28488-1-wen.yang99@zte.com.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:54:36 -03:00
Adrian Hunter 692d0e6332 perf script: Use fallbacks for branch stacks
Branch stacks do not necessarily have the same cpumode as the 'ip'. Use
the fallback functions in those cases.

This patch depends on patch "perf tools: Add fallback functions for cases
where cpumode is insufficient".

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181106210712.12098-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:54:18 -03:00
Adrian Hunter 225f99e0c8 perf tools: Use fallback for sample_addr_correlates_sym() cases
thread__resolve() is used in the sample_addr_correlates_sym() cases
where 'addr' is a destination of a branch which does not necessarily
have the same cpumode as the 'ip'. Use the fallback function in that
case.

This patch depends on patch "perf tools: Add fallback functions for
cases where cpumode is insufficient".

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181106210712.12098-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:54:16 -03:00
Adrian Hunter 8e80ad9983 perf thread: Add fallback functions for cases where cpumode is insufficient
For branch stacks or branch samples, the sample cpumode might not be
correct because it applies only to the sample 'ip' and not necessary to
'addr' or branch stack addresses. Add fallback functions that can be
used to deal with those cases

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181106210712.12098-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:54:13 -03:00
Adrian Hunter ec1891afae perf machine: Record if a arch has a single user/kernel address space
Some architectures have a single address space for kernel and user
addresses, which makes it possible to determine if an address is in
kernel space or user space. Some don't, e.g.: sparc.

Cache that info in perf_env so that, for instance, code needing to
fallback failed symbol lookups at the kernel space in single address
space arches can lookup at userspace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181106210712.12098-2-adrian.hunter@intel.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:54:07 -03:00
Arnaldo Carvalho de Melo 804234f271 perf env: Also consider env->arch == NULL as local operation
We'll set a new machine field based on env->arch, which for live mode,
like with 'perf top' means we need to use uname() to figure the name of
the arch, fix perf_env__arch() to consider both (env == NULL) and
(env->arch == NULL) as local operation.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # 4.19
Link: https://lkml.kernel.org/n/tip-vcz4ufzdon7cwy8dm2ua53xk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:54:02 -03:00
Eric Saint-Etienne b18e088825 perf map: Remove extra indirection from map__find()
A double pointer is used in map__find() where a single pointer is enough
because the function doesn't affect the rbtree and the rbtree is locked.

Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eric Saint-Etienne <eric.saintetienne@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1542969759-24346-1-git-send-email-eric.saint.etienne@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:53:57 -03:00
Stephane Eranian bc4da38a47 perf stat: Fix CSV mode column output for non-cgroup events
When using the -x option, perf stat prints CSV-style output with one
event per line.  For each event, it prints the count, the unit, the
event name, the cgroup, and a bunch of other event specific fields (such
as insn per cycles).

When you use CSV-style mode, you expect a normalized output where each
event is printed with the same number of fields regardless of what it is
so it can easily be imported into a spreadsheet or parsed.

For instance, if an event does not have a unit, then print an empty
field for it.

Although this approach was implemented for the unit, it was not for the
cgroup.

When mixing cgroup and non-cgroup events, then non-cgroup events would
not show an empty field, instead the next field was printed, make
columns not line up correctly.

This patch fixes the cgroup output issues by forcing an empty field
for non-cgroup events as soon as one event has cgroup.

Before:

  <not counted> @ @cycles @foo    @ 0    @100.00@@
  2531614       @ @cycles @6420922@100.00@    @

foo cgroup lines up with time_running!

After:

  <not counted> @ @cycles @foo @0       @100.00@@
  2594834       @ @cycles @    @5287372 @100.00@@

Fields line up.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1541587845-9150-1-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:53:41 -03:00
Ravi Bangoria 57ddf09173 perf stat: Fix shadow stats for clock events
Commit 0aa802a794 ("perf stat: Get rid of extra clock display
function") introduced scale and unit for clock events. Thus,
perf_stat__update_shadow_stats() now saves scaled values of clock events
in msecs, instead of original nsecs. But while calculating values of
shadow stats we still consider clock event values in nsecs. This results
in a wrong shadow stat values. Ex,

  # ./perf stat -e task-clock,cycles ls
    <SNIP>
              2.60 msec task-clock:u    #    0.877 CPUs utilized
         2,430,564      cycles:u        # 1215282.000 GHz

Fix this by saving original nsec values for clock events in
perf_stat__update_shadow_stats(). After patch:

  # ./perf stat -e task-clock,cycles ls
    <SNIP>
              3.14 msec task-clock:u    #    0.839 CPUs utilized
         3,094,528      cycles:u        #    0.985 GHz

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: yuzhoujian@didichuxing.com
Fixes: 0aa802a794 ("perf stat: Get rid of extra clock display function")
Link: http://lkml.kernel.org/r/20181116042843.24067-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:53:30 -03:00
Kan Liang f4a0742b3c perf pmu: Move *_cpuid_str() weak functions to header.c
The weak functions, strcmp_cpuid_str() and get_cpuid_str(), are defined
in pmu.c.

Most of the cpuid related functions, including *_cpuid_str()'s
declaration and platform specific definition, are in header.c/h.

To make the declaration and definition of all cpuid related functions in
a consistent place, move the weak functions to header.c.

There is no functional change.

Suggested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: http://lkml.kernel.org/r/20181121164939.13482-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:59 -03:00
Eric Saint-Etienne 1e6285699b perf symbols: Fix slowness due to -ffunction-section
Perf can take minutes to parse an image when -ffunction-section is used.
This is especially true with the kernel image when it is compiled this
way, which is the arm64 default since the patcheset "Enable deadcode
elimination at link time".

Perf organize maps using a rbtree. Whenever perf finds a new symbols, it
first searches this rbtree for the map it belongs to, by strcmp()'aring
section names.  When it finds the map with the right name, it uses it to
add the symbol. With a usual image there aren't so many maps but when
using -ffunction-section there's basically one map per function.  With
the kernel image that's north of 40,000 maps. For most symbols perf has
to parses the entire rbtree to eventually create a new map and add it.
Consequently perf spends most of the time browsing a rbtree that keeps
getting larger.

This performance fix introduces a secondary rbtree that indexes maps
based on the section name.

Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: David Aldridge <david.aldridge@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1542822679-25591-1-git-send-email-eric.saint.etienne@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:59 -03:00
Kan Liang 3b54411a44 perf vendor events: Add stepping in CPUID string for x86
The perf tools cannot find the proper event list for the Cascadelake
server.  Because the Cascadelake server and the Skylake server have the
same CPU model number, which are used by the perf tools to find the
event list.

The stepping for Skylake server is up to 4.

The stepping for Cascadelake server starts from 5.

The stepping can be used to distinguish between them.

The stepping is added in get_cpuid_str().

The stepping information for Skylake server is updated in mapfile.csv.

A x86 specific strcmp_cpuid_cmp() function is added to handle two CPUID
formats in mapfile.csv, "vendor-family-model-stepping" and
"vendor-family-model":

- If a cpuid-regular-expression from the mapfile.csv using the new
  stepping format, a cpuid-string generated on the machine must include
  stepping. Otherwise, it is a mismatch.

- If the cpuid-regular-expression using the old non-stepping format,
  the stepping in the cpuid-string will be ignored.

The script, using environment string "PERF_CPUID" without stepping on
Skylake server, will be broken. If so, users must fix their scripts.

Committer notes:

Fixed this build error on centos:6 and debian:7:

  arch/x86/util/header.c: In function 'is_full_cpuid':
  arch/x86/util/header.c:82:39: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
  arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
  arch/x86/util/header.c: In function 'strcmp_cpuid_str':
  arch/x86/util/header.c:98:56: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
  arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
  cc1: all warnings being treated as errors

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181114212416.15665-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:57 -03:00