Commit Graph

12322 Commits

Author SHA1 Message Date
Jiri Olsa 91a17d6f63 perf tests: Add daemon reconfig test
Add a test for daemon reconfiguration. The test changes the
configuration file and checks that the session is changed properly.

Committer testing:

  [root@five ~]# perf test daemon
  76: daemon operations                                               : Ok
  [root@five ~]# time perf test daemon
  76: daemon operations                                               : Ok

  real	0m6.055s
  user	0m0.174s
  sys	0m0.147s
  [root@five ~]# time perf test -v daemon
  76: daemon operations                                               :
  --- start ---
  test child forked, pid 786863
  test daemon list
  test daemon reconfig
  test child finished with 0
  ---- end ----
  daemon operations: Ok

  real	0m6.127s
  user	0m0.222s
  sys	0m0.165s
  [root@five ~]#

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-21-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:53 -03:00
Jiri Olsa 2291bb915b perf tests: Add daemon 'list' command test
Add test for basic perf daemon listing via the CSV output mode (-x
option).

Check that the configured sessions display expected values.

Committer testing:

  [root@five ~]# perf test daemon
  76: daemon operations                                               : Ok
  [root@five ~]#
  [root@five ~]# perf test -v daemon
  76: daemon operations                                               :
  --- start ---
  test child forked, pid 785037
  test daemon list
  test child finished with 0
  ---- end ----
  daemon operations: Ok
  [root@five ~]#

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:53 -03:00
Jiri Olsa 13fb3b9f5b perf daemon: Add examples to man page
Add usage examples to the man page.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-19-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:52 -03:00
Jiri Olsa 5bdee4f051 perf daemon: Add up time for daemon/session list
Display up time for both daemon and sessions.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Starting the daemon:

  # perf daemon start

Get the details with up time:

  # perf daemon -v
  [778315:daemon] base: /opt/perfdata
    output:  /opt/perfdata/output
    lock:    /opt/perfdata/lock
    up:      15 minutes
  [778316:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a
    base:    /opt/perfdata/session-cycles
    output:  /opt/perfdata/session-cycles/output
    control: /opt/perfdata/session-cycles/control
    ack:     /opt/perfdata/session-cycles/ack
    up:      10 minutes
  [778317:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
    base:    /opt/perfdata/session-sched
    output:  /opt/perfdata/session-sched/output
    control: /opt/perfdata/session-sched/control
    ack:     /opt/perfdata/session-sched/ack
    up:      2 minutes

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-18-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:52 -03:00
Jiri Olsa 6d6162d51c perf daemon: Use control to stop session
Use the 'stop' control command to stop perf record session.  If that
fails, fall back to current SIGTERM/SIGKILL pair.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-17-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:52 -03:00
Jiri Olsa edcaa47958 perf daemon: Add 'ping' command
Add a 'ping' command to verify that the 'perf record' session is up and
operational.

It's used in the following patches via test code to make sure 'perf
record' is ready to receive signals.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Start the daemon:

  # perf daemon start

Ping all sessions:

  # perf daemon ping
  OK   cycles
  OK   sched

Ping specific session:

  # perf daemon ping --session sched
  OK   sched

Committer notes:

Fixed up bug pointed by clang:

Buggy:

  if (!pollfd.revents & POLLIN)

Correct code:

  if (!(pollfd.revents & POLLIN))

clang warning:

  builtin-daemon.c:560:6: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses]
          if (!pollfd.revents & POLLIN) {
              ^               ~
  builtin-daemon.c:560:6: note: add parentheses after the '!' to evaluate the bitwise operator first

Also use designated initialized with pollfd, i.e.:

  struct pollfd pollfd = { .events = POLLIN, };

Instead of:

  struct pollfd pollfd = { 0, };

To get past:

    builtin-daemon.c:510:30: error: missing field 'events' initializer [-Werror,-Wmissing-field-initializers]
            struct pollfd pollfd = { 0, };
                                        ^
    1 error generated.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:52 -03:00
Jiri Olsa 6a6d1804a1 perf daemon: Set control fifo for session
Setup control fifos for session and add --control option to session
arguments.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Starting the daemon:

  # perf daemon start

Use can list control fifos with (control and ack files):

  # perf daemon -v
  [776459:daemon] base: /opt/perfdata
    output:  /opt/perfdata/output
    lock:    /opt/perfdata/lock
  [776460:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a
    base:    /opt/perfdata/session-cycles
    output:  /opt/perfdata/session-cycles/output
    control: /opt/perfdata/session-cycles/control
    ack:     /opt/perfdata/session-cycles/ack
  [776461:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
    base:    /opt/perfdata/session-sched
    output:  /opt/perfdata/session-sched/output
    control: /opt/perfdata/session-sched/control
    ack:     /opt/perfdata/session-sched/ack

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-15-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:19:52 -03:00
Jiri Olsa 8c98be6c36 perf daemon: Allow only one daemon over base directory
Add 'lock' file under daemon base and flock it, so only one perf daemon
can run on top of it.

Each daemon tries to create and lock BASE/lock file, if it's successful
we are sure we're the only daemon running over the BASE.

Once daemon is finished, file descriptor to lock file is closed and lock
is released.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Starting the daemon:

  # perf daemon start

And try once more:

  # perf daemon start
  failed: another perf daemon (pid 775594) owns /opt/perfdata

will end up with an error, because there's already one running
on top of /opt/perfdata.

Committer notes:

Provide lockf(F_TLOCK) when not available, i.e. transform:

  lockf(fd, F_TLOCK, 0);

into:

  flock(fd, LOCK_EX | LOCK_NB);

Which should be equivalent.

Noticed when cross building to some odd Android NDK.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-14-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:16:56 -03:00
Jiri Olsa 23c5831e2e perf daemon: Add 'stop' command
Add 'perf daemon stop' command to stop daemon process and all running
sessions.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Start the daemon:

  # perf daemon start

Stop the daemon

  # perf daemon stop

Daemon is not running, nothing to connect to:

  # perf daemon
  connect error: Connection refused

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-13-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa 2d6914cd59 perf daemon: Add 'signal' command
Allow the 'perf daemon' to send SIGUSR2 to all running sessions or just
to a specific session.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Start the daemon:

  # perf daemon start

Send signal to all running sessions:

  # perf daemon signal
  signal 12 sent to session 'cycles [773738]'
  signal 12 sent to session 'sched [773739]'

Or to specific one:

  # perf daemon signal --session sched
  signal 12 sent to session 'sched [773739]'

And verify signals were delivered and perf.data dumped:

  # cat /opt/perfdata/session-cycles/output
  rounding mmap pages size to 32M (8192 pages)
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2021010220382490 ]

  # car /opt/perfdata/session-sched/output
  rounding mmap pages size to 32M (8192 pages)
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2021010220382489 ]
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2021010220393745 ]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa b325f7be25 perf daemon: Add 'list' command
Add a 'list' command to display all running sessions.  It's the default
command if no other command is specified.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

Start the daemon:

  # perf daemon start

List sessions:

  # perf daemon
  [771394:daemon] base: /opt/perfdata
  [771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
  [771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a

List sessions with more info:

  # perf daemon -v
  [771394:daemon] base: /opt/perfdata
    output:  /opt/perfdata/output
  [771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
    base:    /opt/perfdata/session-cycles
    output:  /opt/perfdata/session-cycles/output
  [771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
    base:    /opt/perfdata/session-sched
    output:  /opt/perfdata/session-sched/output

The 'output' file is perf record output for specific session.

Note you have to stop all running perf processes manually at this point,
stop command is coming in following patches.

Committer notes:

Fixup union initialization to overcome this in multiple older systems:

  22    15.74 debian:8                      : FAIL gcc version 4.9.2 (Debian 4.9.2-10+deb8u2)

    builtin-daemon.c: In function 'send_cmd_list':
    builtin-daemon.c:1386:2: error: missing initializer for field 'csv_sep' of 'struct <anonymous>' [-Werror=missing-field-initializers]
      };
      ^
    builtin-daemon.c:641:8: note: 'csv_sep' declared here
       char csv_sep;
            ^
    cc1: all warnings being treated as errors

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa 12c1a415eb perf daemon: Add signalfd support
Use a signalfd fd to track SIGCHLD signals as notifications for perf
session termination.

This way we don't need to actively check for child status, being
notified if there's change.

Suggested-by: Alexei Budankov <abudankov@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa 88adb1194c perf daemon: Add background support
Add support to put the daemon process in the background.

It's now enabled by default and -f option is added to keep the daemon
process on the console for debugging.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa 3cda062520 perf daemon: Add config file change check
Add support to detect changes to the daemon's config file triggering a
re-read of the configuration when that happens.

Use a inotify file descriptor plugged into the main fdarray object for
polling.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

Starting the daemon:

  # perf daemon start

Check sessions:

  # perf daemon
  [772262:daemon] base: /opt/perfdata
  [772263:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a

Change '-m 10M' to '-m 20M', and check daemon log:

  # tail -f /opt/perfdata/output
  [2021-01-02 20:31:41.234045] daemon started (pid 772262)
  [2021-01-02 20:31:41.235072] reconfig: ruining session [cycles:772263]: -m 10M -e cycles --overwrite --switch-output -a
  [2021-01-02 20:32:08.310137] reconfig: session 'cycles' killed
  [2021-01-02 20:32:08.310847] reconfig: ruining session [cycles:772338]: -m 20M -e cycles --overwrite --switch-output -a

And the session list:

  # perf daemon
  [772262:daemon] base: /opt/perfdata
  [772338:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a

Note the changed '-m 20M' option is in place.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa c0666261ff perf daemon: Add config file support
Adding support to configure daemon with config file.

Each client or server invocation of perf daemon needs to know the
base directory, where all sessions data is stored.

The base is defined with:

  daemon.base
    Base path for daemon data. All sessions data are stored under
    this path.

The daemon allows to create record sessions. Each session is a
record command spawned and monitored by perf daemon.

The session is defined with:

  session-<NAME>.run
    Defines new record session for daemon. The value is record's
    command line without the 'record' keyword.

Example:

  # cat ~/.perfconfig
  [daemon]
  base=/opt/perfdata

  [session-cycles]
  run = -m 10M -e cycles --overwrite --switch-output -a

  [session-sched]
  run = -m 20M -e sched:* --overwrite --switch-output -a

The example above defines '/opt/perfdata' as the base directory and 2
record sessions.

  # perf daemon start
  [2021-01-28 19:47:33.454413] daemon started (pid 16015)
  [2021-01-28 19:47:33.455910] reconfig: ruining session [cycles:16016]: -m 10M -e cycles --overwrite --switch-output -a
  [2021-01-28 19:47:33.456599] reconfig: ruining session [sched:16017]: -m 20M -e sched:* --overwrite --switch-output -a

  # ps -ef | grep perf
  ... perf daemon start
  ... /home/jolsa/.../perf record -m 20M -e cycles --overwrite --switch-output -a
  ... /home/jolsa/.../perf record -m 20M -e sched:* --overwrite --switch-output -a

The base directory is populated with:

  # find /opt/perfdata/
  /opt/perfdata/
  /opt/perfdata/control                    <- control socket
  /opt/perfdata/session-cycles             <- data for session 'cycles':
  /opt/perfdata/session-cycles/output      <-   perf record output
  /opt/perfdata/session-cycles/perf.data   <-   perf data
  /opt/perfdata/session-sched              <- ditto for session 'sched'
  /opt/perfdata/session-sched/output
  /opt/perfdata/session-sched/perf.data

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 10:02:54 -03:00
Jiri Olsa 90b0aad8f6 perf daemon: Add client socket support
Add support for client socket side that will be used to send commands to
the daemon server socket.

This patch adds only the core support, all commands using this
functionality are coming in the following patches.

Committer notes:

Hat to patch patch it to deal with this in some systems:

  cc1: warnings being treated as errors
  builtin-daemon.c: In function 'send_cmd':  MKDIR    /tmp/build/perf/bench/

  builtin-daemon.c:1368: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result
    MKDIR    /tmp/build/perf/tests/
  make[3]: *** [/tmp/build/perf/builtin-daemon.o] Error 1

And also to not leak the 'line' buffer allocated by getline(), since you
initialized line to NULL and len to zero, man page says:

  If *lineptr is set to NULL and *n is set 0 before the call,
  then getline() will allocate a buffer for storing the line.
  This buffer should be freed by the user program even if
  getline() failed.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 09:52:28 -03:00
Jiri Olsa ed36b7042f perf daemon: Add server socket support
Add support to create a server socket that listens for client commands
and processes them.

This patch adds only the core support, all commands using this
functionality are coming in the following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 16:23:10 -03:00
Jiri Olsa 5631d100f9 perf daemon: Add base option
Add a base option allowing the user to specify a base directory.  It
will have precedence over config file base definition coming in the
following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 15:57:52 -03:00
Jiri Olsa fc1dcb1e56 perf daemon: Add config option
Add a config option and base functionality that takes the option
argument (if specified) and other system config locations and produces
an 'acting' config file path.

The actual config file processing is coming in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 15:56:20 -03:00
Jiri Olsa d450bc501f perf daemon: Add daemon command
Add a daemon skeleton with a minimal base (non) functionality, covering
various setup in start command.

Add an initial perf-daemon.txt with basic info.

This is in response to pople asking for the possibility to be able run
record long running sessions on the background.

The patchset that starts with this adds support to configure and run
record sessions on background via new 'perf daemon' command.

This is useful for being able to use perf as a flight recorder that one
can interact with asking for events to be enabled or disabled, added or
removed, etc.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210208200908.1019149-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 15:42:57 -03:00
Yang Li 8524711d2c perf script: Simplify bool conversion
Fix the following coccicheck warning:
  ./tools/perf/builtin-script.c:2789:36-41: WARNING: conversion to bool
  not needed here
  ./tools/perf/builtin-script.c:3237:48-53: WARNING: conversion to bool
  not needed here

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/1612773936-98691-1-git-send-email-yang.lee@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 14:58:56 -03:00
Arnaldo Carvalho de Melo 6db59d357e perf arm64/s390: Fix printf conversion specifier for IP addresses
We need to use "%#" PRIx64 for u64 values, not "%lx". In arm64's and
s390x cases the compiler doesn't complain, but lets fix this in case
this code gets copied to a 32-bit arch, like with powerpc 32-bit that
got fixed in the previous patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 10:19:50 -03:00
Arnaldo Carvalho de Melo 0f000f9c89 perf powerpc: Fix printf conversion specifier for IP addresses
We need to use "%#" PRIx64 for u64 values, not "%lx", fixing this build
problem on powerpc 32-bit:

  72    13.69 ubuntu:18.04-x-powerpc        : FAIL powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    arch/powerpc/util/machine.c: In function 'arch__symbols__fixup_end':
    arch/powerpc/util/machine.c:23:12: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'u64 {aka long long unsigned int}' [-Werror=format=]
      pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
                ^
    /git/linux/tools/perf/util/debug.h:18:21: note: in definition of macro 'pr_fmt'
     #define pr_fmt(fmt) fmt
                         ^~~
    /git/linux/tools/perf/util/debug.h:33:29: note: in expansion of macro 'pr_debugN'
     #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
                                 ^~~~~~~~~
    /git/linux/tools/perf/util/debug.h:33:42: note: in expansion of macro 'pr_fmt'
     #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
                                              ^~~~~~
    arch/powerpc/util/machine.c:23:2: note: in expansion of macro 'pr_debug4'
      pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
      ^~~~~~~~~
    cc1: all warnings being treated as errors
    /git/linux/tools/build/Makefile.build:139: recipe for target 'util' failed
    make[5]: *** [util] Error 2
    /git/linux/tools/build/Makefile.build:139: recipe for target 'powerpc' failed
    make[4]: *** [powerpc] Error 2
    /git/linux/tools/build/Makefile.build:139: recipe for target 'arch' failed
    make[3]: *** [arch] Error 2
  73    30.47 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Fixes: 557c3eadb7 ("perf powerpc: Fix gap between kernel end and module start")
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09 09:41:21 -03:00
Jin Yao 61d9fc4449 perf script: Support filtering by hex address
'perf script' supports '-S' or '--symbol' options to only list the
records for these symbols. A symbol is typically a name or hex address.
If it's hex address, it is the start address of one symbol.

While it would be useful if we can filter trace records by any hex
address (not only the start address of symbol). So now we support
filtering trace records by more conditions, such as:

- symbol name
- start address of symbol
- any hexadecimal address
- address range

The comparison order is defined as:

1. symbol name comparison
2. symbol start address comparison.
3. any hexadecimal address comparison.
4. address range comparison.

The idea is if we can get a valid address from -S list, we add the
address to addr_list for address comparison otherwise we still leave
it to sym_list for symbol comparison.

Some examples:

  root@kbl-ppc:~# ./perf script -S ffffffff9a477308
            perf  8562 [000] 347303.578858:          1   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [000] 347303.578860:          1   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [000] 347303.578861:         11   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [001] 347303.578903:          1   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [001] 347303.578905:          1   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [001] 347303.578906:         15   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [002] 347303.578952:          1   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])
            perf  8562 [002] 347303.578953:          1   cycles:  ffffffff9a477308 native_write_msr+0x8 ([kernel.kallsyms])

Filter the traced records by hex address ffffffff9a477308.

  root@kbl-ppc:~# ./perf script -S ffffffff9a4dd4ce,ffffffff9a4d2de9,ffffffff9a6bf9f4
            perf  8562 [001] 347303.578911:     311706   cycles:  ffffffff9a6bf9f4 __kmalloc_node+0x204 ([kernel.kallsyms])
            perf  8562 [002] 347303.578960:     354477   cycles:  ffffffff9a4d2de9 sched_setaffinity+0x49 ([kernel.kallsyms])
            perf  8562 [003] 347303.579015:     450958   cycles:  ffffffff9a4dd4ce dequeue_task_fair+0x1ae ([kernel.kallsyms])

Filter the traced records by hex address ffffffff9a4dd4ce, ffffffff9a4d2de9, ffffffff9a6bf9f4.

  root@kbl-ppc:~# ./perf script -S ffffffff9a477309 --addr-range 16
            perf  8562 [000] 347303.578863:        291   cycles:  ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms])
            perf  8562 [001] 347303.578907:        411   cycles:  ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms])
            perf  8562 [002] 347303.578956:        462   cycles:  ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms])
            perf  8562 [003] 347303.579010:        497   cycles:  ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms])
            perf  8562 [004] 347303.579059:        429   cycles:  ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms])
            perf  8562 [005] 347303.579109:        408   cycles:  ffffffff9a47730a native_write_msr+0xa ([kernel.kallsyms])
            perf  8562 [006] 347303.579159:        460   cycles:  ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms])
            perf  8562 [007] 347303.579213:        436   cycles:  ffffffff9a47730f native_write_msr+0xf ([kernel.kallsyms])

Filter the traced records from address range [ffffffff9a477309, ffffffff9a477309 + 15].

  root@kbl-ppc:~# ./perf script -S "ffffffff9b163046,rcu_nmi_exit"
            perf  8562 [004] 347303.579060:      12013   cycles:  ffffffff9b163046 exc_nmi+0x166 ([kernel.kallsyms])
            perf  8562 [007] 347303.579214:      12138   cycles:  ffffffff9b165944 rcu_nmi_exit+0x34 ([kernel.kallsyms])

Filter by address + symbol

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210207080935.31784-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 17:09:11 -03:00
Jin Yao 94253393df perf intlist: Change 'struct intlist' int member to 'unsigned long'
This is to let intlist support addresses as its payload.

One potential problem is it can't support negative number. But so far,
there is no such kind of use case.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210207080935.31784-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 17:02:00 -03:00
Paul Cercueil a81fbb8771 perf stat: Use nftw() instead of ftw()
ftw() has been obsolete for about 12 years now.

Committer notes:

Further notes provided by the patch author:

    "NOTE: Not runtime-tested, I have no idea what I need to do in perf
     to test this. But at least it compiles now with my uClibc-based
     toolchain."

I looked at the nftw()/ftw() man page and for the use made with cgroups
in 'perf stat' the end result is equivalent.

Fixes: bb1c15b60b ("perf stat: Support regex pattern in --for-each-cgroup")
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: od@zcrc.me
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210208181157.1324550-1-paul@crapouillou.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:39:14 -03:00
Kan Liang 7d91e8181d perf tools: Update topdown documentation for Sapphire Rapids
Update Topdown extension on Sapphire Rapids and how to collect the L2
events.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-10-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang 63e39aa6ae perf stat: Support L2 Topdown events
The TMA method level 2 metrics is supported from the Intel Sapphire
Rapids server, which expose four L2 Topdown metrics events to user
space. There are eight L2 events in total. The other four L2 Topdown
metrics events are calculated from the corresponding L1 and the exposed
L2 events.

Now, the --topdown prints the complete top-down metrics that supported
by the CPU. For the Intel Sapphire Rapids server, there are 4 L1 events
and 8 L2 events displyed in one line.

Add a new option, --td-level, to display the top-down statistics that
equal to or lower than the input level.

The L2 event is marked only when both its L1 parent event and itself
crosse the threshold.

Here is an example:

  $ perf stat --topdown --td-level=2 --no-metric-only sleep 1
  Topdown accuracy may decrease when measuring long periods.
  Please print the result regularly, e.g. -I1000

  Performance counter stats for 'sleep 1':

     16,734,390   slots
      2,100,001   topdown-retiring       # 12.6% retiring
      2,034,376   topdown-bad-spec       # 12.3% bad speculation
      4,003,128   topdown-fe-bound       # 24.1% frontend bound
        328,125   topdown-heavy-ops      #  2.0% heavy operations    #  10.6% light operations
      1,968,751   topdown-br-mispredict  # 11.9% branch mispredict   #  0.4% machine clears
      2,953,127   topdown-fetch-lat      # 17.8% fetch latency       #  6.3% fetch bandwidth
      5,906,255   topdown-mem-bound      # 35.6% memory bound        #  15.4% core bound

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-9-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang c7444297fd perf test: Support PERF_SAMPLE_WEIGHT_STRUCT
Support the new sample type for sample-parsing test case.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-8-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang 590db42de0 perf report: Support instruction latency
The instruction latency information can be recorded on some platforms,
e.g., the Intel Sapphire Rapids server. With both memory latency
(weight) and the new instruction latency information, users can easily
locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.

The 'weight' field is shared among different architectures. Reusing the
'weight' field may impacts other architectures. Add a new field to store
the instruction latency.

Like the 'weight' support, introduce a 'ins_lat' for the global
instruction latency, and a 'local_ins_lat' for the local instruction
latency version.

Add new sort functions, INSTR Latency and Local INSTR Latency,
accordingly.

Add local_ins_lat to the default_mem_sort_order[].

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang ea8d0ed6ea perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT
The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. Users can apply either the
PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample
type to retrieve the sample weight, but they cannot apply both sample
types simultaneously.

The new sample type shares the same space as the PERF_SAMPLE_WEIGHT
sample type. The lower 32 bits are exactly the same for both sample
type. The higher 32 bits may be different for different architecture.

Add arch specific arch_evsel__set_sample_weight() to set the new sample
type for X86. Only store the lower 32 bits for the sample->weight if the
new sample type is applied. In practice, no memory access could last
than 4G cycles. No data will be lost.

If the kernel doesn't support the new sample type. Fall back to the
PERF_SAMPLE_WEIGHT sample type.

There is no impact for other architectures.

Committer notes:

Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core
but not upstream yet.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang d9d5d767b2 perf c2c: Support data block and addr block
'perf c2c' is also a memory profiling tool. Apply the two new data
source fields to 'perf c2c' as well.

Extend 'perf c2c' to display the number of loads which blocked by data or
address conflict.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang a054c2989f perf tools: Support data block and addr block
Two new data source fields, to indicate the block reasons of a load
instruction, are introduced on the Intel Sapphire Rapids server. The
fields can be used by the memory profiling.

Add a new sort function, SORT_MEM_BLOCKED, for the two fields.

For the previous platforms or the block reason is unknown, print "N/A"
for the block reason.

Add blocked as a default mem sort key for perf report and perf mem
report.

Committer testing:

So in machines without this capability we get a "N/A" filling the new "Blocked"
column:

  $ perf mem record ls
  arch     certs	 CREDITS  Documentation  include  ipc     Kconfig  lib       MAINTAINERS  mm   samples  security  usr    block
  COPYING	 crypto	 drivers  fs             init     Kbuild  kernel   LICENSES  Makefile     net  README   scripts   sound  tools
  virt
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
  $
  $ perf mem report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  # Total Lost Samples: 0
  #
  # Samples: 6  of event 'cpu/mem-loads,ldlat=30/Pu'
  # Total weight : 1381
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
  #
  # Overhead  Samples  Local Weight  Memory access         Symbol                   Shared Object  Data Symbol             Data Object   Snoop  TLB access    Locked  Blocked
  # ........  .......  ............  ....................  .......................  .............  ......................  ............  .....  ............  ......  .......
  #
      32.87%        1  454           Local RAM or RAM hit  [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91cef3078  libc-2.31.so  Hit    L1 or L2 hit  No       N/A
      25.56%        1  353           LFB or LFB hit        [.] strcmp               ld-2.31.so     [.] 0x00005586973855ca  ls            None   L1 or L2 hit  No       N/A
      22.59%        1  312           LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0e3b18  ld.so.cache   None   L1 or L2 hit  No       N/A
       8.47%        1  117           LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceee570  libc-2.31.so  None   L1 or L2 hit  No       N/A
       6.88%        1  95            LFB or LFB hit        [.] _dl_relocate_object  ld-2.31.so     [.] 0x00007fe91ceed490  libc-2.31.so  None   L1 or L2 hit  No       N/A
       3.62%        1  50            LFB or LFB hit        [.] _dl_cache_libcmp     ld-2.31.so     [.] 0x00007fe91d0ebe60  ld.so.cache   None   L1 or L2 hit  No       N/A

  # Samples: 11  of event 'cpu/mem-stores/Pu'
  # Total weight : 11
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
  #
  # Overhead  Samples  Local Weight  Memory access  Symbol                   Shared Object  Data Symbol             Data Object  Snoop  TLB access  Locked  Blocked
  # ........  .......  ............  .............  .......................  .............  ......................  ...........  .....  ..........  ......  .......
  #
       9.09%        1  0             L1 hit         [.] __strcoll_l          libc-2.31.so   [.] 0x00007fffe5648fc8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_lookup_symbol_x  ld-2.31.so     [.] 0x00007fffe56490b8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_name_match_p     ld-2.31.so     [.] 0x00007fffe56487d8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_start            ld-2.31.so     [.] start_time+0x0      ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] _dl_sysdep_start     ld-2.31.so     [.] 0x00007fffe56494b8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5648ff8  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649064  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 hit         [.] do_lookup_x          ld-2.31.so     [.] 0x00007fffe5649130  [stack]      N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xaf8  ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] _rtld_global+0xc28  ld-2.31.so   N/A    N/A         N/A      N/A
       9.09%        1  0             L1 miss        [.] _dl_start            ld-2.31.so     [.] 0x00007fffe56495b8  [stack]      N/A    N/A         N/A      N/A

  # (Tip: Show user configuration overrides: perf config --user --list)
  $

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang 2a57d40832 perf tools: Support the auxiliary event
On the Intel Sapphire Rapids server, an auxiliary event has to be
enabled simultaneously with the load latency event to retrieve complete
Memory Info.

Add X86 specific perf_mem_events__name() to handle the auxiliary event.

- Users are only interested in the samples of the mem-loads event.
  Sample read the auxiliary event.

- The auxiliary event must be in front of the load latency event in a
  group. Assume the second event to sample if the auxiliary event is the
  leader.

- Add a weak is_mem_loads_aux_event() to check the auxiliary event for
  X86. For other ARCHs, it always return false.

Parse the unique event name, mem-loads-aux, for the auxiliary event.

Committer notes:

According to 61b985e3e7 ("perf/x86/intel: Add perf core PMU
support for Sapphire Rapids"), ENODATA is only returned by
sys_perf_event_open() when used with these auxiliary events, with this
in evsel__open_strerror():

       case ENODATA:
               return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. "
                                "Please add an auxiliary event in front of the load latency event.");

This is Ok at this point in time, but fragile long term, I pointed this
out in the e-mail thread, requesting a follow up patch to check if
ENODATA is really for this specific case.

Fixed up sizeof(MEM_LOADS_AUX_NAME) bug pointed out by Namhyung.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210205152648.GC920417@kernel.org
Link: http://lore.kernel.org/lkml/1612296553-21962-3-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Athira Rajeev 068aeea377 perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs
To enable presenting of Performance Monitor Counter Registers (PMC1 to
PMC6) as part of extended regsiters, this patch adds these to
sample_reg_mask in the tool side (to use with -I? option).

Simplified the PERF_REG_PMU_MASK_300/31 definition. Excluded the
unsupported SPRs (MMCR3, SIER2, SIER3) from extended mask value for
CPU_FTR_ARCH_300.

Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Jianlin Lv 900547dd0f perf probe: Add protection to avoid endless loop
if dwarf_offdie() returns NULL, the continue statement forces the next
iteration of the loop without updating the 'off' variable. It will cause
an endless loop in the process of traversing the compile unit.  So add
exception protection for looping CUs.

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: jianlin.lv@arm.com
Link: http://lore.kernel.org/lkml/20210203145702.1219509-1-Jianlin.Lv@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Ian Rogers d2e31d7e3f perf trace-event-info: Rename for_each_event.
Avoid a naming conflict with for_each_event with similar code in
parse-events.c, rename to for_each_event_tps.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210203052659.2975736-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:13:53 -03:00
Athira Rajeev 557c3eadb7 perf powerpc: Fix gap between kernel end and module start
Running "perf mem report" in TUI mode fails with ENOMEM message in
powerpc:

  failed to process sample

Running with debug and verbose options points that issue is while
allocating memory for sample histograms.

The error path is:

  symbol__inc_addr_samples() ->
    __symbol__inc_addr_samples() ->
      annotated_source__histogram()

symbol__inc_addr_samples() calls annotated_source__alloc_histograms ()
to allocate memory for sample histograms using calloc(). Here calloc()
fails since the size of symbol is huge. The size of a symbol is
calculated as difference between its start and end address.

Example histogram allocation that fails is:

  sym->name is _end
  sym->start is 0xc0000000027a0000
  sym->end is 0xc008000003890000
  symbol__size(sym) is 0x80000010f0000

In the above case, the difference between sym->start
(0xc0000000027a0000) and sym->end (0xc008000003890000) is huge.

This is same problem as in s390 and arm64 which are fixed in commits:

  b9c0a64901 ("perf annotate: Fix s390 gap between kernel end and module start")
  78886f3ed3 ("perf symbols: Fix arm64 gap between kernel start and module end")

When this symbol was read first, its start and end address was set to
address which matches with data from /proc/kallsyms.

After symbol__new():

  symbol__new: _end 0xc0000000027a0000-0xc0000000027a0000

  From /proc/kallsyms:
  ...
  c000000002799370 b backtrace_flag
  c000000002799378 B radix_tree_node_cachep
  c000000002799380 B __bss_stop
  c0000000027a0000 B _end
  c008000003890000 t icmp_checkentry      [ip_tables]
  c008000003890038 t ipt_alloc_initial_table      [ip_tables]
  c008000003890468 T ipt_do_table [ip_tables]
  c008000003890de8 T ipt_unregister_table_pre_exit        [ip_tables]
  ...

Perf calls function symbols__fixup_end() which sets the end of symbol to
0xc008000003890000, which is the next address and this is the start
address of first module (icmp_checkentry in above) which will make the
huge symbol size of 0x80000010f0000.

After symbols__fixup_end:

  symbols__fixup_end: sym->name: _end
  sym->start: 0xc0000000027a0000
  sym->end: 0xc008000003890000

On powerpc, kernel text segment is located at 0xc000000000000000 whereas
the modules are located at very high memory addresses,
0xc00800000xxxxxxx. Since the gap between end of kernel text segment and
beginning of first module's address is high, histogram allocation using
calloc fails.

Fix this by detecting the kernel's last symbol and limiting the range of
last kernel symbol to pagesize.

Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-By: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1609208054-1566-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Yonatan Goldschmidt 67dec92693 perf inject jit: Add namespaces support
This patch fixes "perf inject --jit" to properly operate on
namespaced/containerized processes:

* jitdump files are generated by the process, thus they should be
  looked up in its mount NS.

* DSOs of injected MMAP events will later be looked up in the process
  mount NS, so write them into its NS.

* PIDs & TIDs from jitdump events need to be translated to the PID as
  seen by "perf record" before written into MMAP events.

For a process in a different PID NS, the TID & PID given in the jitdump
event are actually ignored; I use the TID & PID of the thread which
mmap()ed the jitdump file. This is simplified and won't do for forks of
the initial process, if they continue using the same jitdump file.
Future patches might improve it.

This was tested by recording a NodeJS process running with
"--perf-prof", inside a Docker container, and by recording another
NodeJS process running in the same namespaces as perf itself, to make
sure it's not broken for non-containerized processes.

Signed-off-by: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20201105015604.1726943-1-yonatan.goldschmidt@granulate.io
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Yonatan Goldschmidt 2b51c71be5 perf namespaces: Add 'in_pidns' to nsinfo struct
Provides an accurate mean to determine if the owner thread is in a
different PID namespace.

Signed-off-by: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20201105015418.1725218-1-yonatan.goldschmidt@granulate.io
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Namhyung Kim 473f742e18 perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events
Like in __event__synthesize_thread(), I think it's better to use
scandir() instead of the readdir() loop.  In case some malicious task
continues to create new threads, the readdir() loop will run over and
over to collect tids.  The scandir() also has the problem but the window
is much smaller since it doesn't do much work during the iteration.

Also add filter_task() function as we only care the tasks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Namhyung Kim c1b907953b perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads
To synthesize information to resolve sample IPs, it needs to scan task
and mmap info from the /proc filesystem.  For each process, it opens
(and reads) status and maps file respectively.  But as kernel threads
don't have memory maps so we can skip the maps file.

To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file.
It's about the peak virtual memory usage so only user-level tasks have
that.  Note that it's possible to miss the line due to partial reads.
So we should double-check if it's a really kernel thread when there's no
VmPeak line.

Thus check "Threads:" line (which follows the VmPeak line whether or not
it exists) to be sure it's read enough data - just in case of deeply
nested pid namespaces or large number of supplementary groups are
involved.

This is for user process:

  $ head -40 /proc/1/status
  Name:	systemd
  Umask:	0000
  State:	S (sleeping)
  Tgid:	1
  Ngid:	0
  Pid:	1
  PPid:	0
  TracerPid:	0
  Uid:	0	0	0	0
  Gid:	0	0	0	0
  FDSize:	256
  Groups:
  NStgid:	1
  NSpid:	1
  NSpgid:	1
  NSsid:	1
  VmPeak:	  234192 kB           <-- here
  VmSize:	  169964 kB
  VmLck:	       0 kB
  VmPin:	       0 kB
  VmHWM:	   29528 kB
  VmRSS:	    6104 kB
  RssAnon:	    2756 kB
  RssFile:	    3348 kB
  RssShmem:	       0 kB
  VmData:	   19776 kB
  VmStk:	    1036 kB
  VmExe:	     784 kB
  VmLib:	    9532 kB
  VmPTE:	     116 kB
  VmSwap:	    2400 kB
  HugetlbPages:	       0 kB
  CoreDumping:	0
  THP_enabled:	1
  Threads:	1                     <-- and here
  SigQ:	1/62808
  SigPnd:	0000000000000000
  ShdPnd:	0000000000000000
  SigBlk:	7be3c0fe28014a03
  SigIgn:	0000000000001000

And this is for kernel thread:

  $ head -20 /proc/2/status
  Name:	kthreadd
  Umask:	0000
  State:	S (sleeping)
  Tgid:	2
  Ngid:	0
  Pid:	2
  PPid:	0
  TracerPid:	0
  Uid:	0	0	0	0
  Gid:	0	0	0	0
  FDSize:	64
  Groups:
  NStgid:	2
  NSpid:	2
  NSpgid:	0
  NSsid:	0
  Threads:	1                     <-- here
  SigQ:	1/62808
  SigPnd:	0000000000000000
  ShdPnd:	0000000000000000

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Namhyung Kim 30626e0844 perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis
To save memory usage, it needs to reduce the number of entries in the
proc filesystem.  It's using /proc/<PID>/task directory to traverse
threads in the process and then kernel creates /proc/<PID>/task/<TID>
entries.

After that it checks the thread info using the /proc/<TID>/status file
rather than /proc/<PID>/task/<TID>/status.  As far as I can see, they
are the same and contain all the info we need.

Using the latter eliminates the unnecessary /proc/<TID> entry.  This can
be useful especially a large number of threads are used in the system.
In my experiment around 1KB of memory on average was saved for each
thread (which is not a thread group leader).

To do this, pass both pid and tid to perf_event_prepare_comm() if it
knows them.  In case it doesn't know, passing 0 as pid will do the old
way.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
John Garry c3a9cdef61 perf vendor events arm64: Reference common and uarch events for A76
Reduce duplication in the JSONs by referencing standard events from
armv8-common-and-microarch.json

In general the "PublicDescription" fields are not modified when somewhat
significantly worded differently than the standard.

Apart from that, description and names for events slightly different to
standard are changed (to standard) for consistency.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-5-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
John Garry d02d5dc882 perf vendor events arm64: Reference common and uarch events for Ampere eMag
Reduce duplication in the JSONs by referencing standard events from
armv8-common-and-microarch.json

In general the "PublicDescription" fields are not modified when somewhat
significantly worded differently than the standard.

Apart from that, description and names for events slightly different to
standard are changed (to standard) for consistency.

Note that names for events 0x34 and 0x35 are non-standard and remain
unchanged. Those events came from the following originally:

  4c2479c67b/Documentation/arm64/eMAG-ARM-CoreImpDefined.pdf

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: mathieu.poirier@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-4-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
John Garry c77669662f perf vendor events arm64: Add common and uarch event JSON
Add a common and microarch JSON, which can be referenced from CPU JSONs.

For now, brief and public description are as event brief event
description from the ARMv8 ARM [0], D7-11.

The list of events is not complete, as not all events will be referenced
yet.

Reference document is at the following:

[0] https://documentation-service.arm.com/static/5fa3bd1eb209f547eebd4141?token=

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:43 -03:00
John Garry 2bf797be81 perf vendor events arm64: Fix Ampere eMag event typo
The "briefdescription" for event 0x35 has a typo - fix it.

Fixes: d35c595bf0 ("perf vendor events arm64: Revise core JSON events for eMAG")
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611835236-34696-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:43 -03:00
Jin Yao 4b799a9b77 perf script: Support DSO filter like in other perf tools
Other perf tool builtins already supported a DSO filter.

For example:

  $ perf report --dsos a,b,c

which only considers symbols in these dsos.

Now the DSO filter is supported in 'perf script':

  root@kbl-ppc:~# ./perf script --dsos "[kernel.kallsyms]"
            perf 18123 [000] 6142863.075104:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075107:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075108:         10   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075109:        273   cycles:  ffffffff9ca7730a native_write_msr+0xa ([kernel.kallsyms])
            perf 18123 [000] 6142863.075110:       7684   cycles:  ffffffff9ca3c9c0 native_sched_clock+0x50 ([kernel.kallsyms])
            perf 18123 [000] 6142863.075112:     213017   cycles:  ffffffff9d765a92 syscall_exit_to_user_mode+0x32 ([kernel.kallsyms])
            perf 18123 [001] 6142863.075156:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [001] 6142863.075158:          1   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])
            perf 18123 [001] 6142863.075159:         17   cycles:  ffffffff9ca77308 native_write_msr+0x8 ([kernel.kallsyms])

Committer testing:

  $ perf script
                ls 2364888 29303.010949:          1 cycles:u:  ffffffffa4bbc6a9 [unknown] ([unknown])
                ls 2364888 29303.010957:          1 cycles:u:  ffffffffa429ef48 [unknown] ([unknown])
                ls 2364888 29303.010961:          1 cycles:u:  ffffffffa4260133 [unknown] ([unknown])
                ls 2364888 29303.010964:          5 cycles:u:  ffffffffa429efad [unknown] ([unknown])
                ls 2364888 29303.010967:         41 cycles:u:  ffffffffa42a4586 [unknown] ([unknown])
                ls 2364888 29303.010972:        435 cycles:u:  ffffffffa429efe0 [unknown] ([unknown])
                ls 2364888 29303.010978:       5142 cycles:u:      7f9b95bc2abf __GI___tunables_init+0x11f (/usr/lib64/ld-2.32.so)
                ls 2364888 29303.011006:      38551 cycles:u:  ffffffffa4290f61 [unknown] ([unknown])
                ls 2364888 29303.011486:     238234 cycles:u:      7f9b95bb7741 _dl_relocate_object+0xa71 (/usr/lib64/ld-2.32.so)
                ls 2364888 29303.011937:     415870 cycles:u:      7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so)
  $

Before:

  $ perf script --dsos /usr/lib64/libc-2.32.so |& head -5
    Error: unknown option `dsos'

   Usage: perf script [<options>]
      or: perf script [<options>] record <script> [<record-options>] <command>
      or: perf script [<options>] report <script> [script-args]
  $

After:

  $ perf script --dsos /usr/lib64/libc-2.32.so
                ls 2364888 29303.011937:     415870 cycles:u:      7f9b95a1c80e __strcoll_l+0xe (/usr/lib64/libc-2.32.so)
  $

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210124232750.19170-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:43 -03:00
Arnaldo Carvalho de Melo c69bf11ad3 perf tools: Fix DSO filtering when not finding a map for a sampled address
When we lookup an address and don't find a map we should filter that
sample if the user specified a list of --dso entries to filter on, fix
it.

Before:

  $ perf script
             sleep 274800  2843.556162:          1 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
             sleep 274800  2843.556168:          1 cycles:u:  ffffffffbb2b047d [unknown] ([unknown])
             sleep 274800  2843.556171:          1 cycles:u:  ffffffffbb2706b2 [unknown] ([unknown])
             sleep 274800  2843.556174:          6 cycles:u:  ffffffffbb2b0267 [unknown] ([unknown])
             sleep 274800  2843.556176:         59 cycles:u:  ffffffffbb2b03b1 [unknown] ([unknown])
             sleep 274800  2843.556180:        691 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
             sleep 274800  2843.556189:       9160 cycles:u:      7fa9550eeaa3 __GI___tunables_init+0xf3 (/usr/lib64/ld-2.32.so)
             sleep 274800  2843.556312:      86937 cycles:u:      7fa9550e157b _dl_lookup_symbol_x+0x4b (/usr/lib64/ld-2.32.so)
  $

So we have some samples we somehow didn't find in a map for, if we now
do:

  $ perf report --stdio --dso /usr/lib64/ld-2.32.so
  # dso: /usr/lib64/ld-2.32.so
  #
  # Total Lost Samples: 0
  #
  # Samples: 8  of event 'cycles:u'
  # Event count (approx.): 96856
  #
  # Overhead  Command  Symbol
  # ........  .......  ........................
  #
      89.76%  sleep    [.] _dl_lookup_symbol_x
       9.46%  sleep    [.] __GI___tunables_init
       0.71%  sleep    [k] 0xffffffffbb26bff4
       0.06%  sleep    [k] 0xffffffffbb2b03b1
       0.01%  sleep    [k] 0xffffffffbb2b0267
       0.00%  sleep    [k] 0xffffffffbb2706b2
       0.00%  sleep    [k] 0xffffffffbb2b047d
  $

After this patch we get the right output with just entries for the DSOs
specified in --dso:

  $ perf report --stdio --dso /usr/lib64/ld-2.32.so
  # dso: /usr/lib64/ld-2.32.so
  #
  # Total Lost Samples: 0
  #
  # Samples: 8  of event 'cycles:u'
  # Event count (approx.): 96856
  #
  # Overhead  Command  Symbol
  # ........  .......  ........................
  #
      89.76%  sleep    [.] _dl_lookup_symbol_x
       9.46%  sleep    [.] __GI___tunables_init
  $
  #

Fixes: 96415e4d3f ("perf symbols: Avoid unnecessary symbol loading when dso list is specified")
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210128131209.GD775562@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:43 -03:00
Kan Liang 42641d6f4d perf stat: Add Topdown metrics events as default events
The Topdown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. From the Ice Lake and later platforms, the
Topdown information can be retrieved from the dedicated "metrics"
register, which isn't impacted by other events. Also, the Topdown
metrics support both per thread/process and per core measuring.  Adding
Topdown metrics events as default events can enrich the default
measuring information, and would not cost any extra multiplexing.

Introduce arch_evlist__add_default_attrs() to allow architecture
specific default events. Add the Topdown metrics events in the X86
specific arch_evlist__add_default_attrs(). Other architectures can add
their own default events later separately.

With the patch:

 $ perf stat sleep 1

 Performance counter stats for 'sleep 1':

           0.82 msec task-clock:u              #    0.001 CPUs utilized
              0      context-switches:u        #    0.000 K/sec
              0      cpu-migrations:u          #    0.000 K/sec
             61      page-faults:u             #    0.074 M/sec
        319,941      cycles:u                  #    0.388 GHz
        242,802      instructions:u            #    0.76  insn per cycle
         54,380      branches:u                #   66.028 M/sec
          4,043      branch-misses:u           #    7.43% of all branches
      1,585,555      slots:u                   # 1925.189 M/sec
        238,941      topdown-retiring:u        #     15.0% retiring
        410,378      topdown-bad-spec:u        #     25.8% bad speculation
        634,222      topdown-fe-bound:u        #     39.9% frontend bound
        304,675      topdown-be-bound:u        #     19.2% backend bound

       1.001791625 seconds time elapsed

       0.000000000 seconds user
       0.001572000 seconds sys

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210121133752.118327-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:43 -03:00
John Garry 7efce5c240 perf test: Add parse-metric memory bandwidth testcase
Event duration_time in a metric expression requires special handling.

Improve test coverage by including a metric whose expression includes
duration_time. The actual metric is a copied from the L1D_Cache_Fill_BW
metric on my broadwell machine.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: http://lore.kernel.org/lkml/1611578842-5749-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:27 -03:00
Sedat Dilek 211a741cd3 tools: Factor Clang, LLC and LLVM utils definitions
When dealing with BPF/BTF/pahole and DWARF v5 I wanted to build bpftool.

While looking into the source code I found duplicate assignments in misc tools
for the LLVM eco system, e.g. clang and llvm-objcopy.

Move the Clang, LLC and/or LLVM utils definitions to tools/scripts/Makefile.include
file and add missing includes where needed. Honestly, I was inspired by the commit
c8a950d0d3 ("tools: Factor HOSTCC, HOSTLD, HOSTAR definitions").

I tested with bpftool and perf on Debian/testing AMD64 and LLVM/Clang v11.1.0-rc1.

Build instructions:

[ make and make-options ]
MAKE="make V=1"
MAKE_OPTS="HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang LD=ld.lld LLVM=1 LLVM_IAS=1"
MAKE_OPTS="$MAKE_OPTS PAHOLE=/opt/pahole/bin/pahole"

[ clean-up ]
$MAKE $MAKE_OPTS -C tools/ clean

[ bpftool ]
$MAKE $MAKE_OPTS -C tools/bpf/bpftool/

[ perf ]
PYTHON=python3 $MAKE $MAKE_OPTS -C tools/perf/

I was careful with respecting the user's wish to override custom compiler, linker,
GNU/binutils and/or LLVM utils settings.

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com> # tools/build and tools/perf
Link: https://lore.kernel.org/bpf/20210128015117.20515-1-sedat.dilek@gmail.com
2021-01-29 01:25:34 +01:00
Arnaldo Carvalho de Melo 70f0ba9f24 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-27 16:48:04 -03:00
Jin Yao 8adc0a06d6 perf script: Fix overrun issue for dynamically-allocated PMU type number
When unpacking the event which is from dynamic PMU, the array
output[OUTPUT_TYPE_MAX] may be overrun. For example, type number of SKL
uncore_imc is 10, but OUTPUT_TYPE_MAX is 7 now (OUTPUT_TYPE_MAX =
PERF_TYPE_MAX + 1).

/* In builtin-script.c */

process_event()
{
        unsigned int type = output_type(attr->type);

        if (output[type].fields == 0)
                return;
}

output[10] is overrun.

Create a type OUTPUT_TYPE_OTHER for dynamic PMU events, then
output_type(attr->type) will return OUTPUT_TYPE_OTHER here.

Note that if PERF_TYPE_MAX ever changed, then there would be a conflict
between old perf.data files that had a dynamicaliy allocated PMU number
that would then be the same as a fixed PERF_TYPE.

Example:

  # perf record --switch-events -C 0 -e "{cpu-clock,uncore_imc/data_reads/,uncore_imc/data_writes/}:SD" -a -- sleep 1
  # perf script

  Before:
         swapper     0 [000] 1479253.987551:     277766               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.987797:     246709               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988127:     329883               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988273:     146393               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988523:     249977               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.988877:     354090               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.989023:     145940               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.989383:     359856               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1479253.989523:     140082               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])

  After:
         swapper     0 [000] 1397040.402011:     272384               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1397040.402011:       5396  uncore_imc/data_reads/:
         swapper     0 [000] 1397040.402011:        967 uncore_imc/data_writes/:
         swapper     0 [000] 1397040.402259:     249153               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1397040.402259:       7231  uncore_imc/data_reads/:
         swapper     0 [000] 1397040.402259:       1297 uncore_imc/data_writes/:
         swapper     0 [000] 1397040.402508:     249108               cpu-clock:  ffffffff9d4ddb6f cpuidle_enter_state+0xdf ([kernel.kallsyms])
         swapper     0 [000] 1397040.402508:       5333  uncore_imc/data_reads/:
         swapper     0 [000] 1397040.402508:       1008 uncore_imc/data_writes/:

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20201209005828.21302-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21 17:25:33 -03:00
John Garry 3d6e79ee9e perf metricgroup: Fix system PMU metrics
Joakim reports that getting "perf stat" for multiple system PMU metrics
segfaults:

  $ perf stat -a -I 1000 -M imx8mm_ddr_write.all,imx8mm_ddr_write.all
  Segmentation fault
  $

While the same works without issue for a single metric.

The logic in metricgroup__add_metric_sys_event_iter() is broken, in that
add_metric() @m argument should be NULL for each new metric. Fix by not
passing a holder for that, and rather make local in
metricgroup__add_metric_sys_event_iter().

Fixes: be335ec28e ("perf metricgroup: Support adding metrics for system PMUs")
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611050655-44020-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21 17:25:33 -03:00
John Garry 9c880c24cb perf metricgroup: Fix for metrics containing duration_time
Metrics containing duration_time cause a segfault:

  $ perf stat -v -M L1D_Cache_Fill_BW sleep 1
  Using CPUID GenuineIntel-6-3D-4
  metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
  found event duration_time
  found event l1d.replacement
  adding {l1d.replacement}:W,duration_time
  l1d.replacement -> cpu/umask=0x1,(null)=0x1e8483,event=0x51/
  Segmentation fault
  $

In commit c2337d6719 ("perf metricgroup: Fix metrics using aliases
covering multiple PMUs"), the logic in find_evsel_group() when iter'ing
events was changed to not only select events in same group, but also for
aliased PMUs.

Checking whether events were for aliased PMUs was done by comparing the
event PMU name. This was not safe for duration_time event, which has no
associated PMU (and no PMU name), so fix by checking if the event PMU name
is set also.

Committer testing:

Reproduced the bug, then, on a:

  $ grep -m1 ^'model name' /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  $

We now get:

  $ perf stat -M L1D_Cache_Fill_BW sleep 1

   Performance counter stats for 'sleep 1':

               4,141      l1d.replacement:u
       1,001,285,107 ns   duration_time:u

         1.001285107 seconds time elapsed

         0.000000000 seconds user
         0.001119000 seconds sys

  $

Detais from -v:

  Using CPUID GenuineIntel-6-8E-A
  metric expr 64 * l1d.replacement / 1000000000 / duration_time for L1D_Cache_Fill_BW
  found event duration_time
  found event l1d.replacement
  adding {l1d.replacement}:W,duration_time
  l1d.replacement -> cpu/(null)=0x1e8483,umask=0x1,event=0x51/
  Control descriptor is not initialized
  Warning:
  kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
  Warning:
  kernel.perf_event_paranoid=2, trying to fall back to excluding kernel and hypervisor  samples
  l1d.replacement:u: 4592 612201 612201
  duration_time:u: 1001478621 1001478621 1001478621

Fixes: c2337d6719 ("perf metricgroup: Fix metrics using aliases covering multiple PMUs")
Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxarm@openeuler.org
Link: https://lore.kernel.org/r/1611159518-226883-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-21 17:25:33 -03:00
Arnaldo Carvalho de Melo cd07e536b0 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:35:31 -03:00
Jiri Olsa 47fddcb479 perf tools: Add 'ping' control command
Add a control 'ping' command to detect if perf is up and its control
interface is operational.

It will be used in following daemon patches to synchronize with record
session - when control interface is up and running, we know that perf
record is monitoring and ready to receive signals.

Example session:

  terminal 1:

    # mkfifo control ack
    # perf record --control=fifo:control,ack

  terminal 2:

    # echo ping > control
    # cat ack
    ack

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:21 -03:00
Jiri Olsa f186cd6148 perf tools: Add 'stop' control command
Adding control 'stop' command to stop perf record.

When it is received, perf will set the 'done' variable to 1 to stop its
mmap ring buffer reading loop.

Example session:

  terminal 1:
    # mkfifo control ack
    # perf record --control=fifo:control,ack

  terminal 2:
    # echo stop > control

  terminal 1:
    [ perf record: Woken up 7 times to write data ]
    [ perf record: Captured and wrote 3.214 MB perf.data (38280 samples) ]
    #

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:21 -03:00
Jiri Olsa 142544a938 perf tools: Add 'evlist' control command
Add a new 'evlist' control command to display all the evlist events.
When it is received, perf will scan and print current evlist into perf
record terminal.

The interface string for control file is:

  evlist [-v|-g|-F]

The syntax follows perf evlist command:
  -F  Show just the sample frequency used for each event.
  -v  Show all fields.
  -g  Show event group information.

Example session:

  terminal 1:
    # mkfifo control ack
    # perf record --control=fifo:control,ack -e '{cycles,instructions}'

  terminal 2:
    # echo evlist > control

  terminal 1:
    cycles
    instructions
    dummy:HG

  terminal 2:
    # echo 'evlist -v' > control

  terminal 1:
    cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type:            \
    IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1,    \
    sample_id_all: 1, exclude_guest: 1
    instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 4000,      \
    sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, freq: 1,    \
    sample_id_all: 1, exclude_guest: 1
    dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, \
    sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, mmap: 1,    \
    comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, \
     bpf_event: 1

  terminal 2:
    # echo 'evlist -g' > control

  terminal 1:
    {cycles,instructions}
    dummy:HG

  terminal 2:
    # echo 'evlist -F' > control

  terminal 1:
    cycles: sample_freq=4000
    instructions: sample_freq=4000
    dummy:HG: sample_freq=4000

This new evlist command is handy to get real event names when
wildcards are used.

Adding evsel_fprintf.c object to python/perf.so build, because
it's now evlist.c dependency.

Adding PYTHON_PERF define for python/perf.so compilation, so we
can use it to compile in only evsel__fprintf from evsel_fprintf.c
object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:21 -03:00
Jiri Olsa 991ae4eb36 perf tools: Allow to enable/disable events via control file
Adding new control events to enable/disable specific event.
The interface string for control file are:

  'enable <EVENT NAME>'
  'disable <EVENT NAME>'

when received the command, perf will scan the current evlist
for <EVENT NAME> and if found it's enabled/disabled.

Example session:

  terminal 1:
    # mkfifo control ack perf.pipe
    # perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:*' -o - > perf.pipe

  terminal 2:
    # cat perf.pipe | perf --no-pager script -i -

  terminal 1:
    Events disabled

  NOTE Above message will show only after read side of the pipe ('>')
  is started on 'terminal 2'. The 'terminal 1's bash does not execute
  perf before that, hence the delyaed perf record message.

  terminal 3:
    # echo 'enable sched:sched_process_fork' > control

  terminal 1:
    event sched:sched_process_fork enabled

  terminal 2:
    bash 33349 [034] 149587.674295: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34056
    bash 33349 [034] 149588.239521: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34057

  terminal 3:
    # echo 'enable sched:sched_wakeup_new' > control

  terminal 1:
    event sched:sched_wakeup_new enabled

  terminal 2:
    bash 33349 [034] 149632.228023: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34059
    bash 33349 [034] 149632.228050:   sched:sched_wakeup_new: bash:34059 [120] success=1 CPU:036
    bash 33349 [034] 149633.950005: sched:sched_process_fork: comm=bash pid=33349 child_comm=bash child_pid=34060
    bash 33349 [034] 149633.950030:   sched:sched_wakeup_new: bash:34060 [120] success=1 CPU:036

Committer testing:

If I use 'sched:*' and then enable all events, I can't get 'perf record'
to react to further commands, so I tested it with:

  [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
  Events disabled
  Events enabled
  Events disabled

And then it works as expected, so we need to fix this pre-existing
problem.

Another issue, we need to check if a event is already enabled or
disabled and change the message to be clearer, i.e.:

  [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
  Events disabled

If we receive a 'disable' command, then it should say:

  [root@five ~]# perf record --control=fifo:control,ack -D -1 --no-buffering -e 'sched:sched_process_*' -o - > perf.pipe
  Events disabled
  Events already disabled

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201226232038.390883-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:21 -03:00
Jiri Olsa e8b2db0781 perf config: Make perf_config_global() global
Make perf_config_global global, it will be used outside the config.c
object in the following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:21 -03:00
Jiri Olsa b2946282c0 perf config: Make perf_config_system() global
Make perf_config_system global, it will be used outside the config.c
object in the following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Jiri Olsa f5f03e19ce perf config: Add perf_home_perfconfig function
Factor out the perf_home_perfconfig, that looks for .perfconfig in home
directory including check for PERF_CONFIG_NOGLOBAL and for proper
permission.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Jiri Olsa bcbd79d1cf perf debug: Add debug_set_display_time function
Allow to display time in perf debug output via new
debug_set_display_time function.

It will be used in perf daemon command to get verbose output into log
file.

The debug time format is:

  [2020-12-03 18:25:31.822152] affinity: SYS
  [2020-12-03 18:25:31.822164] mmap flush: 1
  [2020-12-03 18:25:31.822175] comp level: 0
  [2020-12-03 18:25:32.002047] mmap size 528384B

Committer notes:

Cast tod.tv_usec to long to avoid this problem:

    78    12.70 ubuntu:18.04-x-sparc64        : FAIL sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

  util/debug.c: In function 'fprintf_time':
  util/debug.c:63:32: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type '__suseconds_t {aka int}' [-Werror=format=]
    return fprintf(file, "[%s.%06lu] ", date, tod.tv_usec);
                              ~~~~^           ~~~~~~~~~~~
                              %06u

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Jiri Olsa a523026cac perf config: Add config set interface
Add interface to load config set from custom file by using
perf_config_set__load_file function.

It will be used in perf daemon command to process custom config file.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Jiri Olsa 64b9705b54 perf config: Make perf_config_from_file() static
It's not used outside config.c object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210102220441.794923-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Stephane Eranian d8eda89805 perf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE
Extend sample-parsing test cases to support new sample type
PERF_SAMPLE_CODE_PAGE_SIZE.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-7-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Stephane Eranian 9fd74f209c perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Add a new sort dimension "code_page_size" for common sort.
With this option applied, perf can sort and report by sample's code page
size.

For example:

  # perf report --stdio --sort=comm,symbol,code_page_size
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 3K of event 'mem-loads:uP'
  # Event count (approx.): 1470769
  #
  # Overhead  Command  Symbol                        Code Page Size IPC [IPC Coverage]
  # ........  .......  ............................  .............. ....................
  #
      69.56%  dtlb     [.] GetTickCount              4K              -   -
      17.93%  dtlb     [.] Calibrate                 4K              -   -
      11.40%  dtlb     [.] __gettimeofday            4K              -   -
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Stephane Eranian c513de8a70 perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Display sampled code page sizes when PERF_SAMPLE_CODE_PAGE_SIZE was set.

For example:

  # perf script --fields comm,event,ip,code_page_size
            dtlb mem-loads:uP:            445777 4K
            dtlb mem-loads:uP:            40f724 4K
            dtlb mem-loads:uP:            474926 4K
            dtlb mem-loads:uP:            401075 4K
            dtlb mem-loads:uP:            401095 4K
            dtlb mem-loads:uP:            401095 4K
            dtlb mem-loads:uP:            4010cc 4K
            dtlb mem-loads:uP:            440b6f 4K
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-5-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Kan Liang c1de7f3d84 perf record: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Adds the infrastructure to sample the code address page size.

Introduce a new --code-page-size option for perf record.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Originally-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Kan Liang 06280e3b15 perf mem: Support data page size
Add option --data-page-size in "perf mem" to record/report data page
size.

Here are some examples:

  # perf mem --phys-data --data-page-size report -D
  # PID, TID, IP, ADDR, PHYS ADDR, DATA PAGE SIZE, LOCAL WEIGHT, DSRC, SYMBOL
  20134 20134 0xffffffffb5bd2fd0 0x016ffff9a274e96a308 0x000000044e96a308 4K  1168 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:perf_ctx_unlock
  20134 20134 0xffffffffb63f645c 0xffffffffb752b814 0xcfb52b814 2M 225 0x26a100142 /lib/modules/4.18.0-rc7+/build/vmlinux:_raw_spin_lock
  20134 20134 0xffffffffb660300c 0xfffffe00016b8bb0 0x0 4K 0 0x5080144 /lib/modules/4.18.0-rc7+/build/vmlinux:__x86_indirect_thunk_rax
  #

  # perf mem --phys-data --data-page-size report --stdio
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 5K of event 'cpu/mem-loads,ldlat=30/P'
  # Total weight : 281234
  # Sort order   :
  # mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked,phys_daddr,data_page_size
  #
  # Overhead  Samples  Memory access  Symbol                        Shared Object     Data Symbol             Data Object  TLB access    Locked  Data Physical Address   Data Page Size
  # ........  .......  .............  ............................  ................  ......................  ...........  ............  ......  ......................  ..............

    28.54%     1826    L1 or L1 hit   [k] __x86_indirect_thunk_rax  [kernel.vmlinux]  [k] 0xffffb0df31b0ff28  [unknown]    L1 or L2 hit  No      [k] 0x0000000000000000  4K
     6.02%      256    L1 or L1 hit   [.] touch_buffer              dtlb              [.] 0x00007ffd50109da8  [stack]      L1 or L2 hit  No      [.] 0x000000042454ada8  4K
     3.23%        5    L1 or L1 hit   [k] clear_huge_page           [kernel.vmlinux]  [k] 0xffff9a2753b8ce60  [unknown]    L1 or L2 hit  No      [k] 0x0000000453b8ce60  2M
     2.98%        4    L1 or L1 hit   [k] clear_page_erms           [kernel.vmlinux]  [k] 0xffffb0df31b0fd00  [unknown]    L1 or L2 hit  No      [k] 0x0000000000000000  4K

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Kan Liang 407ee5c920 perf mem: Clean up output format
Now, "--phys-data" is the only option which impacts the output format.

A simple "if else" is enough to handle the option. But there will be
more options added, e.g. "--data-page-size", which also impact the
output format. The code will become too complex to be maintained.

Divide the big printf into several small pieces. Output the specific
piece only if the related option is applied.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
James Clark 80ec45d9f6 perf cs-etm: Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0
Replace the OCSD_INSTR switch statement with an if to fix compilation
error about unhandled values and avoid this issue again in the future.

Add new OCSD_GEN_TRC_ELEM_SYNC_MARKER and OCSD_GEN_TRC_ELEM_MEMTRANS
enum values to fix unhandled value compilation error. Currently they are
ignored.

Increase the minimum version number to v1.0.0 now that new enum values
are used that are only present in this version.

Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210108142752.27872-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Leo Yan 0998d96048 perf c2c: Add local variables for output metrics
This patch adds several local variables:

  "cl_output": pointer for outputting single cache line metrics;
  "output_str": pointer for outputting cache line metrics;
  "sort_str": pointer to the sorting metrics.

This can improve readability for the code and it's more flexible when
later extend to different strings for the output metrics.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Leo Yan f3d0a551db perf c2c: Refactor node display
The macro DISPLAY_HITM() is used to calculate HITM percentage introduced
by every node and it's shown for the node info.

This patch introduces the static function display_metrics() to replace
the macro, and the parameters are refined for passing the metric's
statistic and sum value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Leo Yan 111c141591 perf c2c: Fix argument type for percent()
For percent() its arguments are defined as integers; this is not
consistent with its consumers which pass u32 arguments.

Thus this patch makes argument type as u32 for percent().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Leo Yan 69a95bfdf9 perf c2c: Refactor display filter
When sorting on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
the FILTER_HITM macro is used to filter out the cache line entries if
its overhead is less than 1%.

This patch introduces a static function filter_display() to replace that
macro and refines its parameters with a more flexible way, rather than
passing field name, it changes to pass the cache line's statistic and
sum value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Leo Yan 2290e1d619 perf c2c: Refactor hist entry validation
This patch has no functionality changes but refactors hist entry
validation for cache line resorting.

It renames function "valid_hitm_or_store()" to "is_valid_hist_entry()",
changes return type from integer type to bool type.  In the function,
it uses switch-case instead of ternary operators, which is easier
to extend for more display types.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210114154646.209024-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Leo Yan 1834436e34 perf c2c: Rename for shared cache line stats
For shared cache line statistics, 'perf c2c' relies on HITM.  We can use
more general naming rather than only binding to HITM, so replace
"hitm_stats" with "shared_clines_stats" in structure perf_c2c, and
rename function resort_hitm_cb() to resort_shared_cl_cb().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20210114154646.209024-2-leo.yan@linaro.org
2021-01-20 14:34:20 -03:00
Song Liu fa853c4b83 perf stat: Enable counting events for BPF programs
Introduce 'perf stat -b' option, which counts events for BPF programs, like:

  [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
     1.487903822            115,200      ref-cycles
     1.487903822             86,012      cycles
     2.489147029             80,560      ref-cycles
     2.489147029             73,784      cycles
     3.490341825             60,720      ref-cycles
     3.490341825             37,797      cycles
     4.491540887             37,120      ref-cycles
     4.491540887             31,963      cycles

The example above counts 'cycles' and 'ref-cycles' of BPF program of id
254.  This is similar to bpftool-prog-profile command, but more
flexible.

'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps.

A new 'struct bpf_counter' is introduced to provide a common interface
that uses BPF programs/maps to count perf events.

Committer notes:

Removed all but bpf_counter.h includes from evsel.h, not needed at all.

Also BPF map lookups for PERCPU_ARRAYs need to have as its value receive
buffer passed to the kernel libbpf_num_possible_cpus() entries, not
evsel__nr_cpus(evsel), as the former uses
/sys/devices/system/cpu/possible while the later uses
/sys/devices/system/cpu/online, which may be less than the 'possible'
number making the bpf map lookup overwrite memory and cause hard to
debug memory corruption.

We need to continue using evsel__nr_cpus(evsel) when accessing the
perf_counts array tho, not to overwrite another are of memory :-)

Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:25:28 -03:00
Al Grant 648b054a46 perf inject: Correct event attribute sizes
When 'perf inject' reads a perf.data file from an older version of perf,
it writes event attributes into the output with the original size field,
but lays them out as if they had the size currently used. Readers see a
corrupt file. Update the size field to match the layout.

Signed-off-by: Al Grant <al.grant@foss.arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com
Signed-off-by: Denis Nikitin <denik@chromium.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:28 -03:00
Adrian Hunter 5501e9229a perf intel-pt: Fix 'CPU too large' error
In some cases, the number of cpus (nr_cpus_online) is confused with the
maximum cpu number (nr_cpus_avail), which results in the error in the
example below:

Example on system with 8 cpus:

 Before:
   # echo 0 > /sys/devices/system/cpu/cpu2/online
   # ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
   Linux
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.147 MB perf.data ]
   # ./perf script --itrace=e
   Requested CPU 7 too large. Consider raising MAX_NR_CPUS
   0x25908 [0x8]: failed to process type: 68 [Invalid argument]

 After:
   # ./perf script --itrace=e
   #

Fixes: 8c7274691f ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Fixes: 7df4e36a47 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:27 -03:00
Namhyung Kim a1bf23052b perf stat: Take cgroups into account for shadow stats
As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot.  This resulted in incorrect numbers when those cgroups have
different workloads.

For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.

  $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1

   Performance counter stats for 'system wide':

     3,958,116,522      cycles                A
     6,722,650,929      instructions          A #    2.53  insn per cycle
         1,132,741      cycles                B
           571,743      instructions          B #    0.00  insn per cycle
     4,007,799,935      cycles                C
     6,793,181,523      instructions          C #    2.56  insn per cycle

       1.001050869 seconds time elapsed

When I run 'perf stat' with single workload, it usually shows IPC around
1.7.  We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.

But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.

  avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
  IPC (A)  :  6722650929 / 2655683066 = 2.531
  IPC (B)  :      571743 / 2655683066 = 0.0002
  IPC (C)  :  6793181523 / 2655683066 = 2.557

We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified.  With this patch, I can see correct
numbers like below:

  $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1

  Performance counter stats for 'system wide':

     4,171,051,687      cycles                A
     7,219,793,922      instructions          A #    1.73  insn per cycle
         1,051,189      cycles                B
           583,102      instructions          B #    0.55  insn per cycle
     4,171,124,710      cycles                C
     7,192,944,580      instructions          C #    1.72  insn per cycle

       1.007909814 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:27 -03:00
Namhyung Kim 3ff1e7180a perf stat: Introduce struct runtime_stat_data
To pass more info to the saved_value in the runtime_stat, add a new
struct runtime_stat_data.  Currently it only has 'ctx' field but later
patch will add more.

Note that we intentionally pass 0 as ctx to clock-related events for
compatibility.  It was already there in a few places.  So move the code
into the saved_value_lookup() explicitly and add a comment.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 17:28:27 -03:00
Namhyung Kim a042a82ddb perf test: Fix shadow stat test for non-bash shells
It was using some bash-specific features and failed to parse when
running with a different shell like below:

  root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv
  83: perf stat metrics (shadow stat) test                            :
  --- start ---
  test child forked, pid 3922
  ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
  (standard_in) 2: syntax error
  ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
  (standard_in) 2: syntax error
  ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
  ./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found
  test child finished with -1
  ---- end ----
  perf stat metrics (shadow stat) test: FAILED!

Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 16:31:46 -03:00
Arnaldo Carvalho de Melo 301f0203e0 perf bpf examples: Fix bpf.h header include directive in 5sec.c example
It was looking at bpf/bpf.h, which caused this problem:

  # perf trace -e tools/perf/examples/bpf/5sec.c
  /home/acme/git/perf/tools/perf/examples/bpf/5sec.c:42:10: fatal error: 'bpf/bpf.h' file not found
  #include <bpf/bpf.h>
           ^~~~~~~~~~~
  1 error generated.
  ERROR:	unable to compile tools/perf/examples/bpf/5sec.c
  Hint:	Check error message shown above.
  Hint:	You can also pre-compile it into .o using:
       		clang -target bpf -O2 -c tools/perf/examples/bpf/5sec.c
       	with proper -I and -D options.
  event syntax error: 'tools/perf/examples/bpf/5sec.c'
                       \___ Failed to load tools/perf/examples/bpf/5sec.c from source: Error when compiling BPF scriptlet
  #

Change that to plain bpf.h, to make it work again:

  # perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s
       0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000)
  # perf trace -e tools/perf/examples/bpf/5sec.c/max-stack=16/ sleep 5s
       0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000)
                                         hrtimer_nanosleep ([kernel.kallsyms])
                                         common_nsleep ([kernel.kallsyms])
                                         __x64_sys_clock_nanosleep ([kernel.kallsyms])
                                         do_syscall_64 ([kernel.kallsyms])
                                         entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                         __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
  # perf trace -e tools/perf/examples/bpf/5sec.c sleep 4s
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 16:31:46 -03:00
Song Liu fbcdaa1908 perf build: Support build BPF skeletons with perf
BPF programs are useful in perf to profile BPF programs.

BPF skeleton is by far the easiest way to write BPF tools. Enable
building BPF skeletons in util/bpf_skel. A dummy bpf skeleton is added.
More bpf skeletons will be added for different use cases.

Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-3-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15 15:49:07 -03:00
Hans-Peter Nilsson c07b45a355 perf record: Tweak "Lowering..." warning in record_opts__config_freq
That is, instead of "Lowering default frequency rate to <F>" say
"Lowering default frequency rate from <f> to <F>", specifying the
overridden default frequency <f>, so you don't have to grep through the
source to "remember" that was e.g. 4000.

Signed-off-by: Hans-Peter Nilsson <hp@axis.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201228031908.B049B203B5@pchp3.se.axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 12:24:48 -03:00
Jiri Olsa d176db9558 perf buildid-list: Add support for mmap2's buildid events
Add buildid-list support for mmap2's build id data, so we can display
build ids for dso objects for data without the build id cache update.

  $ perf buildid-list
  1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
  d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so

By default only dso objects with hits are shown.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-15-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 12:23:09 -03:00
Jiri Olsa e8a2061f0b perf buildid-cache: Add --debuginfod option to specify a server to fetch debug files
Add the --debuginfod option to specify debuginfod URL and support to do
that through config file as well.

Use the following in ~/.perfconfig file:

  [buildid-cache]
  debuginfod=http://192.168.122.174:8002

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-14-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 12:20:39 -03:00
Jiri Olsa 0b5c88214e perf tools: Add support to display build ids when available in PERF_RECORD_MMAP2 events
Add support to display the build id in PERF_RECORD_MMAP2 events, when
available:

  $ perf script --show-mmap-events | head -4
  swapper ... @ 0xffffffff81000000 <ff1969b3ba5e43911208bb46fa7d5b1eb809e422>]: ---p [kernel.kallsyms]_text
  swapper ... @ 0 <5f62adb730272c9417883ae8b8a8ec224df8cddd>]: ---p /lib/modules/5.9.0-rc5buildid+/kernel/drivers/firmware/qemu_fw_cfg.ko
  swapper ... @ 0 <c9ac6e1dafc1ebdadb048f967854e810706c8bab>]: ---p /lib/modules/5.9.0-rc5buildid+/kernel/drivers/char/virtio_console.ko
  swapper ... @ 0 <86441a4c5b2c2ff5b440682f4c612bd4b426eb5c>]: ---p /lib/modules/5.9.0-rc5buildid+/kernel/lib/libcrc32c.ko

  $ perf report -D | grep MMAP2 | head -4
  0 0 ... @ 0xffffffff81000000 <ff1969b3ba5e43911208bb46fa7d5b1eb809e422>]: ---p [kernel.kallsyms]_text
  0 0 ... @ 0 <5f62adb730272c9417883ae8b8a8ec224df8cddd>]: ---p /lib/modules/5.9.0-rc5buildid+/kernel/drivers/firmware/qemu_fw_cfg.ko
  0 0 ... @ 0 <c9ac6e1dafc1ebdadb048f967854e810706c8bab>]: ---p /lib/modules/5.9.0-rc5buildid+/kernel/drivers/char/virtio_console.ko
  0 0 ... @ 0 <86441a4c5b2c2ff5b440682f4c612bd4b426eb5c>]: ---p /lib/modules/5.9.0-rc5buildid+/kernel/lib/libcrc32c.ko

Adding build id data into <> brackets.

Committer testing:

  $ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build
  Enabling build id in mmap2 events.
  $ perf evlist -v
  cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1
  $
  $ perf script --show-mmap-events | head -4
           sleep 274800  2843.556112: PERF_RECORD_MMAP2 274800/274800: [0x564e2fd32000(0x3000) @ 0x2000 <c37cb90b77c79fc719798b066d78ef121285843e>]: r-xp /usr/bin/sleep
           sleep 274800  2843.556129: PERF_RECORD_MMAP2 274800/274800: [0x7fa9550d7000(0x21000) @ 0x1000 <fc190f17c4f4dc4a8a26df18eaeed41ecdb2c61b>]: r-xp /usr/lib64/ld-2.32.so
           sleep 274800  2843.556140: PERF_RECORD_MMAP2 274800/274800: [0x7ffd8fa96000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
           sleep 274800  2843.556162:          1 cycles:u:  ffffffffbb26bff4 [unknown] ([unknown])
  $
  $ perf buildid-list -i /usr/bin/sleep
  c37cb90b77c79fc719798b066d78ef121285843e
  $ perf buildid-list -i /usr/lib64/ld-2.32.so
  fc190f17c4f4dc4a8a26df18eaeed41ecdb2c61b

And now on a system wide session to check the build ids synthesized for
the kernel and some kernel modules:

  # perf record -a --buildid-mmap sleep 2s
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.717 MB perf.data ]
  # perf script --show-mmap-events | head -4
           swapper     0 [000]     0.000000: PERF_RECORD_MMAP2 -1/0: [0xffffffffbb000000(0xe02557) @ 0xffffffffbb000000 <e71ac4b0b0631c27181dab25d63be18dad02feb8>]: ---p [kernel.kallsyms]_text
           swapper     0 [000]     0.000000: PERF_RECORD_MMAP2 -1/0: [0xffffffffc01dc000(0x6000) @ 0 <36d21515c0b22eb2859b6419a6cdf87ef4cd01c8>]: ---p /lib/modules/5.11.0-rc1+/kernel/drivers/i2c/i2c-dev.ko
           swapper     0 [000]     0.000000: PERF_RECORD_MMAP2 -1/0: [0xffffffffc01eb000(0x24000) @ 0 <c4fbfea32d0518b3e7879de8deca40ea142bb782>]: ---p /lib/modules/5.11.0-rc1+/kernel/fs/fuse/fuse.ko
           swapper     0 [000]     0.000000: PERF_RECORD_MMAP2 -1/0: [0xffffffffc0210000(0x7000) @ 0 <dd6cfb10ae66aa7b1e7b37000a004004be8092e0>]: ---p /lib/modules/5.11.0-rc1+/kernel/drivers/block/zram/zram.ko
  # perf buildid-list -h kernel

   Usage: perf buildid-list [<options>]

      -k, --kernel          Show current kernel build id

  # perf buildid-list --kernel
  e71ac4b0b0631c27181dab25d63be18dad02feb8
  # file /lib/modules/5.11.0-rc1+/kernel/drivers/i2c/i2c-dev.ko
  /lib/modules/5.11.0-rc1+/kernel/drivers/i2c/i2c-dev.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=36d21515c0b22eb2859b6419a6cdf87ef4cd01c8, with debug_info, not stripped
  # perf buildid-list -i /lib/modules/5.11.0-rc1+/kernel/drivers/i2c/i2c-dev.ko
  36d21515c0b22eb2859b6419a6cdf87ef4cd01c8
  # perf buildid-list -i /lib/modules/5.11.0-rc1+/kernel/fs/fuse/fuse.ko
  c4fbfea32d0518b3e7879de8deca40ea142bb782
  # perf buildid-list -i /lib/modules/5.11.0-rc1+/kernel/drivers/block/zram/zram.ko
  dd6cfb10ae66aa7b1e7b37000a004004be8092e0
  #

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 11:36:52 -03:00
Jiri Olsa e29386c8f7 perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id
Add --buildid-mmap option to enable build id in PERF_RECORD_MMAP2 events.

It will only work if there's kernel support for that and it disables
build id cache (implies --no-buildid).

It's also possible to enable it permanently via config option in
~/.perfconfig file:

  [record]
  build-id=mmap

Also added build_id bit in the verbose output for perf_event_attr:

  # perf record --buildid-mmap -vv
  ...
  perf_event_attr:
    type                             1
    size                             120
    ...
    build_id                         1

Adding also missing text_poke bit.

Committer testing:

  $ perf record -h build

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -B, --no-buildid      do not collect buildids in perf.data
      -N, --no-buildid-cache
                            do not update the buildid cache
          --buildid-all     Record build-id of all DSOs regardless of hits
          --buildid-mmap    Record build-id in map events

  $

  $ perf record --buildid-mmap sleep 1
  Failed: no support to record build id in mmap events, update your kernel.
  $

After adding the needed kernel bits in a test kernel:

  $ perf record -vv --buildid-mmap sleep 1 |& grep -m1 build
  Enabling build id in mmap2 events.
  $ perf evlist -v
  cycles:u: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 11:35:57 -03:00
Jiri Olsa 4183a8d70a perf tools: Allow synthesizing the build id for kernel/modules/tasks in PERF_RECORD_MMAP2
Adding build id to synthesized mmap2 events for everything -
kernel/modules/tasks, when symbol_conf.buildid_mmap2 is true.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:25:33 -03:00
Jiri Olsa e0dbf18f65 perf tools: Allow using PERF_RECORD_MMAP2 to synthesize the kernel modules maps
Allow using PERF_RECORD_MMAP2 to synthesize the kernel modules maps so
that we can use PERF_RECORD_MMAP2 to encode the kernel modules build ids
in the following csets.

It's enabled by a new symbol_conf.buildid_mmap2 bool field, which will
be switchable in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:25:23 -03:00
Jiri Olsa 978410ff99 perf tools: Allow using PERF_RECORD_MMAP2 to synthesize the kernel map
Allow using PERF_RECORD_MMAP2 to synthesize the kernel map so that we
can use PERF_RECORD_MMAP2 to encode the kernel build id in the following
csets.

It's enabled by a new symbol_conf.buildid_mmap2 bool field, which will
be switchable in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:22:55 -03:00
Jiri Olsa 1ca6e80254 perf tools: Store build id when available in PERF_RECORD_MMAP2 metadata events
When processing a PERF_RECORD_MMAP2 metadata event, check on the build
id misc bit: PERF_RECORD_MISC_MMAP_BUILD_ID and if it is set, store the
build id in mmap's dso object.

Also adding the build id data to struct perf_record_mmap2 event
definition.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:01:55 -03:00
Jiri Olsa 29245ae8ff perf tools: Do not swap mmap2 fields in case it contains build id
If the PERF_RECORD_MISC_MMAP_BUILD_ID misc bit is set, mmap2 events
carries a build id, placed in the following union:

  union {
          struct {
                  u32       maj;
                  u32       min;
                  u64       ino;
                  u64       ino_generation;
          };
          struct {
                  u8        build_id_size;
                  u8        __reserved_1;
                  u16       __reserved_2;
                  u8        build_id[20];
          };
  };

In this case we can't swap above fields.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 09:58:40 -03:00
Leo Yan feab999efe perf arm64: Add argument support for SDT
Now the two OP formats are used for SDT marker argument in Arm64 ELF,
one format is general register xNUM (e.g. x1, x2, etc), another is for
using stack pointer to access local variables (e.g. [sp], [sp, 8]).

This patch adds support SDT marker argument for Arm64, it parses OP and
converts to uprobe compatible format.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: He Zhe <zhe.he@windriver.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20201225052751.24513-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 09:53:28 -03:00
Leo Yan f19b5872d8 perf probe: Fixup Arm64 SDT arguments
Arm64 ELF section '.note.stapsdt' uses string format "-4@[sp, NUM]" if
the probe is to access data in stack, e.g. below is an example for
dumping Arm64 ELF file and shows the argument format:

  Arguments: -4@[sp, 12] -4@[sp, 8] -4@[sp, 4]

Comparing against other archs' argument format, Arm64's argument
introduces an extra space character in the middle of square brackets,
due to argv_split() uses space as splitter, the argument is wrongly
divided into two items.

To support Arm64 SDT, this patch fixes up for this case, if any item
contains sub string "[sp", concatenates the two continuous items.  And
adds the detailed explaination in comment.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: He Zhe <zhe.he@windriver.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20201225052751.24513-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 09:53:28 -03:00
Arnaldo Carvalho de Melo 5149303fdf perf probe: Fix memory leak when synthesizing SDT probes
The argv_split() function must be paired with argv_free(), else we must
keep a reference to the argv array received or do the freeing ourselves,
in synthesize_sdt_probe_command() we were simply leaking that argv[]
array.

Fixes: 3b1f8311f6 ("perf probe: Add sdt probes arguments into the uprobe cmd string")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: He Zhe <zhe.he@windriver.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201224135139.GF477817@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:52:10 -03:00
James Clark 8d4852b468 perf stat aggregation: Add separate thread member
A separate field isn't strictly required. The core field could be
re-used for thread IDs as a single field was used previously.

But separating them will avoid confusion and catch potential errors
where core IDs are read as thread IDs and vice versa.

Also remove the placeholder id field which is now no longer used.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-13-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:05:28 -03:00
James Clark b993381779 perf stat aggregation: Add separate core member
Add core as a separate member so that it doesn't have to be packed into
the int value.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-12-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:05:25 -03:00
James Clark ba2ee166d9 perf stat aggregation: Add separate die member
Add die as a separate member so that it doesn't have to be packed into
the int value.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-11-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:05:19 -03:00
James Clark 1a270cb6b3 perf stat aggregation: Add separate socket member
Add socket as a separate member so that it doesn't have to be packed
into the int value.

When the socket ID was larger than 8 bits the output appeared corrupted
or incomplete.

For example, here on ThunderX2 'perf stat' reports a socket of -1 and an
invalid die number:

  ./perf stat -a --per-die
  The socket id number is too big.

  Performance counter stats for 'system wide':

  S-1-D255       128             687.99 msec cpu-clock                 #   57.240 CPUs utilized
  ...
  S36-D0         128             842.34 msec cpu-clock                 #   70.081 CPUs utilized
  ...

And with --per-core there is an entry with an invalid core ID:

  ./perf stat record -a --per-core
  The socket id number is too big.

  Performance counter stats for 'system wide':
  S-1-D255-C65535     128             671.04 msec cpu-clock                 #   54.112 CPUs utilized
  ...
  S36-D0-C0           4              28.27 msec cpu-clock                 #    2.279 CPUs utilized
  ...

This fixes the "Session topology" self test on ThunderX2.

After this fix the output contains the correct socket and die IDs and no
longer prints a warning about the size of the socket ID:

  ./perf stat --per-die -a

  Performance counter stats for 'system wide':

  S36-D0         128         169,869.39 msec cpu-clock                 #  127.501 CPUs utilized
  ...
  S3612-D0         128         169,733.05 msec cpu-clock                 #  127.398 CPUs utilized

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-10-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:05:04 -03:00
James Clark fcd83a35dd perf stat aggregation: Add separate node member
Add node as a separate member so that it doesn't have to be packed into
the int value.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-9-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:04:52 -03:00
James Clark ff5232956e perf stat aggregation: Start using cpu_aggr_id in map
Use the new cpu_aggr_id struct in the cpu map instead of int so that it
can store more data.

No functional changes.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-8-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:04:38 -03:00
James Clark d526e1a033 perf cpumap: Drop in cpu_aggr_map struct
Replace usages of perf_cpu_map with cpu_aggr map in places that are
involved with 'perf stat' aggregation.

This will then later be changed to be a map of cpu_aggr_id rather than
an int so that more data can be stored.

No functional changes.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-7-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:04:32 -03:00
James Clark cea6575fdc perf cpumap: Add new map type for aggregation
Currently this is a duplicate of perf_cpu_map so that it can be used as
a drop in replacement.

In a later commit it will be changed from a map of ints to use the new
cpu_aggr_id struct.

No functional changes.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-6-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:04:24 -03:00
James Clark 2760f5a14f perf stat: Replace aggregation ID with a struct
Replace all occurences of the usage of int with the new struct
cpu_aggr_id.

No functional changes.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:04:19 -03:00
James Clark fa265e59b8 perf cpumap: Add new struct for cpu aggregation
This struct currently has only a single int member so that it can be
used as a drop in replacement for the existing behaviour.

Comparison and constructor functions have also been added that will
replace usages of '==' and '= -1'.

No functional changes.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:04:13 -03:00
James Clark 91585846f1 perf cpumap: Use existing allocator to avoid using malloc
Use the existing allocator for perf_cpu_map to avoid use of raw malloc.
This could cause an issue in later commits where the size of
perf_cpu_map is changed.

No functional changes.

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:03:54 -03:00
James Clark 23331eeb73 perf tests: Improve topology test to check all aggregation types
Improve the topology test to check all aggregation types. This is to
lock down the behaviour before 'id' is changed into a struct in later
commits.

Committer testing:

  $ perf test topology
  41: Session topology: Ok
  $

  $ perf test -v topology
  41: Session topology:
  --- start ---
  test child forked, pid 965552
  templ file: /tmp/perf-test-mO7NtI
  Problems creating module maps, continuing anyway...
  CPU 0, core 0, socket 0
  CPU 1, core 1, socket 0
  CPU 2, core 2, socket 0
  CPU 3, core 4, socket 0
  CPU 4, core 5, socket 0
  CPU 5, core 6, socket 0
  CPU 6, core 8, socket 0
  CPU 7, core 9, socket 0
  CPU 8, core 10, socket 0
  CPU 9, core 12, socket 0
  CPU 10, core 13, socket 0
  CPU 11, core 14, socket 0
  CPU 12, core 0, socket 0
  CPU 13, core 1, socket 0
  CPU 14, core 2, socket 0
  CPU 15, core 4, socket 0
  CPU 16, core 5, socket 0
  CPU 17, core 6, socket 0
  CPU 18, core 8, socket 0
  CPU 19, core 9, socket 0
  CPU 20, core 10, socket 0
  CPU 21, core 12, socket 0
  CPU 22, core 13, socket 0
  CPU 23, core 14, socket 0
  test child finished with 0
  ---- end ----
  Session topology: Ok
  $

Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20201126141328.6509-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 10:03:35 -03:00
Tiezhu Yang b27d20ab1c perf tools: Update s390's syscall.tbl copy from the kernel sources
This silences the following tools/perf/ build warning:

  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'

Just make them same:

  cp arch/s390/kernel/syscalls/syscall.tbl tools/perf/arch/s390/entry/syscalls/syscall.tbl

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1608278364-6733-5-git-send-email-yangtiezhu@loongson.cn
[ There were updates after Tiezhu's post, so I just updated the copy ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 09:24:20 -03:00
Tiezhu Yang c5ef52944a perf tools: Update powerpc's syscall.tbl copy from the kernel sources
This silences the following tools/perf/ build warning:

  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'

Just make them same:

  cp arch/powerpc/kernel/syscalls/syscall.tbl tools/perf/arch/powerpc/entry/syscalls/syscall.tbl

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1608278364-6733-4-git-send-email-yangtiezhu@loongson.cn
[ There were updates after Tiezhu's post, so I just updated the copy ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 09:24:20 -03:00
Tiezhu Yang 22ffc3f559 perf s390: Move syscall.tbl check into check-headers.sh
It is better to check syscall.tbl for s390 in check-headers.sh, it is
similar with commit c9b51a0170 ("perf tools: Move syscall_64.tbl check
into check-headers.sh").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1608278364-6733-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 09:24:20 -03:00
Tiezhu Yang 9bad32b2c6 perf powerpc: Move syscall.tbl check to check-headers.sh
It is better to check syscall.tbl for powerpc in check-headers.sh, it is
similar with commit c9b51a0170 ("perf tools: Move syscall_64.tbl check
into check-headers.sh").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1608278364-6733-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 09:24:20 -03:00
Arnaldo Carvalho de Melo fde668244d tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in:

Fixes: 69372cf012 ("x86/cpu: Add VM page flush MSR availablility as a CPUID feature")

That cause these changes in tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2020-12-21 09:09:05.593005003 -0300
  +++ after	2020-12-21 09:12:48.436994802 -0300
  @@ -21,7 +21,7 @@
   	[0x0000004f] = "PPIN",
   	[0x00000060] = "LBR_CORE_TO",
   	[0x00000079] = "IA32_UCODE_WRITE",
  -	[0x0000008b] = "IA32_UCODE_REV",
  +	[0x0000008b] = "AMD64_PATCH_LEVEL",
   	[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
   	[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
   	[0x0000008E] = "IA32_SGXLEPUBKEYHASH2",
  @@ -286,6 +286,7 @@
   	[0xc0010114 - x86_AMD_V_KVM_MSRs_offset] = "VM_CR",
   	[0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE",
   	[0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA",
  +	[0xc001011e - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VM_PAGE_FLUSH",
   	[0xc001011f - x86_AMD_V_KVM_MSRs_offset] = "AMD64_VIRT_SPEC_CTRL",
   	[0xc0010130 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV_ES_GHCB",
   	[0xc0010131 - x86_AMD_V_KVM_MSRs_offset] = "AMD64_SEV",
  $

The new MSR has a pattern that wasn't matched to avoid a clash with
IA32_UCODE_REV, change the regex to prefer the more relevant AMD_
prefixed ones to catch this new AMD64_VM_PAGE_FLUSH MSR.

Which causes these parts of tools/perf/ to be rebuilt:

  CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
  LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
  LD       /tmp/build/perf/trace/beauty/perf-in.o
  LD       /tmp/build/perf/perf-in.o
  LINK     /tmp/build/perf/perf

This addresses this perf tools build warning:

  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 09:24:19 -03:00
Arnaldo Carvalho de Melo 6e5192143a tools headers UAPI: Update epoll_pwait2 affected files
To pick the changes from:

  b0a0c2615f ("epoll: wire up syscall epoll_pwait2")

That addresses these perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-24 09:24:19 -03:00
Linus Torvalds 7b95f0563a Kbuild updates for v5.11
- Use /usr/bin/env for shebang lines in scripts
 
  - Remove useless -Wnested-externs warning flag
 
  - Update documents
 
  - Refactor log handling in modpost
 
  - Stop building modules without MODULE_LICENSE() tag
 
  - Make the insane combination of 'static' and EXPORT_SYMBOL an error
 
  - Improve genksyms to handle _Static_assert()
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl/iIY8VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGbfsP+gMv3F+ztqfYNoMNmZcj+fLh4zrA
 8I3d0t0AoxovV1bsyVDk9nebsYLbDdsyCdHM1ZNFAFEpf9QLL8sxtpHvaaxy+rCq
 PCmy+E6iO5B91oORhuqpYpcmmgPHf4RrpUcnEEiWOMrHE5giYbXz3AiqGAt/88J5
 Y8yaPCQVhNJNkx73KHCMYLVp97xPGa5HvNrcskAueA8uG+FCRDFaIqFX+OYbGnmC
 /3kVAJmX6i2kNPzvnXpAW6mTbI/z7+s/k5yRbEFYNUtJqN+BfaFadV8pyOGXQr1T
 fwXVtXdWqVg7rbqupyVYItLHaOq2RBm4PJuee/8s7ooBI1y7U6N0HZCj+jES92ML
 wuqEyED+lLzmxRyfhmrFH/5XhxacciO7dQb9Woe5FQ6QOm+tQPtwCnxwrSSAK4XU
 k7CsJ+OMJI+JulFrgPuC/rcESjTAsgL2j4SDhsO0GLV+Qb/P9kXR88jt5eJygmSx
 xZWpI+FUUY/Ihw648i2pkHGS/NmfOrT78X4nvbOWMDKOV02NEoMmLDYnZPUIoetn
 yUo8+xSBp6n3aTy5TDtrMblNRUJwL9OzDlDiEjsPtNUJZ6sdQzFRsxJ7+FCw2Ley
 rKN2r+i5FdyAq0LLHDhoEcJxFY7cj+yAsd0QqtBb0NZLgLsaPiP7w45CXRNpqkWG
 BbK+F1E9jP8VfiZu
 =+27V
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Use /usr/bin/env for shebang lines in scripts

 - Remove useless -Wnested-externs warning flag

 - Update documents

 - Refactor log handling in modpost

 - Stop building modules without MODULE_LICENSE() tag

 - Make the insane combination of 'static' and EXPORT_SYMBOL an error

 - Improve genksyms to handle _Static_assert()

* tag 'kbuild-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  Documentation/kbuild: Document platform dependency practises
  Documentation/kbuild: Document COMPILE_TEST dependencies
  genksyms: Ignore module scoped _Static_assert()
  modpost: turn static exports into error
  modpost: turn section mismatches to error from fatal()
  modpost: change license incompatibility to error() from fatal()
  modpost: turn missing MODULE_LICENSE() into error
  modpost: refactor error handling and clarify error/fatal difference
  modpost: rename merror() to error()
  kbuild: don't hardcode depmod path
  kbuild: doc: document subdir-y syntax
  kbuild: doc: clarify the difference between extra-y and always-y
  kbuild: doc: split if_changed explanation to a separate section
  kbuild: doc: merge 'Special Rules' and 'Custom kbuild commands' sections
  kbuild: doc: fix 'List directories to visit when descending' section
  kbuild: doc: replace arch/$(ARCH)/ with arch/$(SRCARCH)/
  kbuild: doc: update the description about kbuild Makefiles
  Makefile.extrawarn: remove -Wnested-externs warning
  tweewide: Fix most Shebang lines
2020-12-22 14:02:39 -08:00
Kan Liang 2e7f545096 perf mem: Factor out a function to generate sort order
Now, "--phys-data" is the only option which impacts the sort order.  A
simple "if else" is enough to handle the option. But there will be more
options added, e.g. "--data-page-size", which also impact the sort
order. The code will become too complex to be maintained.

Divide the sort order string into several small pieces.  The first piece
is always the default sort string for LOAD/STORE.  Appends the specific
sort string if related option is applied.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201216185805.9981-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19 17:53:29 -03:00
Kan Liang a50d03e3b8 perf sort: Add sort option for data page size
Add a new sort option "data_page_size" for --mem-mode sort.  With this
option applied, perf can sort and report by sample's data page size.

Here is an example:

perf report --stdio --mem-mode
--sort=comm,symbol,phys_daddr,data_page_size

 # To display the perf.data header info, please use
 # --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 9K of event 'mem-loads:uP'
 # Total weight : 9028
 # Sort order   : comm,symbol,phys_daddr,data_page_size
 #
 # Overhead  Command  Symbol                        Data Physical
 # Address
 # Data Page Size
 # ........  .......  ............................
 # ......................  ......................
 #
    11.19%  dtlb     [.] touch_buffer              [.] 0x00000003fec82ea8  4K
     8.61%  dtlb     [.] GetTickCount              [.] 0x00000003c4f2c8a8  4K
     4.52%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f58  4K
     4.33%  dtlb     [.] __gettimeofday            [.] 0x00000003fec82f48  4K
     4.32%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f78  4K
     4.28%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f50  4K
     4.23%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f70  4K
     4.11%  dtlb     [.] GetTickCount              [.] 0x00000003fec82f68  4K
     4.00%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f98  4K
     3.91%  dtlb     [.] Calibrate                 [.] 0x00000003fec82f90  4K
     3.43%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e98  4K
     3.42%  dtlb     [.] touch_buffer              [.] 0x00000003fec82e90  4K
     0.09%  dtlb     [.] DoDependentLoads          [.] 0x000000036ea084c0  2M
     0.08%  dtlb     [.] DoDependentLoads          [.] 0x000000032b010b80  2M

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19 17:52:24 -03:00
Kan Liang 6b9bae63de perf script: Support data page size
Display the data page size if it is available and asked by the user:

Can be configured by the user, for example:

  perf script --fields comm,event,phys_addr,data_page_size
            dtlb mem-loads:uP:        3fec82ea8 4K
            dtlb mem-loads:uP:        3fec82e90 4K
            dtlb mem-loads:uP:        3e23700a4 4K
            dtlb mem-loads:uP:        3fec82f20 4K
            dtlb mem-loads:uP:        3e23700a4 4K
            dtlb mem-loads:uP:        3b4211bec 4K
            dtlb mem-loads:uP:        382205dc0 2M
            dtlb mem-loads:uP:        36fa082c0 2M
            dtlb mem-loads:uP:        377607340 2M
            dtlb mem-loads:uP:        330010180 2M
            dtlb mem-loads:uP:        33200fd80 2M
            dtlb mem-loads:uP:        31b012b80 2M

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201216185805.9981-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-19 17:04:39 -03:00
Arnaldo Carvalho de Melo eb2842da77 perf trace beauty: Update copy of linux/socket.h with the kernel sources
This just triggers the rebuilding of the syscall beautifiers that
extract patterns from this file due to this cset:

  b713c195d5 ("net: provide __sys_shutdown_sock() that takes a socket")

After updating it:

    CC       /tmp/build/perf/trace/beauty/sockaddr.o

Addressing this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
  diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18 17:32:28 -03:00
Arnaldo Carvalho de Melo 281a94b0f2 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and check what UAPI headers need to be synched.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:37:24 -03:00
Jiri Olsa feca8a8342 perf tools: Reformat record's control fd man text
Adding available control commands in separate paragraph, so it's more
readable and easier to add new commands.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201216083914.47215-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Nick Thompson 526671bfc4 perf config: Fix example command in manpage to conform to syntax specified in the SYNOPSIS section.
Committer testing:

With the previously documented example:

  $ perf config --user report sort-order=srcline
  The config variable does not contain a section name: report
  $

With the fixed example line:

  $ perf config --user report.sort-order=srcline
  $ perf config --user report.sort-order
  report.sort-order=srcline
  $

Signed-off-by: Nick Thompson <nathompson7@protonmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/linux-perf-users/20201217142619.GA14524@redhat.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Arnaldo Carvalho de Melo dc67d19204 perf test: Make sample-parsing test aware of PERF_SAMPLE_{CODE,DATA}_PAGE_SIZE
To fix this:

  $ perf test -v 27
  27: Sample parsing                                                  :
  --- start ---
  test child forked, pid 586013
  sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating
  test child finished with -1
  ---- end ----
  Sample parsing: FAILED!
  $

This patchset is still not completely merged, so when adding the
PERF_SAMPLE_CODE_PAGE_SIZE to 'struct perf_sample' we need to add the
bits added in this patch for 'perf_sample.data_page_size'.

Fixes: 251cc77b8176de37 ("tools headers UAPI: Update tools's copy of linux/perf_event.h")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Jiri Olsa 47dce51acc perf tools: Add support to read build id from compressed elf
Adding support to decompress file before reading build id.

Adding filename__read_build_id and change its current versions to
read_build_id.

Shutting down stderr output of perf list in the shell test:
  82: Check open filename arg using perf trace + vfs_getname          : Ok

because with decompression code in the place we the
filename__read_build_id function is more verbose in case
of error and the test did not account for that.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Jiri Olsa 8abceacff8 perf debug: Add debug_set_file function
Allow to set debug output file via new debug_set_file function.

It's called during perf startup in perf_debug_setup to set stderr file
as default and any perf command can set it later to different file.

It will be used in perf daemon command to get verbose output into log
file.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201212104358.412065-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Jiri Olsa 7cfcd1e016 perf tools: Add evlist__disable_evsel/evlist__enable_evsel
Adding interface to enable/disable single event in the evlist based on
its name. It will be used later in new control enable/disable interface.

Keeping the evlist::enabled true when one or more events are enabled so
the toggle can work properly and toggle evlist to disabled state.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Alexei Budankov <abudankov@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20201210204330.233864-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Namhyung Kim 96aea4daa6 perf evlist: Support pipe mode display
Likewise, perf evlist command should print event attributes by reading
PERF_RECORD_HEADER_ATTR records.

Before:
  $ perf record -o- true | ./perf evlist -i-
  (prints nothing)

After:
  $ perf record -o- true | ./perf evlist -i-
  cycles:pppH

Committer testing:

Verbose mode also works as expected:

  $ perf record -o- true | perf evlist -i-
  cycles:uhH
  $ perf record -o- true | perf evlist -vi-
  cycles:uhH: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20201210061302.88213-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Namhyung Kim 03de8656c7 perf report: Support --header-only for pipe mode
The --header-only checks file header and prints the feature data.  But
as pipe mode doesn't have it in the header it prints almost nothing.
Change it to process first few records until it founds HEADER_FEATURE.

Before:
  $ perf record -o- true | perf report -i- --header-only
  # ========
  # captured on    : Thu Dec 10 14:34:59 2020
  # header version : 1
  # data offset    : 0
  # data size      : 0
  # feat offset    : 0
  # ========
  #

After:
  $ perf record -o- true | perf report -i- --header-only
  # ========
  # captured on    : Thu Dec 10 14:49:11 2020
  # header version : 1
  # data offset    : 0
  # data size      : 0
  # feat offset    : 0
  # ========
  #
  # hostname : balhae
  # os release : 5.7.17-1xxx
  # perf version : 5.10.rc6.gdb0ea13cc741
  # arch : x86_64
  # nrcpus online : 8
  # nrcpus avail : 8
  # cpudesc : Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
  # cpuid : GenuineIntel,6,142,12
  # total memory : 16158916 kB
  # cmdline : perf record -o- true
  # event : name = cycles, , id = { 81, 82, 83, 84, 85, 86, 87, 88 }, size = 120, ...
  # CPU_TOPOLOGY info available, use -I to display
  # NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: intel_pt = 9, intel_bts = 8, software = 1, power = 20, uprobe = 7, ...
  # time of first sample : 0.000000
  # time of last sample : 0.000000
  # sample duration :      0.000 ms
  # MEM_TOPOLOGY info available, use -I to display
  # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20201210061302.88213-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Joakim Zhang e15a536521 perf vendor events: Add JSON metrics for imx8mm DDR Perf
Add JSON metrics for imx8mm DDR Perf.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Link: http://lore.kernel.org/lkml/1607080216-36968-11-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry be335ec28e perf metricgroup: Support adding metrics for system PMUs
Currently adding metrics for core- or uncore-based events matched by CPUID
is supported.

Extend this for system events.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-10-git-send-email-john.garry@huawei.com
[ Reorder 'struct metricgroup_add_iter_data' field to avoid alignment holes ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry a36fadb17c perf metricgroup: Support printing metric groups for system PMUs
Currently printing metricgroups for core- or uncore-based events matched
by CPUID is supported.

Extend this for system events.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-9-git-send-email-john.garry@huawei.com
[ Reorder 'struct metricgroup_print_sys_idata' field to avoid alignment holes ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry f6fe1e48ae perf metricgroup: Split up metricgroup__print()
To aid supporting system event metric groups, break up the function
metricgroup__print() into a part which iterates metrics and a part which
actually "prints" the metric.

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-8-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry c2337d6719 perf metricgroup: Fix metrics using aliases covering multiple PMUs
Support for metric expressions using aliases which cover multiple PMUs
is broken. Consider the following test metric expression:

  "MetricExpr": "UNC_CBO_XSNP_RESPONSE.MISS_XCORE * UNC_CBO_XSNP_RESPONSE.MISS_EVICTION"

When used on my broadwell, "perf stat" gives:

  unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_1/umask=0x81,event=0x22/
  unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_0/umask=0x81,event=0x22/
  unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_1/umask=0x41,event=0x22/
  unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_0/umask=0x41,event=0x22/
  Control descriptor is not initialized
  unc_cbo_xsnp_response.miss_eviction: 3645925 1000850523 1000850523
  unc_cbo_xsnp_response.miss_xcore: 106850 1000850523 1000850523

   Performance counter stats for 'system wide':

           3,645,925      unc_cbo_xsnp_response.miss_eviction # 389567086250.00 test_metric_inc
             106,850      unc_cbo_xsnp_response.miss_xcore

         1.000883096 seconds time elapsed

Notice that only the results from one PMU are included. Fix the logic of
find_evsel_group() to enable events which apply to multiple PMUs, by
checking if the event pmu_name matches that of the metric event.

With that, "perf stat" now gives:

  unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_1/umask=0x81,event=0x22/
  unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_0/umask=0x81,event=0x22/
  unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_1/umask=0x41,event=0x22/
  unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_0/umask=0x41,event=0x22/
  Control descriptor is not initialized
  unc_cbo_xsnp_response.miss_eviction: 4237983 1000904100 1000904100
  unc_cbo_xsnp_response.miss_xcore: 218643 1000904100 1000904100
  unc_cbo_xsnp_response.miss_eviction: 4254148 1000902629 1000902629
  unc_cbo_xsnp_response.miss_xcore: 213352 1000902629 1000902629

   Performance counter stats for 'system wide':

           4,237,983      unc_cbo_xsnp_response.miss_eviction # 3668558131345.00 test_metric_inc
             218,643      unc_cbo_xsnp_response.miss_xcore
           4,254,148      unc_cbo_xsnp_response.miss_eviction
             213,352      unc_cbo_xsnp_response.miss_xcore

         1.000938151 seconds time elapsed

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-7-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry 6d2783fe36 perf evlist: Change evlist__splice_list_tail() ordering
Function find_evsel_group() expects events to be ordered such that they
are grouped after their leader.

Modify evlist__splice_list_tail() to guarantee this (ordering).

[Should prob also change the function name]

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-6-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry 4513c719c6 perf pmu: Add pmu_add_sys_aliases()
Add pmu_add_sys_aliases() to add system PMU events aliases.

For adding system PMU events, iterate through all the events for all SoC
event tables in pmu_sys_event_tables[].

Matches must satisfy both:
- PMU identifier matches event "compat" value
- event "Unit" member must match, same as uncore event aliases matched by
  CPUID

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-5-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry 51d5484715 perf pmu: Add pmu_id()
Add a function to read the PMU id sysfs entry. This is only done for uncore
PMUs where this would possibly be relevant.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-4-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry 4689f56796 perf jevents: Add support for system events tables
Process the JSONs to find support for "system" events, which are not
tied to a specific CPUID.

A "COMPAT" property is now used to match against the namespace ID from
the kernel PMU driver.

The generated pmu-events.c will now have 2 tables:

a. CPU events, as before.
b. New pmu_sys_event_tables[] table, which will have events matched to
   specific SoCs.

It will look like this:

struct pmu_event pme_hisilicon_hip09_sys[] = {
{
	.name = "cycles",
	.compat = "0x00030736",
	.event = "event=0",
	.desc = "Clock cycles",
	.topic = "smmu v3 pmcg",
	.long_desc = "Clock cycles",
},
{
	.name = "smmuv3_pmcg.l1_tlb",
	.compat = "0x00030736",
	.event = "event=0x8a",
	.desc = "SMMUv3 PMCG l1_tlb. Unit: smmuv3_pmcg ",
	.topic = "smmu v3 pmcg",
	.long_desc = "SMMUv3 PMCG l1_tlb",
	.pmu = "smmuv3_pmcg",
},
...
};

struct pmu_event pme_arm_cortex_a53[] = {
{
	.name = "ext_mem_req",
	.event = "event=0xc0",
	.desc = "External memory request",
	.topic = "memory",
},
{
	.name = "ext_mem_req_nc",
	.event = "event=0xc1",
	.desc = "Non-cacheable external memory request",
	.topic = "memory",
},
...
};

struct pmu_event pme_hisilicon_hip09_cpu[] = {
{
	.name = "l2d_cache_refill_wr",
	.event = "event=0x53",
	.desc = "L2D cache refill, write",
	.topic = "core imp def",
	.long_desc = "Attributable Level 2 data cache refill, write",
},
...
};

struct pmu_events_map pmu_events_map[] = {
{
	.cpuid = "0x00000000410fd030",
	.version = "v1",
	.type = "core",
	.table = pme_arm_cortex_a53
},
{
	.cpuid = "0x00000000480fd010",
	.version = "v1",
	.type = "core",
	.table = pme_hisilicon_hip09_cpu
},
	{
		.table = 0
	},
};

struct pmu_event pme_hisilicon_hip09_cpu[] = {
{
	.name = "uncore_hisi_l3c.rd_cpipe",
	.event = "event=0",
	.desc = "Total read accesses. Unit: hisi_sccl,l3c ",
	.topic = "uncore l3c",
	.long_desc = "Total read accesses",
	.pmu = "hisi_sccl,l3c",
},
{
	.name = "uncore_hisi_l3c.wr_cpipe",
	.event = "event=0x1",
	.desc = "Total write accesses. Unit: hisi_sccl,l3c ",
	.topic = "uncore l3c",
	.long_desc = "Total write accesses",
	.pmu = "hisi_sccl,l3c",
},
...
};

struct pmu_sys_events pmu_sys_event_tables[] = {
{
	.table = pme_hisilicon_hip09_sys,
},
...
};

Committer notes:

Added the fix for architectures without PMU events, provided by John
after I reported the build failing in such systems.

Link: https://lore.kernel.org/lkml/650baaf2-36b6-a9e2-ff49-963ef864c1f3@huawei.com/

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
John Garry 4853f1caa4 perf jevents: Add support for an extra directory level
Currently only upto a level 2 directory is supported, in form
vendor/platform.

Add support for a further level, to support vendor/platform
sub-directories in future, which will be vendor/platform/cpu and
vendor/platform/sys.

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: http://lore.kernel.org/lkml/1607080216-36968-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:17 -03:00
Arnaldo Carvalho de Melo 456ef4c11c perf evsel: Emit warning about kernel not supporting the data page size sample_type bit
Before we had this unhelpful message:

  $ perf record --data-page-size sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles:u).
  /bin/dmesg | grep -i perf may provide additional information.
  $

Add support to the perf_missing_features variable to remember what
caused evsel__open() to fail and then use that information in
evsel__open_strerror().

  $ perf record --data-page-size sleep 1
  Error:
  Asking for the data page size isn't supported by this kernel.
  $

Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201207170759.GB129853@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:16 -03:00
Kan Liang 542b88fd12 perf record: Support new sample type for data page size
Support new sample type PERF_SAMPLE_DATA_PAGE_SIZE for page size.

Add new option --data-page-size to record sample data page size.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201130172803.2676-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:16 -03:00
Jan Kratochvil bf53fc6b5f perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder
elfutils needs to be provided main binary and separate debug info file
respectively. Providing separate debug info file instead of the main
binary is not sufficient.

One needs to try both supplied filename and its possible cache by its
build-id depending on the use case.

Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:16 -03:00
Zheng Zengkai 2eb5dd4180 perf record: Fix memory leak when using '--user-regs=?' to list registers
When using 'perf record's option '-I' or '--user-regs=' along with
argument '?' to list available register names, memory of variable 'os'
allocated by strdup() needs to be released before __parse_regs()
returns, otherwise memory leak will occur.

Fixes: bcc84ec65a ("perf record: Add ability to name registers to record")
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20200703093344.189450-1-zhengzengkai@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:16 -03:00
Kajol Jain b2ce5dbc15 perf test: Fix metric parsing test
Commit e1c92a7fbb ("perf tests: Add another metric parsing test") add
another test for metric parsing. The test goes through all metrics
compiled for arch within pmu events and try to parse them.

Right now this test is failing in powerpc machine.

Result in power9 platform:

  [command]# ./perf test 10
  10: PMU events                                                      :
  10.1: PMU event table sanity                                        : Ok
  10.2: PMU event map aliases                                         : Ok
  10.3: Parsing of PMU event table metrics                            : Skip (some metrics failed)
  10.4: Parsing of PMU event table metrics with fake PMUs             : FAILED!

Issue is we are passing different runtime parameter value in
"expr__find_other" and "expr__parse" function which is called from
function `metric_parse_fake`.  And because of this parsing of hv-24x7
metrics is failing.

  [command]# ./perf test 10 -vv
  .....
  hv_24x7/pm_mcs01_128b_rd_disp_port01,chip=1/ not found
  expr__parse failed
  test child finished with -1
  ---- end ----
  PMU events subtest 4: FAILED!

This patch fix this issue and change runtime parameter value to '0' in
expr__parse function.

Result in power9 platform after this patch:

  [command]# ./perf test 10
  10: PMU events                                                      :
  10.1: PMU event table sanity                                        : Ok
  10.2: PMU event map aliases                                         : Ok
  10.3: Parsing of PMU event table metrics                            : Skip (some metrics failed)
  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

Fixes: e1c92a7fbb ("perf tests: Add another metric parsing test")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20201119152411.46041-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-15 11:52:41 -03:00
Finn Behrens c25ce589dc tweewide: Fix most Shebang lines
Change every shebang which does not need an argument to use /usr/bin/env.
This is needed as not every distro has everything under /usr/bin,
sometimes not even bash.

Signed-off-by: Finn Behrens <me@kloenk.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-12-08 23:30:04 +09:00
Jakub Kicinski 55fd59b003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:
	drivers/net/ethernet/ibm/ibmvnic.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 15:44:09 -08:00
Arnaldo Carvalho de Melo db0ea13cc7 perf evlist: Use the right prefix for 'struct evlist' record methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:19:40 -03:00
Arnaldo Carvalho de Melo b979a2f13b perf evlist: Use the right prefix for 'struct evlist' diff methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:18:48 -03:00
Arnaldo Carvalho de Melo f63c2f5a8b perf evlist: Use the right prefix for 'struct evlist' nr_threads method
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:17:20 -03:00
Arnaldo Carvalho de Melo 515ea461c2 perf evlist: Use the right prefix for 'struct evlist' deliver event method
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:16:29 -03:00
Arnaldo Carvalho de Melo 1420ba2f62 perf evlist: Use the right prefix for 'struct evlist' header methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:15:30 -03:00
Arnaldo Carvalho de Melo 44d2a55736 perf evlist: Use the right prefix for 'struct evlist' raw samples methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:15:30 -03:00
Arnaldo Carvalho de Melo 25f84702f3 perf evlist: Use the right prefix for 'struct evlist' mmap pages parsing method
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:15:30 -03:00
Arnaldo Carvalho de Melo 78e1bc2578 perf evlist: Use the right prefix for 'struct evlist' event attribute config methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:15:27 -03:00
Arnaldo Carvalho de Melo 606e2c2933 perf evlist: Use the right prefix for alternative 'struct evlist' constructors
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:04:05 -03:00
Arnaldo Carvalho de Melo 900c8ead5b perf evlist: Use the right prefix for 'struct evlist' event selection methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:01:08 -03:00
Arnaldo Carvalho de Melo 64b4778b86 perf evlist: Use the right prefix for 'struct evlist' event group methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 15:00:12 -03:00
Arnaldo Carvalho de Melo 7748bb7175 perf evlist: Use the right prefix for 'struct evlist' create maps methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:56:52 -03:00
Arnaldo Carvalho de Melo 7127372419 perf evlist: Use the right prefix for 'struct evlist' print methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:55:12 -03:00
Arnaldo Carvalho de Melo e414fd1a3f perf evlist: Use the right prefix for 'struct evlist' evsel list methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:52:44 -03:00
Arnaldo Carvalho de Melo 0a60b33947 perf evlist: Use the right prefix for 'struct evlist' pause/resume methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:49:05 -03:00
Arnaldo Carvalho de Melo 37b01abe2a perf evlist: Use the right prefix for 'struct evlist' enable event methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:47:05 -03:00
Arnaldo Carvalho de Melo 0a7e7ec90e perf evlist: Use the right prefix for 'struct evlist' id_pos methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:44:40 -03:00
Alexandre Truong 2a99ff822d perf tools: Add aarch64 registers to --user-regs
Previously, this command returns no help message on aarch64:

  -> ./perf record --user-regs=?

  available registers:
  Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

With this change, the registers are listed.

  -> ./perf record --user-regs=?

  available registers: x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25 x26 x27 x28 x29 lr sp pc

It's also now possible to record subsets of registers on aarch64:

  -> ./perf record --user-regs=x4,x5 ls
  -> ./perf report --dump-raw-trace

  12801163749305260 0xc70 [0x40]: PERF_RECORD_SAMPLE(IP, 0x2): 51956/51956: 0xffffaa6571f0 period: 145785 addr: 0
  ... user regs: mask 0x30 ABI 64-bit
  .... x4    0x000000000000006c
  .... x5    0x0000001001000001
   ... thread: ls:51956
    ...... dso: /usr/lib64/ld-2.17.so

Signed-off-by: Alexandre Truong <alexandre.truong@arm.com>
Tested-by: James Clark <james.clark@arm.com>
Acked-by: John Garry <john.garry@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20201127153923.26717-1-alexandre.truong@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:39:55 -03:00
Arnaldo Carvalho de Melo e80db25552 perf evlist: Use the right prefix for 'struct evlist' tracking event methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:39:41 -03:00
Arnaldo Carvalho de Melo f4bd0b4a9b perf evlist: Use the right prefix for 'struct evlist' browser methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:23:35 -03:00
Arnaldo Carvalho de Melo 3ccf8a7b66 perf evlist: Use the right prefix for 'struct evlist' sample id lookup methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:17:57 -03:00
Arnaldo Carvalho de Melo fd643db5a8 perf evlist: Ditch unused set/reset sample_bit methods
Not used anymore, ditch them.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:54:08 -03:00
Arnaldo Carvalho de Melo b02736f776 perf evlist: Use the right prefix for 'struct evlist' 'find' methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:48:07 -03:00
Arnaldo Carvalho de Melo 2a6599cd5e perf evlist: Use the right prefix for 'struct evlist' sample parsing methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:43:07 -03:00
Arnaldo Carvalho de Melo 08c83997ca perf evlist: Use the right prefix for 'struct evlist' sideband thread methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:40:10 -03:00
Arnaldo Carvalho de Melo 24bf91a754 perf evlist: Use the right prefix for 'struct evlist' 'filter' methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:38:02 -03:00
Arnaldo Carvalho de Melo ade9d208d6 perf evlist: Use the right prefix for 'struct evlist' 'toggle' methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:33:55 -03:00
Arnaldo Carvalho de Melo 53f5e9084d perf evlist: Use the right prefix for 'struct evlist' stats methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:31:04 -03:00
Arnaldo Carvalho de Melo 7b392ef04e perf evlist: Use the right prefix for 'struct evlist' 'workload' methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:26:54 -03:00
Arnaldo Carvalho de Melo a622eafa1a perf evlist: Use the right prefix for 'struct evlist' methods: evlist__set_leader()
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:22:07 -03:00
Arnaldo Carvalho de Melo 56933029d0 perf evsel: Convert last 'struct evsel' methods to the right evsel__ prefix
As 'perf_evsel__' means its a function in tools/lib/perf/.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 09:08:24 -03:00
Namhyung Kim 94b69c615e perf test: Add shadow stat test
It calculates IPC from the cycles and instruction counts and compares it
with the shadow stat for both global aggregation (default) and no
aggregation mode.

 $ perf stat -a -A -e cycles,instructions sleep 1

   Performance counter stats for 'system wide':

  CPU0   39,580,880      cycles
  CPU1   45,426,945      cycles
  CPU2   31,151,685      cycles
  CPU3   55,167,421      cycles
  CPU0   17,073,564      instructions      #    0.43  insn per cycle
  CPU1   34,955,764      instructions      #    0.77  insn per cycle
  CPU2   15,688,459      instructions      #    0.50  insn per cycle
  CPU3   34,699,217      instructions      #    0.63  insn per cycle

       1.003275495 seconds time elapsed

In this example, the 'insn per cycle' should be matched to the number
for each cpu.  For CPU2, 0.50 = 15,688,459 / 31,151,685 .

Committer testing:

  # perf test shadow
  78: perf stat metrics (shadow stat) test                            : Ok
  #

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127041404.390276-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 08:58:26 -03:00
Arnaldo Carvalho de Melo 1f195e557d Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 08:56:55 -03:00
Masami Hiramatsu a9ffd0484e perf probe: Change function definition check due to broken DWARF
Since some gcc generates a broken DWARF which lacks DW_AT_declaration
attribute from the subprogram DIE of function prototype.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060)

So, in addition to the DW_AT_declaration check, we also check the
subprogram DIE has DW_AT_inline or actual entry pc.

Committer testing:

  # cat /etc/fedora-release
  Fedora release 33 (Thirty Three)
  #

Before:

  # perf test vfs_getname
  78: Use vfs_getname probe to get syscall args filenames             : FAILED!
  79: Check open filename arg using perf trace + vfs_getname          : FAILED!
  81: Add vfs_getname probe to get syscall args filenames             : FAILED!
  #

After:

  # perf test vfs_getname
  78: Use vfs_getname probe to get syscall args filenames             : Ok
  79: Check open filename arg using perf trace + vfs_getname          : Ok
  81: Add vfs_getname probe to get syscall args filenames             : Ok
  #

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: http://lore.kernel.org/lkml/160645613571.2824037.7441351537890235895.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:36:15 -03:00
Masami Hiramatsu ab4200c17b perf probe: Fix to die_entrypc() returns error correctly
Fix die_entrypc() to return error correctly if the DIE has no
DW_AT_ranges attribute. Since dwarf_ranges() will treat the case as an
empty ranges and return 0, we have to check it by ourselves.

Fixes: 91e2f539ee ("perf probe: Fix to show function entry line as probe-able")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lore.kernel.org/lkml/160645612634.2824037.5284932731175079426.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:33:17 -03:00
Namhyung Kim c0ee1d5ae8 perf stat: Use proper cpu for shadow stats
Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.

Before:

  $ perf stat -aA -e cycles,instructions sleep 1

   Performance counter stats for 'system wide':

  CPU0      116,057,380      cycles
  CPU1       86,084,722      cycles
  CPU2       99,423,125      cycles
  CPU3       98,272,994      cycles
  CPU0       53,369,217      instructions      #    0.46  insn per cycle
  CPU1       33,378,058      instructions      #    0.29  insn per cycle
  CPU2       58,150,086      instructions      #    0.50  insn per cycle
  CPU3       40,029,703      instructions      #    0.34  insn per cycle

       1.001816971 seconds time elapsed

So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.

After:

  $ perf stat -aA -e cycles,instructions sleep 1

   Performance counter stats for 'system wide':

  CPU0      109,621,384      cycles
  CPU1      159,026,454      cycles
  CPU2       99,460,366      cycles
  CPU3      124,144,142      cycles
  CPU0       44,396,706      instructions      #    0.41  insn per cycle
  CPU1      120,195,425      instructions      #    0.76  insn per cycle
  CPU2       44,763,978      instructions      #    0.45  insn per cycle
  CPU3       69,049,079      instructions      #    0.56  insn per cycle

       1.001910444 seconds time elapsed

Fixes: 44d49a6002 ("perf stat: Support metrics in --per-core/socket mode")
Reported-by: Sam Xi <xyzsam@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127041404.390276-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:31:37 -03:00
Namhyung Kim aa50d953c1 perf record: Synthesize cgroup events only if needed
It didn't check the tool->cgroup_events bit which is set when the
--all-cgroups option is given.  Without it, samples will not have cgroup
info so no reason to synthesize.

We can check the PERF_RECORD_CGROUP records after running perf record
*WITHOUT* the --all-cgroups option:

Before:

  $ perf report -D | grep CGROUP
  0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
          CGROUP events:          1
          CGROUP events:          0
          CGROUP events:          0

After:

  $ perf report -D | grep CGROUP
          CGROUP events:          0
          CGROUP events:          0
          CGROUP events:          0

Committer testing:

Before:

  # perf record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
  # perf report -D | grep "CGROUP events"
            CGROUP events:        146
            CGROUP events:          0
            CGROUP events:          0
  #

After:

  # perf record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
  # perf report -D | grep "CGROUP events"
            CGROUP events:          0
            CGROUP events:          0
            CGROUP events:          0
  #

With all-cgroups:

  # perf record --all-cgroups -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
  # perf report -D | grep "CGROUP events"
            CGROUP events:        146
            CGROUP events:          0
            CGROUP events:          0
  #

Fixes: 8fb4b67939 ("perf record: Add --all-cgroups option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127054356.405481-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:26:33 -03:00
Zhen Lei 9713070028 perf diff: Fix error return value in __cmd_diff()
An appropriate return value should be set on the failed path.

Fixes: 2a09a84c72 ("perf diff: Support hot streams comparison")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201124103652.438-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:21:23 -03:00
Arnaldo Carvalho de Melo 3b13eaf0ba perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:

  7a078d2d18 ("libbpf, hashmap: Fix undefined behavior in hash_bits")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
  diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:19:33 -03:00
Jiri Olsa fd4ebb457c perf build-id: Add build_id_cache__add function
Adding build_id_cache__add function as core function that adds file into
build id database. It will be set from another callers in following
changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-22-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:37:29 -03:00
Jiri Olsa 75fb2af68e perf build-id: Add __perf_session__cache_build_ids function
Adding __perf_session__cache_build_ids function as an interface for
caching sessions build ids with callback function and its data pointer.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:37:25 -03:00
Jiri Olsa 0b7b9e83c7 perf build-id: Use machine__for_each_dso in perf_session__cache_build_ids
Using machine__for_each_dso in perf_session__cache_build_ids, so we can
reuse perf_session__cache_build_ids with different callback in following
changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-19-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:37:20 -03:00
Jiri Olsa 058f151130 perf data: Add is_perf_data function
Adding is_perf_data function that returns true if the given path is perf
data file. It will be used in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-21-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:37:15 -03:00
Jiri Olsa ca8ea73ae1 perf symbols: Try to load vmlinux from buildid database
Currently we don't check on kernel's vmlinux the same way as we do for
normal binaries, but we either look for kallsyms file in build id
database or check on known vmlinux locations (plus some other optional
paths).

This patch adds the check for standard build id binary location, so we
are ready once we start to store it there from debuginfod in following
changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-13-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:37:08 -03:00
Jiri Olsa 031f112f8d perf tools: Use struct extra_kernel_map in machine__process_kernel_mmap_event
Using struct extra_kernel_map in machine__process_kernel_mmap_event, to
pass mmap details. This way we can used single function for all 3 mmap
versions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:37:03 -03:00
Jiri Olsa af21c579c8 perf build-id: Add check for existing link in buildid dir
When adding new build id link we fail if the link is already there.
Adding check for existing link and output debug message that the build
id is already linked.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:36:58 -03:00
Jiri Olsa 7ac22b088a perf tools: Add filename__decompress function
Factor filename__decompress from decompress_kmodule function.  It can
decompress files with compressions supported in perf - xz and gz, the
support needs to be compiled in.

It will to be used in following changes to get build id out of
compressed elf objects.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:36:53 -03:00
Jiri Olsa f45edd86b2 perf tools: Add build_id__is_defined function
Adding build_id__is_defined helper to check build id is defined and is
!= zero build id.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201126170026.2619053-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 08:36:37 -03:00
Wei Li 05e91e7fe2 perf arm-spe: Add support for ARMv8.3-SPE
This patch is to support Armv8.3 extension for SPE, it adds alignment
field in the Events packet and it supports the Scalable Vector Extension
(SVE) for Operation packet and Events packet with two additions:

  - The vector length for SVE operations in the Operation Type packet;
  - The incomplete predicate and empty predicate fields in the Events
    packet.

Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201119152441.6972-17-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Andre Przywara 3601e60550 perf arm_spe: Decode memory tagging properties
When SPE records a physical address, it can additionally tag the event
with information from the Memory Tagging architecture extension.

Decode the two additional fields in the SPE event payload.

[leoy: Refined patch to use predefined macros]

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-16-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Leo Yan 3d829724b1 perf arm-spe: Add more sub classes for operation packet
For the operation type packet payload with load/store class, it misses
to support these sub classes:

  - A load/store targeting the general-purpose registers;
  - A load/store targeting unspecified registers;
  - The ARMv8.4 nested virtualisation extension can redirect system
    register accesses to a memory page controlled by the hypervisor.
    The SPE profiling feature in newer implementations can tag those
    memory accesses accordingly.

Add the bit pattern describing load/store sub classes, so that the perf
tool can decode it properly.

Inspired by Andre Przywara, refined the commit log and code for more
clear description.

Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-15-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Leo Yan e771218f32 perf arm-spe: Refactor operation packet handling
Defines macros for operation packet header and formats (support sub
classes for 'other', 'branch', 'load and store', etc).  Uses these
macros for operation packet decoding and dumping.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-14-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Leo Yan 7488ffc4d9 perf arm-spe: Add new function arm_spe_pkt_desc_op_type()
The operation type packet is complex and contains subclass; the parsing
flow causes deep indentation; for more readable, this patch introduces
a new function arm_spe_pkt_desc_op_type() which is used for operation
type parsing.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-13-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Leo Yan 4d0f4ca273 perf arm-spe: Remove size condition checking for events
In the Armv8 ARM (ARM DDI 0487F.c), chapter "D10.2.6 Events packet", it
describes the event bit is valid with specific payload requirement.  For
example, the Last Level cache access event, the bit is defined as:

  E[8], byte 1 bit [0], when SZ == 0b01 , when SZ == 0b10 ,
  		     or when SZ == 0b11

It requires the payload size is at least 2 bytes, when byte 1 (start
counting from 0) is valid, E[8] (bit 0 in byte 1) can be used for LLC
access event type.  For safety, the code checks the condition for
payload size firstly, if meet the requirement for payload size, then
continue to parse event type.

If review function arm_spe_get_payload(), it has used cast, so any bytes
beyond the valid size have been set to zeros.

For this reason, we don't need to check payload size anymore afterwards
when parse events, thus this patch removes payload size conditions.

Suggested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-12-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Leo Yan 889d1a675f perf arm-spe: Refactor event type handling
Move the enums of event types to arm-spe-pkt-decoder.h, thus function
arm_spe_pkt_desc_event() can use them for bitmasks.

Suggested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-11-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:46 -03:00
Leo Yan e66f6d7596 perf arm-spe: Add new function arm_spe_pkt_desc_event()
This patch moves out the event packet parsing from arm_spe_pkt_desc()
to the new function arm_spe_pkt_desc_event().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-10-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:45 -03:00
Leo Yan d158aa408f perf arm-spe: Refactor counter packet handling
This patch defines macros for counter packet header, and uses macros to
replace hard code values in functions arm_spe_get_counter() and
arm_spe_pkt_desc().

In the function arm_spe_get_counter(), adds a new line for more
readable.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-9-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:43 -03:00
Leo Yan c52cfe9872 perf arm-spe: Add new function arm_spe_pkt_desc_counter()
This patch moves out the counter packet parsing code from
arm_spe_pkt_desc() to the new function arm_spe_pkt_desc_counter().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-8-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:42 -03:00
Leo Yan 6550149e80 perf arm-spe: Refactor context packet handling
Minor refactoring to use macro for index mask.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:41 -03:00
Leo Yan 5513ddaf10 perf arm_spe: Fixup top byte for data virtual address
To establish a valid address from the address packet payload and finally
the address value can be used for parsing data symbol in DSO, current
code uses 0xff to replace the tag in the top byte of data virtual
address.

So far the code only fixups top byte for the memory layouts with 4KB
pages, it misses to support memory layouts with 64KB pages.

This patch adds the conditions for checking bits [55:52] are 0xf, if
detects the pattern it will fill 0xff into the top byte of the address,
also adds comment to explain the fixing up.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:40 -03:00
Leo Yan 09935ca7b6 perf arm-spe: Refactor address packet handling
This patch is to refactor address packet handling, it defines macros for
address packet's header and payload, these macros are used by decoder
and the dump flow.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:38 -03:00
Leo Yan ab2aa439e4 perf arm-spe: Add new function arm_spe_pkt_desc_addr()
This patch moves out the address parsing code from arm_spe_pkt_desc()
and uses the new introduced function arm_spe_pkt_desc_addr() to process
address packet.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:34 -03:00
Leo Yan 11695142e2 perf arm-spe: Refactor packet header parsing
The packet header parsing uses the hard coded values and it uses nested
if-else statements.

To improve the readability, this patch refactors the macros for packet
header format so it removes the hard coded values.  Furthermore, based
on the new mask macros it reduces the nested if-else statements and
changes to use the flat conditions checking, this is directive and can
easily map to the descriptions in ARMv8-a architecture reference manual
(ARM DDI 0487E.a), chapter 'D10.1.5 Statistical Profiling Extension
protocol packet headers'.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:29 -03:00
Leo Yan 75eeaddd57 perf arm-spe: Refactor printing string to buffer
When outputs strings to the decoding buffer with function snprintf(),
SPE decoder needs to detects if any error returns from snprintf() and if
so needs to directly bail out.  If snprintf() returns success, it needs
to update buffer pointer and reduce the buffer length so can continue to
output the next string into the consequent memory space.

This complex logics are spreading in the function arm_spe_pkt_desc() so
there has many duplicate codes for handling error detecting, increment
buffer pointer and decrement buffer size.

To avoid the duplicate code, this patch introduces a new helper function
arm_spe_pkt_out_string() which is used to wrap up the complex logics,
and it's used by the caller arm_spe_pkt_desc().  This patch moves the
variable 'blen' as the function's local variable so allows to remove
the unnecessary braces and improve the readability.

This patch simplifies the return value for arm_spe_pkt_desc(): '0' means
success and other values mean an error has occurred.  To realize this,
it relies on arm_spe_pkt_out_string()'s parameter 'err', the 'err' is a
cumulative value, returns its final value if printing buffer is called
for one time or multiple times.  Finally, the error is handled in a
central place, rather than directly bailing out in switch-cases, it
returns error at the end of arm_spe_pkt_desc().

This patch changes the caller arm_spe_dump() to respect the updated
return value semantics of arm_spe_pkt_desc().

Suggested-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wei Li <liwei391@huawei.com>
Link: https://lore.kernel.org/r/20201119152441.6972-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26 09:31:23 -03:00
Jakub Kicinski 56495a2442 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 19:08:46 -08:00
Ian Rogers 568beb2795 perf test: Avoid an msan warning in a copied stack.
This fix is for a failure that occurred in the DWARF unwind perf test.

Stack unwinders may probe memory when looking for frames.

Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.

This can lead to false memory sanitizer failures for the use of an
uninitialized value.

Avoid this problem by removing the poison on the copied stack.

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
    #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445

SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16 14:10:58 -03:00
Ian Rogers 29396cd573 perf expr: Force encapsulation on expr_id_data
This patch resolves some undefined behavior where variables in
expr_id_data were accessed (for debugging) without being defined. To
better enforce the tagged union behavior, the struct is moved into
expr.c and accessors provided. Tag values (kinds) are explicitly
identified.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20200826153055.2067780-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16 14:09:18 -03:00
Jin Yao 3d05181a08 perf vendor events: Update Skylake client events to v50
- Update Skylake events to v50.
- Update Skylake JSON metrics from TMAM 4.0.
- Fix the issue in DRAM_Parallel_Reads
- Fix the perf test warning

Before:

  root@kbl-ppc:~# perf stat -M DRAM_Parallel_Reads -- sleep 1
  event syntax error: '{arb/event=0x80,umask=0x2/,arb/event=0x80,umask=0x2,thresh=1/}:W'
                       \___ unknown term 'thresh' for pmu 'uncore_arb'

  valid terms: event,edge,inv,umask,cmask,config,config1,config2,name,period,percore

  Initial error:
  event syntax error: '..umask=0x2/,arb/event=0x80,umask=0x2,thresh=1/}:W'
                                    \___ Cannot find PMU `arb'. Missing kernel support?

  root@kbl-ppc:~# perf test metrics
  10: PMU events                                 :
  10.3: Parsing of PMU event table metrics               : Skip (some metrics failed)
  10.4: Parsing of PMU event table metrics with fake PMUs: Ok
  67: Parse and process metrics                  : Ok

After:

  root@kbl-ppc:~# perf stat -M MEM_Parallel_Reads -- sleep 1

   Performance counter stats for 'system wide':

           4,951,646      arb/event=0x80,umask=0x2/ #    26.30 MEM_Parallel_Reads       (50.04%)
             188,251      arb/event=0x80,umask=0x2,cmask=1/                                     (49.96%)

         1.000867010 seconds time elapsed

  root@kbl-ppc:~# perf test metrics
  10: PMU events                                 :
  10.3: Parsing of PMU event table metrics               : Ok
  10.4: Parsing of PMU event table metrics with fake PMUs: Ok
  67: Parse and process metrics                  : Ok

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/93fae76f-ce2b-ab0b-3ae9-cc9a2b4cbaec@linux.intel.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16 14:05:32 -03:00
Al Grant 1c756cd429 perf inject: Fix file corruption due to event deletion
"perf inject" can create corrupt files when synthesizing sample events from AUX
data. This happens when in the input file, the first event (for the AUX data)
has a different sample_type from the second event (generally dummy).

Specifically, they differ in the bits that indicate the standard fields
appended to perf records in the mmap buffer. "perf inject" deletes the first
event and moves up the second event to first position.

The problem is with the synthetic PERF_RECORD_MMAP (etc.) events created
by "perf record".

Since these are synthetic versions of events which are normally produced
by the kernel, they have to have the standard fields appended as
described by sample_type.

"perf record" fills these in with zeroes, including the IDENTIFIER
field; perf readers interpret records with zero IDENTIFIER using the
descriptor for the first event in the file.

Since "perf inject" changes the first event, these synthetic records are
then processed with the wrong value of sample_type, and the perf reader
reads bad data, reports on incorrect length records etc.

Mismatching sample_types are seen with "perf record -e cs_etm//", where the AUX
event has TID|TIME|CPU|IDENTIFIER and the dummy event has TID|TIME|IDENTIFIER.

Perhaps they could be the same, but it isn't normally a problem if they aren't
- perf has no problems reading the file.

The sample_types have to agree on the position of IDENTIFIER, because
that's how perf finds the right event descriptor in the first place, but
they don't normally have to agree on other fields, and perf doesn't
check that they do.

The problem is specific to the way "perf inject" reorganizes the events
and the way synthetic MMAP events are recorded with a zero identifier. A
simple solution is to stop "perf inject" deleting the tracing event.

Committer testing

Removed the now unused 'evsel' variable, update the comment about the
evsel removal not being performed anymore, and apply the patch manually
as it failed with this warning:

  warning: Patch sent with format=flowed; space at the end of lines might be lost.

Testing it with:

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.543 msec (+- 0.130 msec)
    Average time per event: 0.838 usec (+- 0.013 usec)
    Average memory usage: 12717 KB (+- 9 KB)
    Average build-id-all injection took: 5.710 msec (+- 0.058 msec)
    Average time per event: 0.560 usec (+- 0.006 usec)
    Average memory usage: 12079 KB (+- 7 KB)
  $

Signed-off-by: Al Grant <al.grant@arm.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LPU-Reference: b9cf5611-daae-2390-3439-6617f8f0a34b@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16 13:59:17 -03:00
Namhyung Kim 601366678c perf data: Allow to use stdio functions for pipe mode
When perf data is in a pipe, it reads each event separately using
read(2) syscall.  This is a huge performance bottleneck when
processing large data like in perf inject.  Also perf inject needs to
use write(2) syscall for the output.

So convert it to use buffer I/O functions in stdio library for pipe
data.  This makes inject-build-id bench time drops from 20ms to 8ms.

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.074 msec (+- 0.013 msec)
    Average time per event: 0.792 usec (+- 0.001 usec)
    Average memory usage: 8328 KB (+- 0 KB)
    Average build-id-all injection took: 5.490 msec (+- 0.008 msec)
    Average time per event: 0.538 usec (+- 0.001 usec)
    Average memory usage: 7563 KB (+- 0 KB)

This patch enables it just for perf inject when used with pipe (it's a
default behavior).  Maybe we could do it for perf record and/or report
later..

Committer testing:

Before:

  $ perf stat -r 5 perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.605 msec (+- 0.064 msec)
    Average time per event: 1.334 usec (+- 0.006 usec)
    Average memory usage: 12220 KB (+- 7 KB)
    Average build-id-all injection took: 11.458 msec (+- 0.058 msec)
    Average time per event: 1.123 usec (+- 0.006 usec)
    Average memory usage: 11546 KB (+- 8 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.673 msec (+- 0.057 msec)
    Average time per event: 1.341 usec (+- 0.006 usec)
    Average memory usage: 12508 KB (+- 8 KB)
    Average build-id-all injection took: 11.437 msec (+- 0.046 msec)
    Average time per event: 1.121 usec (+- 0.004 usec)
    Average memory usage: 11812 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.641 msec (+- 0.069 msec)
    Average time per event: 1.337 usec (+- 0.007 usec)
    Average memory usage: 12302 KB (+- 8 KB)
    Average build-id-all injection took: 10.820 msec (+- 0.106 msec)
    Average time per event: 1.061 usec (+- 0.010 usec)
    Average memory usage: 11616 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.379 msec (+- 0.074 msec)
    Average time per event: 1.312 usec (+- 0.007 usec)
    Average memory usage: 12334 KB (+- 8 KB)
    Average build-id-all injection took: 11.288 msec (+- 0.071 msec)
    Average time per event: 1.107 usec (+- 0.007 usec)
    Average memory usage: 11657 KB (+- 8 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 13.534 msec (+- 0.058 msec)
    Average time per event: 1.327 usec (+- 0.006 usec)
    Average memory usage: 12264 KB (+- 8 KB)
    Average build-id-all injection took: 11.557 msec (+- 0.076 msec)
    Average time per event: 1.133 usec (+- 0.007 usec)
    Average memory usage: 11593 KB (+- 8 KB)

   Performance counter stats for 'perf bench internals inject-build-id' (5 runs):

            4,060.05 msec task-clock:u              #    1.566 CPUs utilized            ( +-  0.65% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             101,888      page-faults:u             #    0.025 M/sec                    ( +-  0.12% )
       3,745,833,163      cycles:u                  #    0.923 GHz                      ( +-  0.10% )  (83.22%)
         194,346,613      stalled-cycles-frontend:u #    5.19% frontend cycles idle     ( +-  0.57% )  (83.30%)
         708,495,034      stalled-cycles-backend:u  #   18.91% backend cycles idle      ( +-  0.48% )  (83.48%)
       5,629,328,628      instructions:u            #    1.50  insn per cycle
                                                    #    0.13  stalled cycles per insn  ( +-  0.21% )  (83.57%)
       1,236,697,927      branches:u                #  304.602 M/sec                    ( +-  0.16% )  (83.44%)
          17,564,877      branch-misses:u           #    1.42% of all branches          ( +-  0.23% )  (82.99%)

              2.5934 +- 0.0128 seconds time elapsed  ( +-  0.49% )

  $

After:

  $ perf stat -r 5 perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.560 msec (+- 0.125 msec)
    Average time per event: 0.839 usec (+- 0.012 usec)
    Average memory usage: 12520 KB (+- 8 KB)
    Average build-id-all injection took: 5.789 msec (+- 0.054 msec)
    Average time per event: 0.568 usec (+- 0.005 usec)
    Average memory usage: 11919 KB (+- 9 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.639 msec (+- 0.111 msec)
    Average time per event: 0.847 usec (+- 0.011 usec)
    Average memory usage: 12732 KB (+- 8 KB)
    Average build-id-all injection took: 5.647 msec (+- 0.069 msec)
    Average time per event: 0.554 usec (+- 0.007 usec)
    Average memory usage: 12093 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.551 msec (+- 0.096 msec)
    Average time per event: 0.838 usec (+- 0.009 usec)
    Average memory usage: 12739 KB (+- 8 KB)
    Average build-id-all injection took: 5.617 msec (+- 0.061 msec)
    Average time per event: 0.551 usec (+- 0.006 usec)
    Average memory usage: 12105 KB (+- 7 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.403 msec (+- 0.097 msec)
    Average time per event: 0.824 usec (+- 0.010 usec)
    Average memory usage: 12770 KB (+- 8 KB)
    Average build-id-all injection took: 5.611 msec (+- 0.085 msec)
    Average time per event: 0.550 usec (+- 0.008 usec)
    Average memory usage: 12134 KB (+- 8 KB)
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 8.518 msec (+- 0.102 msec)
    Average time per event: 0.835 usec (+- 0.010 usec)
    Average memory usage: 12518 KB (+- 10 KB)
    Average build-id-all injection took: 5.503 msec (+- 0.073 msec)
    Average time per event: 0.540 usec (+- 0.007 usec)
    Average memory usage: 11882 KB (+- 8 KB)

   Performance counter stats for 'perf bench internals inject-build-id' (5 runs):

            2,394.88 msec task-clock:u              #    1.577 CPUs utilized            ( +-  0.83% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             103,181      page-faults:u             #    0.043 M/sec                    ( +-  0.11% )
       3,548,172,030      cycles:u                  #    1.482 GHz                      ( +-  0.30% )  (83.26%)
          81,537,700      stalled-cycles-frontend:u #    2.30% frontend cycles idle     ( +-  1.54% )  (83.24%)
         876,631,544      stalled-cycles-backend:u  #   24.71% backend cycles idle      ( +-  1.14% )  (83.45%)
       5,960,361,707      instructions:u            #    1.68  insn per cycle
                                                    #    0.15  stalled cycles per insn  ( +-  0.27% )  (83.26%)
       1,269,413,491      branches:u                #  530.054 M/sec                    ( +-  0.10% )  (83.48%)
          11,372,453      branch-misses:u           #    0.90% of all branches          ( +-  0.52% )  (83.31%)

             1.51874 +- 0.00642 seconds time elapsed  ( +-  0.42% )

  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201030054742.87740-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-16 13:37:28 -03:00
Jakub Kicinski 07cbce2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-11-14

1) Add BTF generation for kernel modules and extend BTF infra in kernel
   e.g. support for split BTF loading and validation, from Andrii Nakryiko.

2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
   on inlined branch conditions, from Alexei Starovoitov.

3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.

4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
   infra, from Martin KaFai Lau.

5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
   XDP_REDIRECT path, from Lorenzo Bianconi.

6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
   context, from Song Liu.

7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
   for different target archs on same source tree, from Jean-Philippe Brucker.

8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.

9) Move functionality from test_tcpbpf_user into the test_progs framework so it
   can run in BPF CI, from Alexander Duyck.

10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.

Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.

  [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
      https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
  net: mlx5: Add xdp tx return bulking support
  net: mvpp2: Add xdp tx return bulking support
  net: mvneta: Add xdp tx return bulking support
  net: page_pool: Add bulk support for ptr_ring
  net: xdp: Introduce bulking for xdp tx return path
  bpf: Expose bpf_d_path helper to sleepable LSM hooks
  bpf: Augment the set of sleepable LSM hooks
  bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Rename some functions in bpf_sk_storage
  bpf: Folding omem_charge() into sk_storage_charge()
  selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
  selftests/bpf: Add skb_pkt_end test
  bpf: Support for pointers beyond pkt_end.
  tools/bpf: Always run the *-clean recipes
  tools/bpf: Add bootstrap/ to .gitignore
  bpf: Fix NULL dereference in bpf_task_storage
  tools/bpftool: Fix build slowdown
  tools/runqslower: Build bpftool using HOSTCC
  tools/runqslower: Enable out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 09:13:41 -08:00
Leo Yan dd94ac807a perf test: Update branch sample pattern for cs-etm
Since the commit 943b69ac18 ("perf parse-events: Set exclude_guest=1
for user-space counting"), 'exclude_guest=1' is set for user-space
counting; and the branch sample's modifier has been altered, the sample
event name has been changed from "branches:u:" to "branches:uH:", which
gives out info for "user-space and host counting".

But the cs-etm testing's regular expression cannot match the updated
branch sample event and leads to test failure.

This patch updates the branch sample pattern by using a more flexible
expression '.*' to match branch sample's modifiers, so that allows the
testing to work as expected.

Fixes: 943b69ac18 ("perf parse-events: Set exclude_guest=1 for user-space counting")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: stable@kernel.org
Link: http://lore.kernel.org/lkml/20201110063417.14467-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-12 17:55:42 -03:00
Leo Yan db2ac2e49e perf test: Fix a typo in cs-etm testing
Fix a typo: s/devce_name/device_name.

Fixes: fe0aed19b2 ("perf test: Introduce script for Arm CoreSight testing")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight ml <coresight@lists.linaro.org>
Cc: stable@kernel.org
Link: http://lore.kernel.org/lkml/20201110063417.14467-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-12 17:55:42 -03:00
Arnaldo Carvalho de Melo db1a8b97a0 tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
To bring in the change made in this cset:

  4d6ffa27b8 ("x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S")
  6dcc5627f6 ("x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*")

I needed to define SYM_FUNC_START_LOCAL() as SYM_L_GLOBAL as
mem{cpy,set}_{orig,erms} are used by 'perf bench'.

This silences these perf tools build warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
  diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
  Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
  diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fangrui Song <maskray@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-12 17:55:41 -03:00
Leo Yan b0e5a05cc9 perf lock: Don't free "lock_seq_stat" if read_count isn't zero
When execute command "perf lock report", it hits failure and outputs log
as follows:

  perf: builtin-lock.c:623: report_lock_release_event: Assertion `!(seq->read_count < 0)' failed.
  Aborted

This is an imbalance issue.  The locking sequence structure
"lock_seq_stat" contains the reader counter and it is used to check if
the locking sequence is balance or not between acquiring and releasing.

If the tool wrongly frees "lock_seq_stat" when "read_count" isn't zero,
the "read_count" will be reset to zero when allocate a new structure at
the next time; thus it causes the wrong counting for reader and finally
results in imbalance issue.

To fix this issue, if detects "read_count" is not zero (means still have
read user in the locking sequence), goto the "end" tag to skip freeing
structure "lock_seq_stat".

Fixes: e4cef1f650 ("perf lock: Fix state machine to recognize lock sequence")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201104094229.17509-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-12 17:55:41 -03:00
Leo Yan e24a87b54e perf lock: Correct field name "flags"
The tracepoint "lock:lock_acquire" contains field "flags" but not
"flag".  Current code wrongly retrieves value from field "flag" and it
always gets zero for the value, thus "perf lock" doesn't report the
correct result.

This patch replaces the field name "flag" with "flags", so can read out
the correct flags for locking.

Fixes: e4cef1f650 ("perf lock: Fix state machine to recognize lock sequence")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201104094229.17509-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-12 17:55:41 -03:00
Jean-Philippe Brucker c8a950d0d3 tools: Factor HOSTCC, HOSTLD, HOSTAR definitions
Several Makefiles in tools/ need to define the host toolchain variables.
Move their definition to tools/scripts/Makefile.include

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/bpf/20201110164310.2600671-2-jean-philippe@linaro.org
2020-11-11 12:18:22 -08:00
Leo Yan 0a04244cab perf arm-spe: Fix packet length handling
When processing address packet and counter packet, if the packet
contains extended header, it misses to account the extra one byte for
header length calculation, thus returns the wrong packet length.

To correct the packet length calculation, one possible fixing is simply
to plus extra 1 for extended header, but will spread some duplicate code
in the flows for processing address packet and counter packet.
Alternatively, we can refine the function arm_spe_get_payload() to not
only support short header and allow it to support extended header, and
rely on it for the packet length calculation.

So this patch refactors function arm_spe_get_payload() with a new
argument 'ext_hdr' for support extended header; the packet processing
flows can invoke this function to unify the packet length calculation.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20201111071149.815-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 14:45:05 -03:00
Leo Yan b65577baf4 perf arm-spe: Refactor arm_spe_get_events()
In function arm_spe_get_events(), the event packet's 'index' is assigned
as payload length, but the flow is not directive: it firstly gets the
packet length from the return value of arm_spe_get_payload(), the value
includes header length (1) and payload length:

  int ret = arm_spe_get_payload(buf, len, packet);

and then reduces header length from packet length, so finally get the
payload length:

  packet->index = ret - 1;

To simplify the code, this patch directly assigns payload length to
event packet's index; and at the end it calls arm_spe_get_payload() to
return the payload value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20201111071149.815-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 14:45:05 -03:00
Leo Yan b2ded2e2e2 perf arm-spe: Refactor payload size calculation
This patch defines macro to extract "sz" field from header, and renames
the function payloadlen() to arm_spe_payload_len().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20201111071149.815-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 14:45:05 -03:00
Leo Yan 903b659436 perf arm-spe: Fix a typo in comment
Fix a typo: s/iff/if.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20201111071149.815-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 14:45:05 -03:00
Leo Yan c185f1cde4 perf arm-spe: Include bitops.h for BIT() macro
Include header linux/bitops.h, directly use its BIT() macro and remove
the self defined macros.

Committer notes:

Use BIT_ULL() instead of BIT to build on 32-bit arches as mentioned in
review by Andre Przywara <andre.przywara@arm.com>. I noticed the build
failure when crossbuilding to arm32 from x86_64.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/20201111071149.815-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 14:44:22 -03:00
Leo Yan 40714c5863 perf mem: Support ARM SPE events
This patch adds ARM SPE events for perf memory profiling:

  'spe-load': event for only recording memory load ops;
  'spe-store': event for only recording memory store ops;
  'spe-ldst': event for recording memory load and store ops.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-10-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 12:27:37 -03:00
Leo Yan c825f78851 perf c2c: Support AUX trace
This patch adds the AUX callbacks in session structure, so support AUX
trace for "perf c2c" tool; make itrace memory event as default for "perf
c2c", this tells the AUX trace decoder to synthesize samples and can be
used for statistics.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-9-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 12:27:06 -03:00
Leo Yan 13e5df1e3f perf mem: Support AUX trace
The 'perf mem' tool doesn't support AUX trace data so it cannot receive
the hardware tracing data.

On arm64, although it doesn't support PMU events for memory load and
store, ARM SPE is a good candidate for memory profiling, the hardware
tracer can record memory accessing operations with affiliated
information (e.g. physical address and virtual address for accessing,
cache levels, TLB walking, latency, etc).

To allow "perf mem" tool to support AUX trace, this patch adds the AUX
callbacks for session structure; make itrace memory event as default for
"perf mem", this tells the AUX trace decoder to synthesize memory
samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-8-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 12:25:45 -03:00
Leo Yan 014a771c78 perf auxtrace: Add itrace option '-M' for memory events
This patch is to add itrace option '-M' to synthesize memory event.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 12:24:51 -03:00
Leo Yan 436cce0071 perf mem: Only initialize memory event for recording
It's needless to initialize memory events for reporting, this patch
moves memory event initialization for only recording.  Furthermore,
the change allows to parse perf data on cross platforms, e.g. perf
tool can report result properly even the machine doesn't support
the memory events.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 12:24:10 -03:00
Leo Yan 8b8173b45a perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE
When user doesn't specify event name, perf c2c tool enables both the
load and store events, and this leads to failure for opening the
duplicate PMU device for AUX trace.

After the memory event PERF_MEM_EVENTS__LOAD_STORE is introduced, when
the user doesn't specify event name, this patch converts the required
operation to PERF_MEM_EVENTS__LOAD_STORE if the arch supports it.
Otherwise, the tool still rolls back to enable events
PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 12:23:53 -03:00
Leo Yan 4ba2452cd8 perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
On the architectures with perf memory profiling, two types of hardware
events have been supported: load and store; if want to profile memory
for both load and store operations, the tool will use these two events
at the same time, the usage is:

  # perf mem record -t load,store -- uname

But this cannot be applied for AUX tracing event, the same PMU event can
be used to only trace memory load, or only memory store, or trace for
both memory load and store.

This patch introduces a new event PERF_MEM_EVENTS__LOAD_STORE, which is
used to support the event which can record both memory load and store
operations.

When user specifies memory operation type as 'load,store', or doesn't
set type so use 'load,store' as default, if the arch supports the event
PERF_MEM_EVENTS__LOAD_STORE, the tool will convert the required
operations to this single event; otherwise, if the arch doesn't support
PERF_MEM_EVENTS__LOAD_STORE, the tool rolls back to enable both events
PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE, which keeps the same
behaviour with before.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 10:30:25 -03:00
Leo Yan eaf6aaeec5 perf mem: Introduce weak function perf_mem_events__ptr()
Different architectures might use different event or different event
parameters for memory profiling, this patch introduces a weak
perf_mem_events__ptr() function which allows to return back a
architecture specific memory event.

Since the variable 'perf_mem_events' can be only accessed by the
perf_mem_events__ptr() function, mark the variable as 'static', this
allows the architectures to define its own memory event array.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 10:29:12 -03:00
Leo Yan f9f16dfbe7 perf mem: Search event name with more flexible path
The perf tool searches a memory event name under the folder
'/sys/devices/cpu/events/', this leads to the limitation for the
selection of a memory profiling event which must be under this folder.

Thus it's impossible to use any other event as memory event which is not
under this specific folder, e.g. Arm SPE hardware event is not located
in '/sys/devices/cpu/events/' so it cannot be enabled for memory
profiling.

This patch changes to search folder from '/sys/devices/cpu/events/' to
'/sys/devices', so it give flexibility to find events which can be used
for memory profiling.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201106094853.21082-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 10:28:12 -03:00
John Garry 644bf4b0f7 perf jevents: Add test for arch std events
Recently there was an undetected breakage for std arch event support.

Add support in "PMU events" testcase to detect such breakages.

For this, the "test" arch needs has support added to process std arch
events. And a test event is added for the test, ifself.

Also add a few code comments to help understand the code a bit better.

Committer testing:

Before:

  # perf test -vv pmu  |& grep l3_cache_rd
  #

After:

  # perf test -vv pmu  |& grep l3_cache_rd
  testing event table l3_cache_rd: pass
  testing aliases PMU cpu: matched event l3_cache_rd
  #

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/1603364547-197086-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
John Garry fa1b41a74d perf jevents: Tidy error handling
There is much duplication in the error handling for directory transvering
for prcessing JSONs.

Factor out the common code to tidy a bit.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Link: https://lore.kernel.org/r/1603364547-197086-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Namhyung Kim c5e6bc2335 perf trace beauty: Allow header files in a different path
Current script to generate mmap flags and prot checks headers from the
uapi/asm-generic directory but it might come from a different directory
in some environment.  So change the pattern to accept it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201023020628.346257-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Andi Kleen 55a4de94c6 perf stat: Add --quiet option
Add a new --quiet option to 'perf stat'. This is useful with 'perf stat
record' to write the data only to the perf.data file, which can lower
measurement overhead because the data doesn't need to be formatted.

On my 4C desktop:

  % time ./perf stat record  -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5
  ...
  real    0m5.377s
  user    0m0.238s
  sys     0m0.452s
  % time ./perf stat record --quiet -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5

  real    0m5.452s
  user    0m0.183s
  sys     0m0.423s

In this example it cuts the user time by 20%. On systems with more cores
the savings are higher.

Signed-off-by: Andi Kleen <andi@firstfloor.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Link: http://lore.kernel.org/lkml/20201027002737.30942-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Namhyung Kim bb1c15b60b perf stat: Support regex pattern in --for-each-cgroup
To make the command line even more compact with cgroups, support regex
pattern matching in cgroup names.

  $ perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1

          3,000.73 msec cpu-clock                 foo #    2.998 CPUs utilized
    12,530,992,699      cycles                    foo #    7.517 GHz                      (100.00%)
          1,000.61 msec cpu-clock                 foo/bar #    1.000 CPUs utilized
     4,178,529,579      cycles                    foo/bar #    2.506 GHz                      (100.00%)
          1,000.03 msec cpu-clock                 foo/baz #    0.999 CPUs utilized
     4,176,104,315      cycles                    foo/baz #    2.505 GHz                      (100.00%)

       1.000892614 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201027072855.655449-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Namhyung Kim 9b0a783635 perf test: Use generic event for expand_libpfm_events()
I found that the UNHALTED_CORE_CYCLES event is only available in the
Intel machines and it makes other vendors/archs fail on the test.  As
libpfm4 can parse the generic events like cycles, let's use them.

Fixes: 40b74c30ff ("perf test: Add expand cgroup event test")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201027072855.655449-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Sergey Senozhatsky 1218838d68 perf kvm: Add kvm-stat for arm64
Add support for 'perf kvm stat' on arm64 platform.

Example:

  # perf kvm stat report

Analyze events for all VMs, all VCPUs:

    VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time

   DABT_LOW     661867    98.91%    40.45%      2.19us   3364.65us      6.24us ( +-   0.34% )
        IRQ       4598     0.69%    57.44%      2.89us   3397.59us   1276.27us ( +-   1.61% )
        WFx       1475     0.22%     1.71%      2.22us   3388.63us    118.31us ( +-   8.69% )
   IABT_LOW       1018     0.15%     0.38%      2.22us   2742.07us     38.29us ( +-  12.55% )
      SYS64        180     0.03%     0.01%      2.07us    112.91us      6.57us ( +-  14.95% )
      HVC64         17     0.00%     0.01%      2.19us    322.35us     42.95us ( +-  58.98% )

Total Samples:669155, Total events handled time:10216387.86us.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Suleiman Souhlal <suleiman@google.com>
Link: http://lore.kernel.org/lkml/20201027062421.463355-1-sergey.senozhatsky@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Arnaldo Carvalho de Melo ef0580ecd8 perf env: Conditionally compile BPF support code on having HAVE_LIBBPF_SUPPORT
If libbpf isn't selected, no need for a bunch of related code, that were
not even being used, as code using these perf_env methods was also
enclosed in HAVE_LIBBPF_SUPPORT.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Arnaldo Carvalho de Melo 20e88c6076 perf annotate: Move bpf header inclusion to inside HAVE_LIBBPF_SUPPORT
No need to include it otherwise.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Arnaldo Carvalho de Melo 38219f2411 perf tests: Skip the llvm and bpf tests if HAVE_LIBBPF_SUPPORT isn't defined
If either NO_LIBBPF=1 is passed, explicitely disabling it or if libbpf
is not available due to some missing dependency, skip its tests, telling
the user the feature isn't available.

  # perf test
  <SNIP>
  40: LLVM search and compile                                         : Skip (not compiled in)
  41: Session topology                                                : Ok
  42: BPF filter                                                      : Skip (not compiled in)
  <SNIP>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Arnaldo Carvalho de Melo c18cf78d79 perf bpf: Enclose libbpf.h include within HAVE_LIBBPF_SUPPORT
As it uses the 'deprecated' attribute in a way that breaks the build
with old gcc compilers, so to continue being able to build in such
systems where NO_LIBBPF=1 is being used, enclose it under
HAVE_LIBBPF_SUPPORT.

   1 centos:6          : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
   2 oraclelinux:6     : FAIL gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)

    CC       /tmp/build/perf/builtin-record.o
  In file included from util/bpf-loader.h:11,
                   from builtin-record.c:39:
  /git/linux/tools/lib/bpf/libbpf.h:203: error: wrong number of arguments specified for 'deprecated' attribute

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Tommi Rantala cc3b964d5e perf test: Implement skip_reason callback for watchpoint tests
Currently reason for skipping the read only watchpoint test is only seen
when running in verbose mode:

  $ perf test watchpoint
  23: Watchpoint                                            :
  23.1: Read Only Watchpoint                                : Skip
  23.2: Write Only Watchpoint                               : Ok
  23.3: Read / Write Watchpoint                             : Ok
  23.4: Modify Watchpoint                                   : Ok

  $ perf test -v watchpoint
  23: Watchpoint                                            :
  23.1: Read Only Watchpoint                                :
  --- start ---
  test child forked, pid 60204
  Hardware does not support read only watchpoints.
  test child finished with -2

Implement skip_reason callback for the watchpoint tests, so that it's
easy to see reason why the test is skipped:

  $ perf test watchpoint
  23: Watchpoint                                            :
  23.1: Read Only Watchpoint                                : Skip (missing hardware support)
  23.2: Write Only Watchpoint                               : Ok
  23.3: Read / Write Watchpoint                             : Ok
  23.4: Modify Watchpoint                                   : Ok

Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201016131650.72476-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Leo Yan 248dd9b591 perf tests tsc: Add checking helper is_supported()
So far tsc is enabled on x86_64, i386 and Arm64 architectures, add
checking helper to skip this testing for other architectures.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201019100236.23675-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Leo Yan 3989bbf960 perf tests tsc: Make tsc testing as a common testing
x86 arch provides the testing for conversion between tsc and perf time,
the testing is located in x86 arch folder.  Move this testing out from
x86 arch folder and place it into the common testing folder, so allows
to execute tsc testing on other architectures (e.g. Arm64).

This patch removes the inclusion of "arch-tests.h" from the testing
code, this can avoid building failure if any arch has no this header
file.

Committer testing:

  $ perf test -v tsc
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
  70: Convert perf time to TSC                                        :
  --- start ---
  test child forked, pid 4032834
  mmap size 528384B
  1st event perf time 165409788843605 tsc 336578703793868
  rdtsc          time 165409788854986 tsc 336578703837038
  2nd event perf time 165409788855487 tsc 336578703838935
  test child finished with 0
  ---- end ----
  Convert perf time to TSC: Ok
  $

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201019100236.23675-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Leo Yan 0ee281e1e4 perf mem2node: Improve warning if detected no memory nodes
Some archs (e.g. x86 and Arm64) don't enable the configuration
CONFIG_MEMORY_HOTPLUG by default, if this configuration is not enabled
when build the kernel image, the SysFS for memory nodes will be missed.
This results in perf tool has no chance to catpure the memory nodes
information, when perf tool reports the result and detects no memory
nodes, it outputs "assertion failed at util/mem2node.c:99".

The output log doesn't give out reason for the failure and users have no
clue for how to fix it.  This patch changes to use explicit way for
warning: it tells user that detected no memory nodes and suggests to
enable CONFIG_MEMORY_HOTPLUG for kernel building.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201019003613.8399-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Ian Rogers a7c77c4f52 perf version: Add a feature for libpfm4
If perf is built with libpfm4 (LIBPFM4=1) then advertise it in perf -vv.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201019232545.4047264-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Dengcheng Zhu a701d28e2d perf annotate mips: Add perf arch instructions annotate handlers
Support the MIPS architecture using the ins_ops association method. With
this patch, perf-annotate can work well on MIPS.

Testing it with a perf.data file collected on a mips machine:

$./perf annotate -i perf.data

         :           Disassembly of section .text:
         :
         :           00000000000be6a0 <get_next_seq>:
         :           get_next_seq():
    0.00 :   be6a0:       lw      v0,0(a0)
    0.00 :   be6a4:       daddiu  sp,sp,-128
    0.00 :   be6a8:       ld      a7,72(a0)
    0.00 :   be6ac:       gssq    s5,s4,80(sp)
    0.00 :   be6b0:       gssq    s1,s0,48(sp)
    0.00 :   be6b4:       gssq    s8,gp,112(sp)
    0.00 :   be6b8:       gssq    s7,s6,96(sp)
    0.00 :   be6bc:       gssq    s3,s2,64(sp)
    0.00 :   be6c0:       sd      a3,0(sp)
    0.00 :   be6c4:       move    s0,a0
    0.00 :   be6c8:       sd      v0,32(sp)
    0.00 :   be6cc:       sd      a5,8(sp)
    0.00 :   be6d0:       sd      zero,8(a0)
    0.00 :   be6d4:       sd      a6,16(sp)
    0.00 :   be6d8:       ld      s2,48(a0)
    8.53 :   be6dc:       ld      s1,40(a0)
    9.42 :   be6e0:       ld      v1,32(a0)
    0.00 :   be6e4:       nop
    0.00 :   be6e8:       ld      s4,24(a0)
    0.00 :   be6ec:       ld      s5,16(a0)
    0.00 :   be6f0:       sd      a7,40(sp)
   10.11 :   be6f4:       ld      s6,64(a0)

...

The original patch link:
https://lore.kernel.org/patchwork/patch/1180480/

Signed-off-by: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: linux-mips@vger.kernel.org
[ fanpeng@loongson.cn: Add missing "bgtzl", "bltzl", "bgezl", "blezl", "beql" and "bnel" for pre-R6processors ]
Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Namhyung Kim 2c589d933e perf tools: Add missing swap for cgroup events
It was missed to add a swap function for PERF_RECORD_CGROUP.

Fixes: ba78c1c546 ("perf tools: Basic support for CGROUP event")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201102140228.303657-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 09:16:41 -03:00
Jiri Olsa fe01adb723 perf tools: Add missing swap for ino_generation
We are missing swap for ino_generation field.

Fixes: 5c5e854bc7 ("perf tools: Add attr->mmap2 support")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201101233103.3537427-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 09:15:02 -03:00
Jiri Olsa 6311951d4f perf tools: Initialize output buffer in build_id__sprintf
We display garbage for undefined build_id objects, because we don't
initialize the output buffer.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20201101233103.3537427-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 09:14:45 -03:00
Song Liu 86449b12f6 perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()
Making perf with gcc-9.1.1 generates the following warning:

    CC       ui/browsers/hists.o
  ui/browsers/hists.c: In function 'perf_evsel__hists_browse':
  ui/browsers/hists.c:3078:61: error: '%d' directive output may be \
  truncated writing between 1 and 11 bytes into a region of size \
  between 2 and 12 [-Werror=format-truncation=]

   3078 |       "Max event group index to sort is %d (index from 0 to %d)",
        |                                                             ^~
  ui/browsers/hists.c:3078:7: note: directive argument in the range [-2147483648, 8]
   3078 |       "Max event group index to sort is %d (index from 0 to %d)",
        |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from /usr/include/stdio.h:937,
                   from ui/browsers/hists.c:5:

IOW, the string in line 3078 might be too long for buf[] of 64 bytes.

Fix this by increasing the size of buf[] to 128.

Fixes: dbddf17474  ("perf report/top TUI: Support hotkeys to let user select any event for sorting")
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: stable@vger.kernel.org # v5.7+
Link: http://lore.kernel.org/lkml/20201030235431.534417-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 09:11:45 -03:00
Arnaldo Carvalho de Melo d0e7b0c71f perf scripting python: Avoid declaring function pointers with a visibility attribute
To avoid this:

  util/scripting-engines/trace-event-python.c: In function 'python_start_script':
  util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes]
   1595 |  PyMODINIT_FUNC (*initfunc)(void);
        |  ^~~~~~~~~~~~~~

That started breaking when building with PYTHON=python3 and these gcc
versions (I haven't checked with the clang ones, maybe it breaks there
as well):

  # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz
  # dm  fedora:33 fedora:rawhide
     1   107.80 fedora:33         : Ok   gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33)
     2    92.47 fedora:rawhide    : Ok   gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34)
  #

Avoid that by ditching that 'initfunc' function pointer with its:

    #define Py_EXPORTED_SYMBOL _attribute_ ((visibility ("default")))
    #define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject*

And just call PyImport_AppendInittab() at the end of the ifdef python3
block with the functions that were being attributed to that initfunc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:32:43 -03:00
Peter Zijlstra 9ae1e990f1 perf tools: Remove broken __no_tail_call attribute
The GCC specific __attribute__((optimize)) attribute does not what is
commonly expected and is explicitly recommended against using in
production code by the GCC people.

Unlike what is often expected, it doesn't add to the optimization flags,
but it fully replaces them, loosing any and all optimization flags
provided by the compiler commandline.

The only guaranteed upon means of inhibiting tail-calls is by placing a
volatile asm with side-effects after the call such that the tail-call simply
cannot be done.

Given the original commit wasn't specific on which calls were the problem, this
removal might re-introduce the problem, which can then be re-analyzed and cured
properly.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:32:15 -03:00
Jin Yao 0dfbe4c646 perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX
Ian reports an issue that the metric DRAM_BW_Use often remains 0.

The metric expression for DRAM_BW_Use on CLX/SKX:

"( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time"

The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/
are scaled up by 64, that is to turn a count of cache lines into bytes,
the count is then divided by 1000000000 to give GB.

However, the counts of uncore_imc/cas_count_read/ and
uncore_imc/cas_count_write/ have been scaled yet.

The scale values are from sysfs, such as
/sys/devices/uncore_imc_0/events/cas_count_read.scale.
It's 6.103515625e-5 (64 / 1024.0 / 1024.0).

So if we use original metric expression, the result is not correct.

But the difficulty is, for SKL client, the counts are not scaled.

The metric expression for DRAM_BW_Use on SKL:

"64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000"

root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1

 Performance counter stats for 'system wide':

               190      arb/event=0x84,umask=0x1/ #     1.86 DRAM_BW_Use
        29,093,178      arb/event=0x81,umask=0x1/
     1,000,703,287 ns   duration_time

       1.000703287 seconds time elapsed

The result is expected.

So the easy way is just change the metric expression for CLX/SKX.
This patch changes the metric expression to:

"( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time"

1048576 = 1024 * 1024.

Before (tested on CLX):

root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1

 Performance counter stats for 'system wide':

            765.35 MiB  uncore_imc/cas_count_read/ #     0.00 DRAM_BW_Use
              5.42 MiB  uncore_imc/cas_count_write/
        1001515088 ns   duration_time

       1.001515088 seconds time elapsed

After:

root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1

 Performance counter stats for 'system wide':

            767.95 MiB  uncore_imc/cas_count_read/ #     0.80 DRAM_BW_Use
              5.02 MiB  uncore_imc/cas_count_write/
        1001900010 ns   duration_time

       1.001900010 seconds time elapsed

Fixes: 038d3b53c2 ("perf vendor events intel: Update CascadelakeX events to v1.08")
Fixes: b5ff7f2799 ("perf vendor events: Update SkylakeX events to v1.21")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201023005334.7869-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:31:31 -03:00
Stanislav Ivanichkin a6293f36ac perf trace: Fix segfault when trying to trace events by cgroup
# ./perf trace -e sched:sched_switch -G test -a sleep 1
  perf: Segmentation fault
  Obtained 11 stack frames.
  ./perf(sighandler_dump_stack+0x43) [0x55cfdc636db3]
  /lib/x86_64-linux-gnu/libc.so.6(+0x3efcf) [0x7fd23eecafcf]
  ./perf(parse_cgroups+0x36) [0x55cfdc673f36]
  ./perf(+0x3186ed) [0x55cfdc70d6ed]
  ./perf(parse_options_subcommand+0x629) [0x55cfdc70e999]
  ./perf(cmd_trace+0x9c2) [0x55cfdc5ad6d2]
  ./perf(+0x1e8ae0) [0x55cfdc5ddae0]
  ./perf(+0x1e8ded) [0x55cfdc5ddded]
  ./perf(main+0x370) [0x55cfdc556f00]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fd23eeadb96]
  ./perf(_start+0x29) [0x55cfdc557389]
  Segmentation fault
  #

 It happens because "struct trace" in option->value is passed to the
 parse_cgroups function instead of "struct evlist".

Fixes: 9ea42ba441 ("perf trace: Support setting cgroups as targets")
Signed-off-by: Stanislav Ivanichkin <sivanichkin@yandex-team.ru>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Link: http://lore.kernel.org/lkml/20201027094357.94881-1-sivanichkin@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:31:03 -03:00
Tommi Rantala ab8bf5f2e0 perf tools: Fix crash with non-jited bpf progs
The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to
the bpf interpreter, ie. within kernel text section. When processing the
unregister event, this causes unexpected removal of vmlinux_map,
crashing perf later in cleanup:

  # perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop
  PCOMM            PID    PPID   RET ARGS
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ]
  perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
  Aborted (core dumped)

  # perf script -D|grep KSYM
  0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
  0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2
  0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
  108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve
  108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
  109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve
  109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
  perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.

Here the addresses match the bpf interpreter:

  # grep -e ffffffffa9b6b530 -e ffffffffa9b6b3b0 -e ffffffffa9b6b3f0 /proc/kallsyms
  ffffffffa9b6b3b0 t __bpf_prog_run224
  ffffffffa9b6b3f0 t __bpf_prog_run192
  ffffffffa9b6b530 t __bpf_prog_run32

Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL
unregister event.

Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201016114718.54332-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:30:34 -03:00
Arnaldo Carvalho de Melo 263e452eff tools headers UAPI: Update process_madvise affected files
To pick the changes from:

  ecb8ac8b1f ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")

That addresses these perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:29:30 -03:00
Arnaldo Carvalho de Melo e555b4b8d7 perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:

  85367030a6 ("libbpf: Centralize poisoning and poison reallocarray()")
  7d9c71e10b ("libbpf: Extract generic string hashing function for reuse")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
  diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:26:55 -03:00
Justin M. Forbes b773ea6505 perf tools: Remove LTO compiler options when building perl support
To avoid breaking the build by mixing files compiled with things coming
from distro specific compiler options for perl with the rest of perf,
i.e. to avoid this:

  `.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o

Noticed on Fedora 33.

Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1593431
Cc: Jiri Olsa <jolsa@redhat.com>
Link: 589a32b62f?branch=master
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-03 08:24:54 -03:00
Linus Torvalds 9d9af1007b perf tools changes for v5.10: 1st batch
- cgroup improvements for 'perf stat', allowing for compact specification of events
   and cgroups in the command line.
 
 - Support per thread topdown metrics in 'perf stat'.
 
 - Support sample-read topdown metric group in 'perf record'
 
 - Show start of latency in addition to its start in 'perf sched latency'.
 
 - Add min, max to 'perf script' futex-contention output, in addition to avg.
 
 - Allow usage of 'perf_event_attr->exclusive' attribute via the new ':e' event
   modifier.
 
 - Add 'snapshot' command to 'perf record --control', using it with Intel PT.
 
 - Support FIFO file names as alternative options to 'perf record --control'.
 
 - Introduce branch history "streams", to compare 'perf record' runs with
   'perf diff' based on branch records and report hot streams.
 
 - Support PE executable symbol tables using libbfd, to profile, for instance, wine binaries.
 
 - Add filter support for option 'perf ftrace -F/--funcs'.
 
 - Allow configuring the 'disassembler_style' 'perf annotate' knob via 'perf config'
 
 - Update CascadelakeX and SkylakeX JSON vendor events files.
 
 - Add support for parsing perchip/percore JSON vendor events.
 
 - Add power9 hv_24x7 core level metric events.
 
 - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD zen1.
 
 - Enable Family 19h users by matching Zen2 AMD vendor events.
 
 - Use debuginfod in 'perf probe' when required debug files not found locally.
 
 - Display negative tid in non-sample events in 'perf script'.
 
 - Make GTK2 support opt-in
 
 - Add build test with GTK+
 
 - Add missing -lzstd to the fast path feature detection
 
 - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for use in 'perf trace'.
 
 - Show python test script in verbose mode.
 
 - Fix uncore metric expressions
 
 - Msan uninitialized use fixes.
 
 - Use condition variables in 'perf bench numa'
 
 - Autodetect python3 binary in systems without python2.
 
 - Support md5 build ids in addition to sha1.
 
 - Add build id 'perf test' regression test.
 
 - Fix printable strings in python3 scripts.
 
 - Fix off by ones in 'perf trace' in arches using libaudit.
 
 - Fix JSON event code for events referencing std arch events.
 
 - Introduce 'perf test' shell script for Arm CoreSight testing.
 
 - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
   event and in 'perf test tsc'.
 
 - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores", fixes
   and documentation update.
 
 - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and debuginfo files.
 
 - Do not print 'Metric Groups:' unnecessarily in 'perf list'
 
 - Refcounting fixes in the event parsing code.
 
 - Add expand cgroup event 'perf test' entry.
 
 - Fix out of bounds CPU map access when handling armv8_pmu events in 'perf stat'.
 
 - Add build-id injection 'perf bench' benchmark.
 
 - Enter namespace when reading build-id in 'perf inject'.
 
 - Do not load map/dso when injecting build-id speeding up the 'perf inject' process.
 
 - Add --buildid-all option to avoid processing all samples, just the mmap metadata events.
 
 - Add feature test to check if libbfd has buildid support
 
 - Add 'perf test' entry for PE binary format support.
 
 - Fix typos in power8 PMU vendor events JSON files.
 
 - Hide libtraceevent non API functions.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 
 Test results:
 
 The first ones are container based builds of tools/perf with and without libelf
 support.  Where clang is available, it is also used to build perf with/without
 libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
 when clang and its devel libraries are installed.
 
 The objtool and samples/bpf/ builds are disabled now that I'm switching from
 using the sources in a local volume to fetching them from a http server to
 build it inside the container, to make it easier to build in a container cluster.
 Those will come back later.
 
 Several are cross builds, the ones with -x-ARCH and the android one, and those
 may not have all the features built, due to lack of multi-arch devel packages,
 available and being used so far on just a few, like
 debian:experimental-x-{arm64,mipsel}.
 
 The 'perf test' one will perform a variety of tests exercising
 tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
 with a variety of command line event specifications to then intercept the
 sys_perf_event syscall to check that the perf_event_attr fields are set up as
 expected, among a variety of other unit tests.
 
 Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
 with a variety of feature sets, exercising the build with an incomplete set of
 features as well as with a complete one. It is planned to have it run on each
 of the containers mentioned above, using some container orchestration
 infrastructure. Get in contact if interested in helping having this in place.
 
   $ grep "model name" -m1 /proc/cpuinfo
   model name: AMD Ryzen 9 3900X 12-Core Processor
   $ export PERF_TARBALL=http://192.168.122.1/perf/perf-5.9.0-rc7.tar.xz
   $ dm
   Thu 15 Oct 2020 01:10:56 PM -03
    1    67.40 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
    2    69.01 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
    3    70.79 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
    4    79.89 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
    5    80.88 alpine:3.8                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
    6    83.88 alpine:3.9                    : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
    7   107.87 alpine:3.10                   : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
    8   115.43 alpine:3.11                   : Ok   gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
    9   106.80 alpine:3.12                   : Ok   gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
   10   114.06 alpine:edge                   : Ok   gcc (Alpine 10.2.0) 10.2.0, Alpine clang version 10.0.1
   11    70.42 alt:p8                        : Ok   x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
   12    98.70 alt:p9                        : Ok   x86_64-alt-linux-gcc (GCC) 8.4.1 20200305 (ALT p9 8.4.1-alt0.p9.1), clang version 10.0.0
   13    80.37 alt:sisyphus                  : Ok   x86_64-alt-linux-gcc (GCC) 9.3.1 20200518 (ALT Sisyphus 9.3.1-alt1), clang version 10.0.1
   14    64.12 amazonlinux:1                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
   15    97.64 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-9), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
   16    22.70 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   17    22.72 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   18    26.70 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
   19    31.86 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
   20   113.19 centos:8                      : Ok   gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), clang version 9.0.1 (Red Hat 9.0.1-2.module_el8.2.0+309+0c7b6b03)
   21    57.23 clearlinux:latest             : Ok   gcc (Clear Linux OS for Intel Architecture) 10.2.1 20200908 releases/gcc-10.2.0-203-g127d693955, clang version 10.0.1
   22    64.98 debian:8                      : Ok   gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
   23    76.08 debian:9                      : Ok   gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
   24    74.49 debian:10                     : Ok   gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
   25    78.50 debian:experimental           : Ok   gcc (Debian 10.2.0-15) 10.2.0, Debian clang version 11.0.0-2
   26    33.30 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 10.2.0-3) 10.2.0
   27    30.96 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 9.3.0-8) 9.3.0
   28    32.63 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0
   29    30.12 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
   30    30.99 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
   31    68.60 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
   32    78.92 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
   33    26.15 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
   34    80.13 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
   35    90.68 fedora:26                     : Ok   gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
   36    90.45 fedora:27                     : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
   37   100.88 fedora:28                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
   38   105.99 fedora:29                     : Ok   gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
   39   111.05 fedora:30                     : Ok   gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
   40    29.96 fedora:30-x-ARC-glibc         : Ok   arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
   41    27.02 fedora:30-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
   42   110.47 fedora:31                     : Ok   gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 9.0.1 (Fedora 9.0.1-2.fc31)
   43    88.78 fedora:32                     : Ok   gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), clang version 10.0.0 (Fedora 10.0.0-2.fc32)
   44    15.92 fedora:rawhide                : FAIL gcc (GCC) 10.2.1 20200916 (Red Hat 10.2.1-4), clang version 11.0.0 (Fedora 11.0.0-0.4.rc3.fc34)
   45    33.58 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 9.3.0-r1 p3) 9.3.0
   46    65.32 mageia:5                      : Ok   gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
   47    81.35 mageia:6                      : Ok   gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
   48   103.94 mageia:7                      : Ok   gcc (Mageia 8.4.0-1.mga7) 8.4.0, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
   49    91.62 manjaro:latest                : Ok   gcc (GCC) 10.2.0, clang version 10.0.1
   50   219.87 openmandriva:cooker           : Ok   gcc (GCC) 10.2.0 20200723 (OpenMandriva), OpenMandriva 11.0.0-0.20200909.1 clang version 11.0.0 (/builddir/build/BUILD/llvm-project-release-11.x/clang 5cb8ffbab42358a7cdb0a67acfadb84df0779579)
   51   111.76 opensuse:15.0                 : Ok   gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 5.0.1 (tags/RELEASE_501/final 312548)
   52   118.03 opensuse:15.1                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
   53   107.91 opensuse:15.2                 : Ok   gcc (SUSE Linux) 7.5.0, clang version 9.0.1
   54   102.34 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c], clang version 10.0.1
   55    25.33 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
   56    30.45 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44.0.3)
   57   104.65 oraclelinux:8                 : Ok   gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5.0.3), clang version 9.0.1 (Red Hat 9.0.1-2.0.1.module+el8.2.0+5599+9ed9ef6d)
   58    26.04 ubuntu:12.04                  : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
   59    29.49 ubuntu:14.04                  : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
   60    72.95 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
   61    26.03 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   62    25.15 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   63    24.88 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   64    25.72 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   65    25.39 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   66    25.34 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
   67    84.84 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
   68    27.15 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
   69    26.68 ubuntu:18.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
   70    22.38 ubuntu:18.04-x-m68k           : Ok   m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   71    26.35 ubuntu:18.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   72    28.58 ubuntu:18.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   73    28.18 ubuntu:18.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   74   178.55 ubuntu:18.04-x-riscv64        : Ok   riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   75    24.58 ubuntu:18.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   76    26.89 ubuntu:18.04-x-sh4            : Ok   sh4-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   77    24.81 ubuntu:18.04-x-sparc64        : Ok   sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
   78    68.90 ubuntu:19.10                  : Ok   gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 8.0.1-3build1 (tags/RELEASE_801/final)
   79    69.31 ubuntu:20.04                  : Ok   gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, clang version 10.0.0-4ubuntu1
   80    30.00 ubuntu:20.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566]
   81    70.34 ubuntu:20.10                  : Ok   gcc (Ubuntu 10.2.0-5ubuntu2) 10.2.0, Ubuntu clang version 10.0.1-1
   $
 
   # uname -a
   Linux five 5.9.0+ #1 SMP Thu Oct 15 09:06:41 -03 2020 x86_64 x86_64 x86_64 GNU/Linux
   # git log --oneline -1
   744aec4df2 perf c2c: Update documentation for metrics reorganization
   # perf version --build-options
   perf version 5.9.rc7.g744aec4df2c5
                    dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
       dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
                    glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
            syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
                   libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
                   libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
                  libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
   numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
                  libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
                libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
                 libslang: [ on  ]  # HAVE_SLANG_SUPPORT
                libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
                libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
       libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
                     zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
                     lzma: [ on  ]  # HAVE_LZMA_SUPPORT
                get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
                      bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
                      aio: [ on  ]  # HAVE_AIO_SUPPORT
                     zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
   # perf test
    1: vmlinux symtab matches kallsyms                                 : Ok
    2: Detect openat syscall event                                     : Ok
    3: Detect openat syscall event on all cpus                         : Ok
    4: Read samples using the mmap interface                           : Ok
    5: Test data source output                                         : Ok
    6: Parse event definition strings                                  : Ok
    7: Simple expression parser                                        : Ok
    8: PERF_RECORD_* events & perf_sample fields                       : Ok
    9: Parse perf pmu format                                           : Ok
   10: PMU events                                                      :
   10.1: PMU event table sanity                                        : Ok
   10.2: PMU event map aliases                                         : Ok
   10.3: Parsing of PMU event table metrics                            : Ok
   10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
   11: DSO data read                                                   : Ok
   12: DSO data cache                                                  : Ok
   13: DSO data reopen                                                 : Ok
   14: Roundtrip evsel->name                                           : Ok
   15: Parse sched tracepoints fields                                  : Ok
   16: syscalls:sys_enter_openat event fields                          : Ok
   17: Setup struct perf_event_attr                                    : Ok
   18: Match and link multiple hists                                   : Ok
   19: 'import perf' in python                                         : Ok
   20: Breakpoint overflow signal handler                              : Ok
   21: Breakpoint overflow sampling                                    : Ok
   22: Breakpoint accounting                                           : Ok
   23: Watchpoint                                                      :
   23.1: Read Only Watchpoint                                          : Skip
   23.2: Write Only Watchpoint                                         : Ok
   23.3: Read / Write Watchpoint                                       : Ok
   23.4: Modify Watchpoint                                             : Ok
   24: Number of exit events of a simple workload                      : Ok
   25: Software clock events period values                             : Ok
   26: Object code reading                                             : Ok
   27: Sample parsing                                                  : Ok
   28: Use a dummy software event to keep tracking                     : Ok
   29: Parse with no sample_id_all bit set                             : Ok
   30: Filter hist entries                                             : Ok
   31: Lookup mmap thread                                              : Ok
   32: Share thread maps                                               : Ok
   33: Sort output of hist entries                                     : Ok
   34: Cumulate child hist entries                                     : Ok
   35: Track with sched_switch                                         : Ok
   36: Filter fds with revents mask in a fdarray                       : Ok
   37: Add fd to a fdarray, making it autogrow                         : Ok
   38: kmod_path__parse                                                : Ok
   39: Thread map                                                      : Ok
   40: LLVM search and compile                                         :
   40.1: Basic BPF llvm compile                                        : Ok
   40.2: kbuild searching                                              : Ok
   40.3: Compile source for BPF prologue generation                    : Ok
   40.4: Compile source for BPF relocation                             : Ok
   41: Session topology                                                : Ok
   42: BPF filter                                                      :
   42.1: Basic BPF filtering                                           : Ok
   42.2: BPF pinning                                                   : Ok
   42.3: BPF prologue generation                                       : Ok
   42.4: BPF relocation checker                                        : Ok
   43: Synthesize thread map                                           : Ok
   44: Remove thread map                                               : Ok
   45: Synthesize cpu map                                              : Ok
   46: Synthesize stat config                                          : Ok
   47: Synthesize stat                                                 : Ok
   48: Synthesize stat round                                           : Ok
   49: Synthesize attr update                                          : Ok
   50: Event times                                                     : Ok
   51: Read backward ring buffer                                       : Ok
   52: Print cpu map                                                   : Ok
   53: Merge cpu map                                                   : Ok
   54: Probe SDT events                                                : Ok
   55: is_printable_array                                              : Ok
   56: Print bitmap                                                    : Ok
   57: perf hooks                                                      : Ok
   58: builtin clang support                                           : Skip (not compiled in)
   59: unit_number__scnprintf                                          : Ok
   60: mem2node                                                        : Ok
   61: time utils                                                      : Ok
   62: Test jit_write_elf                                              : Ok
   63: Test libpfm4 support                                            : Skip (not compiled in)
   64: Test api io                                                     : Ok
   65: maps__merge_in                                                  : Ok
   66: Demangle Java                                                   : Ok
   67: Parse and process metrics                                       : Ok
   68: PE file support                                                 : Ok
   69: Event expansion for cgroups                                     : Ok
   70: x86 rdpmc                                                       : Ok
   71: Convert perf time to TSC                                        : Ok
   72: DWARF unwind                                                    : Ok
   73: x86 instruction decoder - new instructions                      : Ok
   74: Intel PT packet decoder                                         : Ok
   75: x86 bp modify                                                   : Ok
   76: probe libc's inet_pton & backtrace it with ping                 : Ok
   77: Check Arm CoreSight trace data recording and synthesized samples: Skip
   78: Use vfs_getname probe to get syscall args filenames             : Ok
   79: Check open filename arg using perf trace + vfs_getname          : Ok
   80: Zstd perf.data compression/decompression                        : Ok
   81: Add vfs_getname probe to get syscall args filenames             : Ok
   82: build id cache operations                                       : Ok
   #
 
   $ git log --oneline -1
   744aec4df2 (HEAD -> perf/core, quaco/perf/core) perf c2c: Update documentation for metrics reorganization
   $ make -C tools/perf build-test
   make: Entering directory '/home/acme/git/perf/tools/perf'
   - tarpkg: ./tests/perf-targz-src-pkg .
             make_install_bin_O: make install-bin
                  make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
   make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
                 make_no_newt_O: make NO_NEWT=1
            make_no_libbionic_O: make NO_LIBBIONIC=1
                  make_no_sdt_O: make NO_SDT=1
                   make_debug_O: make DEBUG=1
                  make_perf_o_O: make perf.o
               make_no_libbpf_O: make NO_LIBBPF=1
         make_no_libbpf_DEBUG_O: make NO_LIBBPF=1 DEBUG=1
               make_clean_all_O: make clean all
                    make_tags_O: make tags
         make_with_babeltrace_O: make LIBBABELTRACE=1
          make_with_clangllvm_O: make LIBCLANGLLVM=1
              make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
               make_no_libelf_O: make NO_LIBELF=1
            make_no_libcrypto_O: make NO_LIBCRYPTO=1
            make_with_libpfm4_O: make LIBPFM4=1
            make_no_libunwind_O: make NO_LIBUNWIND=1
              make_util_map_o_O: make util/map.o
                make_no_slang_O: make NO_SLANG=1
               make_with_gtk2_O: make GTK2=1
                   make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
        make_util_pmu_bison_o_O: make util/pmu-bison.o
            make_no_backtrace_O: make NO_BACKTRACE=1
             make_no_demangle_O: make NO_DEMANGLE=1
                    make_help_O: make help
                    make_pure_O: make
                 make_no_gtk2_O: make NO_GTK2=1
          make_install_prefix_O: make install prefix=/tmp/krava
              make_no_libnuma_O: make NO_LIBNUMA=1
            make_no_libpython_O: make NO_LIBPYTHON=1
    make_install_prefix_slash_O: make install prefix=/tmp/krava/
             make_no_libaudit_O: make NO_LIBAUDIT=1
             make_no_auxtrace_O: make NO_AUXTRACE=1
                 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 NO_SYSCALL_TABLE=1
                 make_install_O: make install
                     make_doc_O: make doc
              make_no_libperl_O: make NO_LIBPERL=1
          make_no_syscall_tbl_O: make NO_SYSCALL_TABLE=1
   OK
   make: Leaving directory '/home/acme/git/perf/tools/perf'
   $
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCX4iuzgAKCRCyPKLppCJ+
 J1khAP4iMQMFCMpNsBaL6KLtj3aTOhrooYuhbNL3kajqYVyW/QD8Dws35k6m2+tB
 tcOMJykFjPkQ4I13zsxKyugeJuUzSQw=
 =KdSj
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:

 - cgroup improvements for 'perf stat', allowing for compact
   specification of events and cgroups in the command line.

 - Support per thread topdown metrics in 'perf stat'.

 - Support sample-read topdown metric group in 'perf record'

 - Show start of latency in addition to its start in 'perf sched
   latency'.

 - Add min, max to 'perf script' futex-contention output, in addition to
   avg.

 - Allow usage of 'perf_event_attr->exclusive' attribute via the new
   ':e' event modifier.

 - Add 'snapshot' command to 'perf record --control', using it with
   Intel PT.

 - Support FIFO file names as alternative options to 'perf record
   --control'.

 - Introduce branch history "streams", to compare 'perf record' runs
   with 'perf diff' based on branch records and report hot streams.

 - Support PE executable symbol tables using libbfd, to profile, for
   instance, wine binaries.

 - Add filter support for option 'perf ftrace -F/--funcs'.

 - Allow configuring the 'disassembler_style' 'perf annotate' knob via
   'perf config'

 - Update CascadelakeX and SkylakeX JSON vendor events files.

 - Add support for parsing perchip/percore JSON vendor events.

 - Add power9 hv_24x7 core level metric events.

 - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD
   zen1.

 - Enable Family 19h users by matching Zen2 AMD vendor events.

 - Use debuginfod in 'perf probe' when required debug files not found
   locally.

 - Display negative tid in non-sample events in 'perf script'.

 - Make GTK2 support opt-in

 - Add build test with GTK+

 - Add missing -lzstd to the fast path feature detection

 - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for
   use in 'perf trace'.

 - Show python test script in verbose mode.

 - Fix uncore metric expressions

 - Msan uninitialized use fixes.

 - Use condition variables in 'perf bench numa'

 - Autodetect python3 binary in systems without python2.

 - Support md5 build ids in addition to sha1.

 - Add build id 'perf test' regression test.

 - Fix printable strings in python3 scripts.

 - Fix off by ones in 'perf trace' in arches using libaudit.

 - Fix JSON event code for events referencing std arch events.

 - Introduce 'perf test' shell script for Arm CoreSight testing.

 - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata
   event and in 'perf test tsc'.

 - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores",
   fixes and documentation update.

 - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and
   debuginfo files.

 - Do not print 'Metric Groups:' unnecessarily in 'perf list'

 - Refcounting fixes in the event parsing code.

 - Add expand cgroup event 'perf test' entry.

 - Fix out of bounds CPU map access when handling armv8_pmu events in
   'perf stat'.

 - Add build-id injection 'perf bench' benchmark.

 - Enter namespace when reading build-id in 'perf inject'.

 - Do not load map/dso when injecting build-id speeding up the 'perf
   inject' process.

 - Add --buildid-all option to avoid processing all samples, just the
   mmap metadata events.

 - Add feature test to check if libbfd has buildid support

 - Add 'perf test' entry for PE binary format support.

 - Fix typos in power8 PMU vendor events JSON files.

 - Hide libtraceevent non API functions.

* tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits)
  perf c2c: Update documentation for metrics reorganization
  perf c2c: Add metrics "RMT Load Hit"
  perf c2c: Correct LLC load hit metrics
  perf c2c: Change header for LLC local hit
  perf c2c: Use more explicit headers for HITM
  perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
  perf c2c: Organize metrics based on memory hierarchy
  perf c2c: Display "Total Stores" as a standalone metrics
  perf c2c: Display the total numbers continuously
  perf bench: Use condition variables in numa.
  perf jevents: Fix event code for events referencing std arch events
  perf diff: Support hot streams comparison
  perf streams: Report hot streams
  perf streams: Calculate the sum of total streams hits
  perf streams: Link stream pair
  perf streams: Compare two streams
  perf streams: Get the evsel_streams by evsel_idx
  perf streams: Introduce branch history "streams"
  perf intel-pt: Improve PT documentation slightly
  perf tools: Add support for exclusive groups/events
  ...
2020-10-17 11:47:46 -07:00
Linus Torvalds 9ff9b0d392 networking changes for the 5.10 merge window
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
 traversal in common container configs and improving TCP back-pressure.
 Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
 
 Expand netlink policy support and improve policy export to user space.
 (Ge)netlink core performs request validation according to declared
 policies. Expand the expressiveness of those policies (min/max length
 and bitmasks). Allow dumping policies for particular commands.
 This is used for feature discovery by user space (instead of kernel
 version parsing or trial and error).
 
 Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
 
 Allow more than 255 IPv4 multicast interfaces.
 
 Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
 packets of TCPv6.
 
 In Multi-patch TCP (MPTCP) support concurrent transmission of data
 on multiple subflows in a load balancing scenario. Enhance advertising
 addresses via the RM_ADDR/ADD_ADDR options.
 
 Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
 
 Allow more calls to same peer in RxRPC.
 
 Support two new Controller Area Network (CAN) protocols -
 CAN-FD and ISO 15765-2:2016.
 
 Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
 kernel problem.
 
 Add TC actions for implementing MPLS L2 VPNs.
 
 Improve nexthop code - e.g. handle various corner cases when nexthop
 objects are removed from groups better, skip unnecessary notifications
 and make it easier to offload nexthops into HW by converting
 to a blocking notifier.
 
 Support adding and consuming TCP header options by BPF programs,
 opening the doors for easy experimental and deployment-specific
 TCP option use.
 
 Reorganize TCP congestion control (CC) initialization to simplify life
 of TCP CC implemented in BPF.
 
 Add support for shipping BPF programs with the kernel and loading them
 early on boot via the User Mode Driver mechanism, hence reusing all the
 user space infra we have.
 
 Support sleepable BPF programs, initially targeting LSM and tracing.
 
 Add bpf_d_path() helper for returning full path for given 'struct path'.
 
 Make bpf_tail_call compatible with bpf-to-bpf calls.
 
 Allow BPF programs to call map_update_elem on sockmaps.
 
 Add BPF Type Format (BTF) support for type and enum discovery, as
 well as support for using BTF within the kernel itself (current use
 is for pretty printing structures).
 
 Support listing and getting information about bpf_links via the bpf
 syscall.
 
 Enhance kernel interfaces around NIC firmware update. Allow specifying
 overwrite mask to control if settings etc. are reset during update;
 report expected max time operation may take to users; support firmware
 activation without machine reboot incl. limits of how much impact
 reset may have (e.g. dropping link or not).
 
 Extend ethtool configuration interface to report IEEE-standard
 counters, to limit the need for per-vendor logic in user space.
 
 Adopt or extend devlink use for debug, monitoring, fw update
 in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
 mv88e6xxx, dpaa2-eth).
 
 In mlxsw expose critical and emergency SFP module temperature alarms.
 Refactor port buffer handling to make the defaults more suitable and
 support setting these values explicitly via the DCBNL interface.
 
 Add XDP support for Intel's igb driver.
 
 Support offloading TC flower classification and filtering rules to
 mscc_ocelot switches.
 
 Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
 fixed interval period pulse generator and one-step timestamping in
 dpaa-eth.
 
 Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
 offload.
 
 Add Lynx PHY/PCS MDIO module, and convert various drivers which have
 this HW to use it. Convert mvpp2 to split PCS.
 
 Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
 7-port Mediatek MT7531 IP.
 
 Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
 and wcn3680 support in wcn36xx.
 
 Improve performance for packets which don't require much offloads
 on recent Mellanox NICs by 20% by making multiple packets share
 a descriptor entry.
 
 Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
 subtree to drivers/net. Move MDIO drivers out of the phy directory.
 
 Clean up a lot of W=1 warnings, reportedly the actively developed
 subsections of networking drivers should now build W=1 warning free.
 
 Make sure drivers don't use in_interrupt() to dynamically adapt their
 code. Convert tasklets to use new tasklet_setup API (sadly this
 conversion is not yet complete).
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
 IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
 Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
 r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
 iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
 mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
 wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
 owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
 HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
 =bc1U
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
2020-10-15 18:42:13 -07:00
Leo Yan 744aec4df2 perf c2c: Update documentation for metrics reorganization
The output format for metrics has been reorganized, update documentation
to reflect the changes for it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Al Grant <al.grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201015144548.18482-10-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 12:02:12 -03:00
Leo Yan 91d933c221 perf c2c: Add metrics "RMT Load Hit"
The metrics "LLC Ld Miss" and "Load Dram" overlap with each other for
accouting items:

  "LLC Ld Miss" = "lcl_dram" + "rmt_dram" + "rmt_hit" + "rmt_hitm"
  "Load Dram"   = "lcl_dram" + "rmt_dram"

Furthermore, the metrics "LLC Ld Miss" is not directive to show
statistics due to it contains summary value and cannot give out
breakdown details.

For this reason, add a new metrics "RMT Load Hit" which is used to
present the remote cache hit; it contains two items:

  "RMT Load Hit" = remote hit ("rmt_hit") + remote hitm ("rmt_hitm")

As result, the metrics "LLC Ld Miss" is perfectly divided into two
metrics "RMT Load Hit" and "Load Dram".  It's not necessary to keep
metrics "LLC Ld Miss", so remove it.

Before:

  #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --      LLC  --- Load Dram ----
  # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm  Ld Miss       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  .......  ........  ........
  #
        0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765      548     2615       66       169      481        0         0         0
        1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0      187      361       27        11       78        0         0         0
        2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0      131        0       10       263        1        0         0         0

After:

  #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765      548     2615       66       169      481         0        0         0         0
        1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0      187      361       27        11       78         0        0         0         0
        2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0      131        0       10       263        1         0        0         0         0

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-9-leo.yan@linaro.org
2020-10-15 09:34:51 -03:00
Leo Yan 77c158698c perf c2c: Correct LLC load hit metrics
"rmt_hit" is accounted into two metrics: one is accounted into the
metrics "LLC Ld Miss" (see the function llc_miss() for calculation
"llcmiss"); and it's accounted into metrics "LLC Load Hit".  Thus,
for the literal meaning, it is contradictory that "rmt_hit" is
accounted for both "LLC Ld Miss" (LLC miss) and "LLC Load Hit"
(LLC hit).

Thus this is easily to introduce confusion: "LLC Load Hit" gives
impression that all items belong to it are LLC hit; in fact "rmt_hit"
is LLC miss and remote cache hit.

To give out clear semantics for metric "LLC Load Hit", "rmt_hit" is
moved out from it and changes "LLC Load Hit" to contain two items:

  LLC Load Hit = LLC's hit ("ld_llchit") + LLC's hitm ("lcl_hitm")

For output alignment, adjusts the header for "LLC Load Hit".

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-8-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:51 -03:00
Leo Yan ed626a3e52 perf c2c: Change header for LLC local hit
Replace the header string "Lcl" with "LclHit", which is more explicit
to express the event type is LLC local hit.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-7-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:49 -03:00
Leo Yan 0fbe2fe965 perf c2c: Use more explicit headers for HITM
Local and remote HITM use the headers 'Lcl' and 'Rmt' respectively,
suppose if we want to extend the tool to display these two dimensions
under any one metrics, users cannot understand the semantics if only
based on the header string 'Lcl' or 'Rmt'.

To explicit express the meaning for HITM items, this patch changes the
headers string as "LclHitm" and "RmtHitm", the strings are more readable
and this allows to extend metrics for using HITM items.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:47 -03:00
Leo Yan fdd32d7e8e perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
The metrics "LLC Load Hitm" contains two items: one is "local Hitm" and
another is "remote Hitm".

"local Hitm" means: L3 HIT and was serviced by another processor core
with a cross core snoop where modified copies were found; it's no doubt
that "local Hitm" belongs to LLC access.

But for "remote Hitm", based on the code in util/mem-events, it's the
event for remote cache HIT and was serviced by another processor core
with modified copies.  Thus the remote Hitm is a remote cache's hit and
actually it's LLC load miss.

Now the display format gives users the impression that "local Hitm" and
"remote Hitm" both belong to the LLC load, but this is not the fact as
described.

This patch changes the header from "LLC Load Hitm" to "Load Hitm", this
can avoid the give the wrong impression that all Hitm belong to LLC.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:45 -03:00
Leo Yan 6d662d730d perf c2c: Organize metrics based on memory hierarchy
The metrics are not organized based on memory hierarchy, e.g. the tool
doesn't organize the metrics order based on memory nodes from the close
node (e.g. L1/L2 cache) to far node (e.g. L3 cache and DRAM).

To output metrics with more friendly form, this patch refines the
metrics order based on memory hierarchy:

  "Core Load Hit" => "LLC Load Hit" => "LLC Ld Miss" => "Load Dram"

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:41 -03:00
Leo Yan 4f28641bde perf c2c: Display "Total Stores" as a standalone metrics
The total stores is displayed under the metrics "Store Reference", to
output the same format with total records and all loads, extract the
total stores number as a standalone metrics "Total Stores".

After this patch, the tool shows the summary numbers ("Total records",
"Total loads", "Total Stores") in the unified form.

Before:

  #        ----------- Cacheline ----------      Tot  ----- LLC Load Hitm -----    Total    Total  ---- Store Reference ----  --- Load Dram ----      LLC  ----- Core Load Hit -----  -- LLC Load Hit --
  # Index             Address  Node  PA cnt     Hitm    Total      Lcl      Rmt  records    Loads    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss       FB       L1       L2       Llc       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  ........  .......  .......  .......  .......  ........  ........
  #
        0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765         0         0        0      548     2615       66       169         0
        1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0         0         0        0      187      361       27        11         0
        2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0         0         0        0      131        0       10       263         0

After:

  #        ----------- Cacheline ----------      Tot  ----- LLC Load Hitm -----    Total    Total    Total  ---- Stores ----  --- Load Dram ----      LLC  ----- Core Load Hit -----  -- LLC Load Hit --
  # Index             Address  Node  PA cnt     Hitm    Total      Lcl      Rmt  records    Loads   Stores    L1Hit   L1Miss       Lcl       Rmt  Ld Miss       FB       L1       L2       Llc       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  ........  .......  .......  .......  .......  ........  ........
  #
        0      0x55f07d580100     0    1499   85.89%      481      481        0     7243     3879     3364     2599      765         0         0        0      548     2615       66       169         0
        1      0x55f07d580080     0       1   13.93%       78       78        0      664      664        0        0        0         0         0        0      187      361       27        11         0
        2      0x55f07d5800c0     0       1    0.18%        1        1        0      405      405        0        0        0         0         0        0      131        0       10       263         0

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:36 -03:00
Leo Yan b596e979c8 perf c2c: Display the total numbers continuously
To view the statistics with "breakdown" mode, it's good to show the
summary numbers for the total records, all stores and all loads, then
the sequential conlumns can be used to break into more detailed items.

To achieve this purpose, this patch displays the summary numbers for
records/stores/loads continuously and places them before breakdown
items, this can allow uses to easily read the summarized statistics.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20201014050921.5591-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-15 09:34:33 -03:00
Ian Rogers f92993851f perf bench: Use condition variables in numa.
The existing approach to synchronization between threads in the numa
benchmark is unbalanced mutexes.

This synchronization causes thread sanitizer to warn of locks being
taken twice on a thread without an unlock, as well as unlocks with no
corresponding locks.

This change replaces the synchronization with more regular condition
variables.

While this fixes one class of thread sanitizer warnings, there still
remain warnings of data races due to threads reading and writing shared
memory without any atomics.

Committer testing:

  Basic run on a non-NUMA machine.

  # perf bench numa

          # List of available benchmarks for collection 'numa':

             mem: Benchmark for NUMA workloads
             all: Run all NUMA benchmarks

  # perf bench numa all
  # Running numa/mem benchmark...

   # Running main, "perf bench numa numa-mem"
   #
   # Running test on: Linux five 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
   #

   # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp  1 --no-data_rand_walk"
           20.076 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.073 secs average thread-runtime
            0.190 % difference between max/avg runtime
          241.828 GB data processed, per thread
          241.828 GB data processed, total
            0.083 nsecs/byte/thread runtime
           12.045 GB/sec/thread speed
           12.045 GB/sec total speed

   # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp  1 --no-data_rand_walk --thp -1"
           20.045 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.014 secs average thread-runtime
            0.111 % difference between max/avg runtime
          234.304 GB data processed, per thread
          234.304 GB data processed, total
            0.086 nsecs/byte/thread runtime
           11.689 GB/sec/thread speed
           11.689 GB/sec total speed

   # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp  1 --no-data_rand_walk"

  Test not applicable, system has only 1 nodes.

   # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp  1 --no-data_rand_walk"
           20.138 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.121 secs average thread-runtime
            0.342 % difference between max/avg runtime
          135.961 GB data processed, per thread
          271.922 GB data processed, total
            0.148 nsecs/byte/thread runtime
            6.752 GB/sec/thread speed
           13.503 GB/sec total speed

   # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp  1 --no-data_rand_walk"

  Test not applicable, system has only 1 nodes.

   # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp  1 --no-data_rand_walk"

  Test not applicable, system has only 1 nodes.

   # Running  1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1"
            0.747 secs latency to NUMA-converge
            0.747 secs slowest (max) thread-runtime
            0.000 secs fastest (min) thread-runtime
            0.714 secs average thread-runtime
           50.000 % difference between max/avg runtime
            3.228 GB data processed, per thread
            9.683 GB data processed, total
            0.231 nsecs/byte/thread runtime
            4.321 GB/sec/thread speed
           12.964 GB/sec total speed

   # Running  1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp  1"
            1.127 secs latency to NUMA-converge
            1.127 secs slowest (max) thread-runtime
            1.000 secs fastest (min) thread-runtime
            1.089 secs average thread-runtime
            5.624 % difference between max/avg runtime
            3.765 GB data processed, per thread
           15.062 GB data processed, total
            0.299 nsecs/byte/thread runtime
            3.342 GB/sec/thread speed
           13.368 GB/sec total speed

   # Running  1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp  1"
            1.003 secs latency to NUMA-converge
            1.003 secs slowest (max) thread-runtime
            0.000 secs fastest (min) thread-runtime
            0.889 secs average thread-runtime
           50.000 % difference between max/avg runtime
            2.141 GB data processed, per thread
           12.847 GB data processed, total
            0.469 nsecs/byte/thread runtime
            2.134 GB/sec/thread speed
           12.805 GB/sec total speed

   # Running  2x3-convergence, "perf bench numa mem -p 2 -t 3 -P 1020 -s 100 -zZ0qcm --thp  1"
            1.814 secs latency to NUMA-converge
            1.814 secs slowest (max) thread-runtime
            1.000 secs fastest (min) thread-runtime
            1.716 secs average thread-runtime
           22.440 % difference between max/avg runtime
            3.747 GB data processed, per thread
           22.483 GB data processed, total
            0.484 nsecs/byte/thread runtime
            2.065 GB/sec/thread speed
           12.393 GB/sec total speed

   # Running  3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp  1"
            2.065 secs latency to NUMA-converge
            2.065 secs slowest (max) thread-runtime
            1.000 secs fastest (min) thread-runtime
            1.947 secs average thread-runtime
           25.788 % difference between max/avg runtime
            2.855 GB data processed, per thread
           25.694 GB data processed, total
            0.723 nsecs/byte/thread runtime
            1.382 GB/sec/thread speed
           12.442 GB/sec total speed

   # Running  4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp  1"
            1.912 secs latency to NUMA-converge
            1.912 secs slowest (max) thread-runtime
            1.000 secs fastest (min) thread-runtime
            1.775 secs average thread-runtime
           23.852 % difference between max/avg runtime
            1.479 GB data processed, per thread
           23.668 GB data processed, total
            1.293 nsecs/byte/thread runtime
            0.774 GB/sec/thread speed
           12.378 GB/sec total speed

   # Running  4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp  1 --thp -1"
            1.783 secs latency to NUMA-converge
            1.783 secs slowest (max) thread-runtime
            1.000 secs fastest (min) thread-runtime
            1.633 secs average thread-runtime
           21.960 % difference between max/avg runtime
            1.345 GB data processed, per thread
           21.517 GB data processed, total
            1.326 nsecs/byte/thread runtime
            0.754 GB/sec/thread speed
           12.067 GB/sec total speed

   # Running  4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp  1"
            5.396 secs latency to NUMA-converge
            5.396 secs slowest (max) thread-runtime
            4.000 secs fastest (min) thread-runtime
            4.928 secs average thread-runtime
           12.937 % difference between max/avg runtime
            2.721 GB data processed, per thread
           65.306 GB data processed, total
            1.983 nsecs/byte/thread runtime
            0.504 GB/sec/thread speed
           12.102 GB/sec total speed

   # Running  4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp  1"
            3.121 secs latency to NUMA-converge
            3.121 secs slowest (max) thread-runtime
            2.000 secs fastest (min) thread-runtime
            2.836 secs average thread-runtime
           17.962 % difference between max/avg runtime
            1.194 GB data processed, per thread
           38.192 GB data processed, total
            2.615 nsecs/byte/thread runtime
            0.382 GB/sec/thread speed
           12.236 GB/sec total speed

   # Running  8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp  1"
            4.302 secs latency to NUMA-converge
            4.302 secs slowest (max) thread-runtime
            3.000 secs fastest (min) thread-runtime
            4.045 secs average thread-runtime
           15.133 % difference between max/avg runtime
            1.631 GB data processed, per thread
           52.178 GB data processed, total
            2.638 nsecs/byte/thread runtime
            0.379 GB/sec/thread speed
           12.128 GB/sec total speed

   # Running  8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp  1 --thp -1"
            4.418 secs latency to NUMA-converge
            4.418 secs slowest (max) thread-runtime
            3.000 secs fastest (min) thread-runtime
            4.104 secs average thread-runtime
           16.045 % difference between max/avg runtime
            1.664 GB data processed, per thread
           53.254 GB data processed, total
            2.655 nsecs/byte/thread runtime
            0.377 GB/sec/thread speed
           12.055 GB/sec total speed

   # Running  3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp  1"
            0.973 secs latency to NUMA-converge
            0.973 secs slowest (max) thread-runtime
            0.000 secs fastest (min) thread-runtime
            0.955 secs average thread-runtime
           50.000 % difference between max/avg runtime
            4.124 GB data processed, per thread
           12.372 GB data processed, total
            0.236 nsecs/byte/thread runtime
            4.238 GB/sec/thread speed
           12.715 GB/sec total speed

   # Running  4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp  1"
            0.820 secs latency to NUMA-converge
            0.820 secs slowest (max) thread-runtime
            0.000 secs fastest (min) thread-runtime
            0.808 secs average thread-runtime
           50.000 % difference between max/avg runtime
            2.555 GB data processed, per thread
           10.220 GB data processed, total
            0.321 nsecs/byte/thread runtime
            3.117 GB/sec/thread speed
           12.468 GB/sec total speed

   # Running  8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp  1"
            0.667 secs latency to NUMA-converge
            0.667 secs slowest (max) thread-runtime
            0.000 secs fastest (min) thread-runtime
            0.607 secs average thread-runtime
           50.000 % difference between max/avg runtime
            1.009 GB data processed, per thread
            8.069 GB data processed, total
            0.661 nsecs/byte/thread runtime
            1.512 GB/sec/thread speed
           12.095 GB/sec total speed

   # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp  1"
            1.546 secs latency to NUMA-converge
            1.546 secs slowest (max) thread-runtime
            1.000 secs fastest (min) thread-runtime
            1.485 secs average thread-runtime
           17.664 % difference between max/avg runtime
            1.162 GB data processed, per thread
           18.594 GB data processed, total
            1.331 nsecs/byte/thread runtime
            0.752 GB/sec/thread speed
           12.025 GB/sec total speed

   # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp  1"
            0.812 secs latency to NUMA-converge
            0.812 secs slowest (max) thread-runtime
            0.000 secs fastest (min) thread-runtime
            0.739 secs average thread-runtime
           50.000 % difference between max/avg runtime
            0.309 GB data processed, per thread
            9.874 GB data processed, total
            2.630 nsecs/byte/thread runtime
            0.380 GB/sec/thread speed
           12.166 GB/sec total speed

   # Running  2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp  1"
           20.044 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.020 secs average thread-runtime
            0.109 % difference between max/avg runtime
          125.750 GB data processed, per thread
          251.501 GB data processed, total
            0.159 nsecs/byte/thread runtime
            6.274 GB/sec/thread speed
           12.548 GB/sec total speed

   # Running  3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp  1"
           20.148 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.090 secs average thread-runtime
            0.367 % difference between max/avg runtime
           85.267 GB data processed, per thread
          255.800 GB data processed, total
            0.236 nsecs/byte/thread runtime
            4.232 GB/sec/thread speed
           12.696 GB/sec total speed

   # Running  4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp  1"
           20.169 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.100 secs average thread-runtime
            0.419 % difference between max/avg runtime
           63.144 GB data processed, per thread
          252.576 GB data processed, total
            0.319 nsecs/byte/thread runtime
            3.131 GB/sec/thread speed
           12.523 GB/sec total speed

   # Running  8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P  512 -s 20 -zZ0q --thp  1"
           20.175 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.107 secs average thread-runtime
            0.433 % difference between max/avg runtime
           31.267 GB data processed, per thread
          250.133 GB data processed, total
            0.645 nsecs/byte/thread runtime
            1.550 GB/sec/thread speed
           12.398 GB/sec total speed

   # Running  8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P  512 -s 20 -zZ0q --thp  1 --thp -1"
           20.216 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.113 secs average thread-runtime
            0.535 % difference between max/avg runtime
           30.998 GB data processed, per thread
          247.981 GB data processed, total
            0.652 nsecs/byte/thread runtime
            1.533 GB/sec/thread speed
           12.266 GB/sec total speed

   # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp  1"
           20.234 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.174 secs average thread-runtime
            0.577 % difference between max/avg runtime
           15.377 GB data processed, per thread
          246.039 GB data processed, total
            1.316 nsecs/byte/thread runtime
            0.760 GB/sec/thread speed
           12.160 GB/sec total speed

   # Running  1x4-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp  1"
           20.040 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.028 secs average thread-runtime
            0.099 % difference between max/avg runtime
           66.832 GB data processed, per thread
          267.328 GB data processed, total
            0.300 nsecs/byte/thread runtime
            3.335 GB/sec/thread speed
           13.340 GB/sec total speed

   # Running  1x8-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp  1"
           20.064 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.034 secs average thread-runtime
            0.160 % difference between max/avg runtime
           32.911 GB data processed, per thread
          263.286 GB data processed, total
            0.610 nsecs/byte/thread runtime
            1.640 GB/sec/thread speed
           13.122 GB/sec total speed

   # Running 1x16-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp  1"
           20.092 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.052 secs average thread-runtime
            0.230 % difference between max/avg runtime
           16.131 GB data processed, per thread
          258.088 GB data processed, total
            1.246 nsecs/byte/thread runtime
            0.803 GB/sec/thread speed
           12.845 GB/sec total speed

   # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp  1"
           20.099 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.063 secs average thread-runtime
            0.247 % difference between max/avg runtime
            7.962 GB data processed, per thread
          254.773 GB data processed, total
            2.525 nsecs/byte/thread runtime
            0.396 GB/sec/thread speed
           12.676 GB/sec total speed

   # Running  2x3-bw-process, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp  1"
           20.150 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.120 secs average thread-runtime
            0.372 % difference between max/avg runtime
           44.827 GB data processed, per thread
          268.960 GB data processed, total
            0.450 nsecs/byte/thread runtime
            2.225 GB/sec/thread speed
           13.348 GB/sec total speed

   # Running  4x4-bw-process, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp  1"
           20.258 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.168 secs average thread-runtime
            0.636 % difference between max/avg runtime
           17.079 GB data processed, per thread
          273.263 GB data processed, total
            1.186 nsecs/byte/thread runtime
            0.843 GB/sec/thread speed
           13.489 GB/sec total speed

   # Running  4x6-bw-process, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp  1"
           20.559 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.382 secs average thread-runtime
            1.359 % difference between max/avg runtime
           10.758 GB data processed, per thread
          258.201 GB data processed, total
            1.911 nsecs/byte/thread runtime
            0.523 GB/sec/thread speed
           12.559 GB/sec total speed

   # Running  4x8-bw-process, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp  1"
           20.744 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.516 secs average thread-runtime
            1.792 % difference between max/avg runtime
            8.069 GB data processed, per thread
          258.201 GB data processed, total
            2.571 nsecs/byte/thread runtime
            0.389 GB/sec/thread speed
           12.447 GB/sec total speed

   # Running  4x8-bw-process-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp  1 --thp -1"
           20.855 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.561 secs average thread-runtime
            2.050 % difference between max/avg runtime
            8.069 GB data processed, per thread
          258.201 GB data processed, total
            2.585 nsecs/byte/thread runtime
            0.387 GB/sec/thread speed
           12.381 GB/sec total speed

   # Running  3x3-bw-process, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp  1"
           20.134 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.077 secs average thread-runtime
            0.333 % difference between max/avg runtime
           28.091 GB data processed, per thread
          252.822 GB data processed, total
            0.717 nsecs/byte/thread runtime
            1.395 GB/sec/thread speed
           12.557 GB/sec total speed

   # Running  5x5-bw-process, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp  1"
           20.588 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.375 secs average thread-runtime
            1.427 % difference between max/avg runtime
           10.177 GB data processed, per thread
          254.436 GB data processed, total
            2.023 nsecs/byte/thread runtime
            0.494 GB/sec/thread speed
           12.359 GB/sec total speed

   # Running 2x16-bw-process, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp  1"
           20.657 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.429 secs average thread-runtime
            1.589 % difference between max/avg runtime
            8.170 GB data processed, per thread
          261.429 GB data processed, total
            2.528 nsecs/byte/thread runtime
            0.395 GB/sec/thread speed
           12.656 GB/sec total speed

   # Running 1x32-bw-process, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp  1"
           22.981 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           21.996 secs average thread-runtime
            6.486 % difference between max/avg runtime
            8.863 GB data processed, per thread
          283.606 GB data processed, total
            2.593 nsecs/byte/thread runtime
            0.386 GB/sec/thread speed
           12.341 GB/sec total speed

   # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp  1"
           20.047 secs slowest (max) thread-runtime
           19.000 secs fastest (min) thread-runtime
           20.026 secs average thread-runtime
            2.611 % difference between max/avg runtime
            8.441 GB data processed, per thread
          270.111 GB data processed, total
            2.375 nsecs/byte/thread runtime
            0.421 GB/sec/thread speed
           13.474 GB/sec total speed

   # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp  1 --thp -1"
           20.088 secs slowest (max) thread-runtime
           19.000 secs fastest (min) thread-runtime
           20.025 secs average thread-runtime
            2.709 % difference between max/avg runtime
            8.411 GB data processed, per thread
          269.142 GB data processed, total
            2.388 nsecs/byte/thread runtime
            0.419 GB/sec/thread speed
           13.398 GB/sec total speed

   # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp  1"
           20.293 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.175 secs average thread-runtime
            0.721 % difference between max/avg runtime
            7.918 GB data processed, per thread
          253.374 GB data processed, total
            2.563 nsecs/byte/thread runtime
            0.390 GB/sec/thread speed
           12.486 GB/sec total speed

   # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp  1 --thp -1"
           20.411 secs slowest (max) thread-runtime
           20.000 secs fastest (min) thread-runtime
           20.226 secs average thread-runtime
            1.006 % difference between max/avg runtime
            7.931 GB data processed, per thread
          253.778 GB data processed, total
            2.574 nsecs/byte/thread runtime
            0.389 GB/sec/thread speed
           12.434 GB/sec total speed

  #

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201012161611.366482-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 14:24:53 -03:00
Linus Torvalds 6873139ed0 objtool changes for v5.10:
- Most of the changes are cleanups and reorganization to make the objtool code
    more arch-agnostic. This is in preparation for non-x86 support.
 
 Fixes:
 
  - KASAN fixes.
  - Handle unreachable trap after call to noreturn functions better.
  - Ignore unreachable fake jumps.
  - Misc smaller fixes & cleanups.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+FgwIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1juGw/6A6goA5/HHapM965yG1eY/rTLp3eIbcma
 1ZbkUsP0YfT6wVUzw/sOeZzKNOwOq1FuMfkjuH2KcnlxlcMekIaKvLk8uauW4igM
 hbFGuuZfZ0An5ka9iQ1W6HGdsuD3vVlN1w/kxdWk0c3lJCVQSTxdCfzF8fuF3gxX
 lF3Bc1D/ZFcHIHT/hu/jeIUCgCYpD3qZDjQJBScSwVthZC+Fw6weLLGp2rKDaCao
 HhSQft6MUfDrUKfH3LBIUNPRPCOrHo5+AX6BXxLXJVxqlwO/YU3e0GMwSLedMtBy
 TASWo7/9GAp+wNNZe8EliyTKrfC3sLxN1QImfjuojxbBVXx/YQ/ToTt9fVGpF4Y+
 XhhRFv9520v1tS2wPHIgQGwbh7EWG6mdrmo10RAs/31ViONPrbEZ4WmcA08b/5FY
 KEkOVb18yfmDVzVZPpSc+HpIFkppEBOf7wPg27Bj3RTZmzIl/y+rKSnxROpsJsWb
 R6iov7SFVET14lHl1G7tPNXfqRaS7HaOQIj3rSUyAP0ZfX+yIupVJp32dc6Ofg8b
 SddUCwdIHoFdUNz4Y9csUCrewtCVJbxhV4MIdv0GpWbrgSw96RFZgetaH+6mGRpj
 0Kh6M1eC3irDbhBuarWUBAr2doPAq4iOUeQU36Q6YSAbCs83Ws2uKOWOHoFBVwCH
 uSKT0wqqG+E=
 =KX5o
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:
 "Most of the changes are cleanups and reorganization to make the
  objtool code more arch-agnostic. This is in preparation for non-x86
  support.

  Other changes:

   - KASAN fixes

   - Handle unreachable trap after call to noreturn functions better

   - Ignore unreachable fake jumps

   - Misc smaller fixes & cleanups"

* tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf build: Allow nested externs to enable BUILD_BUG() usage
  objtool: Allow nested externs to enable BUILD_BUG()
  objtool: Permit __kasan_check_{read,write} under UACCESS
  objtool: Ignore unreachable trap after call to noreturn functions
  objtool: Handle calling non-function symbols in other sections
  objtool: Ignore unreachable fake jumps
  objtool: Remove useless tests before save_reg()
  objtool: Decode unwind hint register depending on architecture
  objtool: Make unwind hint definitions available to other architectures
  objtool: Only include valid definitions depending on source file type
  objtool: Rename frame.h -> objtool.h
  objtool: Refactor jump table code to support other architectures
  objtool: Make relocation in alternative handling arch dependent
  objtool: Abstract alternative special case handling
  objtool: Move macros describing structures to arch-dependent code
  objtool: Make sync-check consider the target architecture
  objtool: Group headers to check in a single list
  objtool: Define 'struct orc_entry' only when needed
  objtool: Skip ORC entry creation for non-text sections
  objtool: Move ORC logic out of check()
  ...
2020-10-14 10:13:37 -07:00
John Garry caf7f9685d perf jevents: Fix event code for events referencing std arch events
The event code for events referencing std arch events is incorrectly
evaluated in json_events().

The issue is that je.event is evaluated properly from try_fixup(), but
later NULLified from the real_event() call, as "event" may be NULL.

Fix by setting "event" same je.event in try_fixup().

Also remove support for overwriting event code for events using std arch
events, as it is not used.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-By: Kajol Jain<kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/1602170368-11892-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:43:31 -03:00
Jin Yao 2a09a84c72 perf diff: Support hot streams comparison
This patch enables perf-diff with "--stream" option.

"--stream": Enable hot streams comparison

Now let's see example.

perf record -b ...      Generate perf.data.old with branch data
perf record -b ...      Generate perf.data with branch data
perf diff --stream

[ Matched hot streams ]

hot chain pair 1:
            cycles: 1, hits: 27.77%                  cycles: 1, hits: 9.24%
        ---------------------------              --------------------------
                      main div.c:39                           main div.c:39
                      main div.c:44                           main div.c:44

hot chain pair 2:
           cycles: 34, hits: 20.06%                cycles: 27, hits: 16.98%
        ---------------------------              --------------------------
          __random_r random_r.c:360               __random_r random_r.c:360
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:380               __random_r random_r.c:380
          __random_r random_r.c:357               __random_r random_r.c:357
              __random random.c:293                   __random random.c:293
              __random random.c:293                   __random random.c:293
              __random random.c:291                   __random random.c:291
              __random random.c:291                   __random random.c:291
              __random random.c:291                   __random random.c:291
              __random random.c:288                   __random random.c:288
                     rand rand.c:27                          rand rand.c:27
                     rand rand.c:26                          rand rand.c:26
                           rand@plt                                rand@plt
                           rand@plt                                rand@plt
              compute_flag div.c:25                   compute_flag div.c:25
              compute_flag div.c:22                   compute_flag div.c:22
                      main div.c:40                           main div.c:40
                      main div.c:40                           main div.c:40
                      main div.c:39                           main div.c:39

hot chain pair 3:
             cycles: 9, hits: 4.48%                  cycles: 6, hits: 4.51%
        ---------------------------              --------------------------
          __random_r random_r.c:360               __random_r random_r.c:360
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:380               __random_r random_r.c:380

[ Hot streams in old perf data only ]

hot chain 1:
            cycles: 18, hits: 6.75%
         --------------------------
          __random_r random_r.c:360
          __random_r random_r.c:388
          __random_r random_r.c:388
          __random_r random_r.c:380
          __random_r random_r.c:357
              __random random.c:293
              __random random.c:293
              __random random.c:291
              __random random.c:291
              __random random.c:291
              __random random.c:288
                     rand rand.c:27
                     rand rand.c:26
                           rand@plt
                           rand@plt
              compute_flag div.c:25
              compute_flag div.c:22
                      main div.c:40

hot chain 2:
            cycles: 29, hits: 2.78%
         --------------------------
              compute_flag div.c:22
                      main div.c:40
                      main div.c:40
                      main div.c:39

[ Hot streams in new perf data only ]

hot chain 1:
                                                     cycles: 4, hits: 4.54%
                                                 --------------------------
                                                              main div.c:42
                                                      compute_flag div.c:28

hot chain 2:
                                                     cycles: 5, hits: 3.51%
                                                 --------------------------
                                                              main div.c:39
                                                              main div.c:44
                                                              main div.c:42
                                                      compute_flag div.c:28

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:34:48 -03:00
Jin Yao 5bbd6bad3b perf streams: Report hot streams
We show the streams separately. They are divided into different sections.

1. "Matched hot streams"

2. "Hot streams in old perf data only"

3. "Hot streams in new perf data only".

For each stream, we report the cycles and hot percent (hits%).

For example,

     cycles: 2, hits: 4.08%
 --------------------------
              main div.c:42
      compute_flag div.c:28

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-7-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:34:26 -03:00
Jin Yao 28904f4dce perf streams: Calculate the sum of total streams hits
We have used callchain_node->hit to measure the hot level of one stream.
This patch calculates the sum of hits of total streams.

Thus in next patch, we can use following formula to report hot percent
for one stream.

hot percent = callchain_node->hit / sum of total hits

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:34:06 -03:00
Jin Yao fa79aa6485 perf streams: Link stream pair
In previous patch, we have created an evsel_streams for one event, and
top N hottest streams will be saved in a stream array in evsel_streams.

This patch compares total streams among two evsel_streams.

Once two streams are fully matched, they will be linked as a pair. From
the pair, we can know which streams are matched.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-5-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:32:36 -03:00
Jin Yao 47ef8398c3 perf streams: Compare two streams
Stream is the branch history which is aggregated by the branch records
from perf samples. Now we support the callchain as stream.

If the callchain entries of one stream are fully matched with the
callchain entries of another stream, we think two streams are matched.

For example,

   cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
   -----------------------                 -----------------------
             main div.c:39                           main div.c:39
             main div.c:44                           main div.c:44

Above two streams are matched (we don't consider the case that source
code is changed).

The matching logic is, compare the chain string first. If it's not
matched, fallback to dso address comparison.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:31:56 -03:00
Jin Yao dd1d841810 perf streams: Get the evsel_streams by evsel_idx
In previous patch, we have created evsel_streams array.

This patch returns the specified evsel_streams according to the
evsel_idx.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:30:13 -03:00
Jin Yao 480accbb17 perf streams: Introduce branch history "streams"
We define a stream as the branch history which is aggregated by the
branch records from perf samples. For example, the callchains aggregated
from the branch records are considered as streams.  By browsing the hot
stream, we can understand the hot code path.

Now we only support the callchain for stream. For measuring the hot
level for a stream, we use the callchain_node->hit, higher is hotter.

There may be many callchains sampled so we only focus on the top N
hottest callchains. N is a user defined parameter or predefined default
value (nr_streams_max).

This patch creates an evsel_streams array per event, and saves the top N
hottest streams in a stream array.

So now we can get the per-event top N hottest streams.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:27:28 -03:00
Andi Kleen 6556a75bec perf intel-pt: Improve PT documentation slightly
Document the higher level --insn-trace etc. perf script options.

Include the howto how to build xed into the manpage

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20201014035346.4772-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 13:14:40 -03:00
Andi Kleen 0997a2662f perf tools: Add support for exclusive groups/events
Peter suggested that using the exclusive mode in perf could avoid some
problems with bad scheduling of groups. Exclusive is implemented in the
kernel, but wasn't exposed by the perf tool, so hard to use without
custom low level API users.

Add support for marking groups or events with :e for exclusive in the
perf tool.  The implementation is basically the same as the existing
pinned attribute.

Committer testing:

  # perf test "parse event"
   6: Parse event definition strings                                  : Ok
  # perf test -v "parse event" |& grep :u*e
  running test 56 'instructions:uep'
  running test 57 '{cycles,cache-misses,branch-misses}:e'
  #
  #
  # grep "model name" -m1 /proc/cpuinfo
  model name	: AMD Ryzen 9 3900X 12-Core Processor
  #
  # perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1

   Performance counter stats for 'system wide':

       <not counted>      cycles                                                        (0.00%)
       <not counted>      cache-misses                                                  (0.00%)
       <not counted>      branch-misses                                                 (0.00%)

         1.001269893 seconds time elapsed

  Some events weren't counted. Try disabling the NMI watchdog:
  	echo 0 > /proc/sys/kernel/nmi_watchdog
  	perf stat ...
  	echo 1 > /proc/sys/kernel/nmi_watchdog
  # echo 0 > /proc/sys/kernel/nmi_watchdog
  # perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1

   Performance counter stats for 'system wide':

       1,298,663,141      cycles
          30,962,215      cache-misses
           5,325,150      branch-misses

         1.001474934 seconds time elapsed

  #
  # The output for asking for precise events on AMD needs to improve, it
  # supposedly works only for system wide or per CPU
  #
  # perf stat -a -e '{cycles,cache-misses,branch-misses}:uep' sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
  /bin/dmesg | grep -i perf may provide additional information.

  # perf stat -a -e '{cycles,cache-misses,branch-misses}:ue' sleep 1

   Performance counter stats for 'system wide':

         746,363,126      cycles
          16,881,611      cache-misses
           2,871,259      branch-misses

         1.001636066 seconds time elapsed

  #

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201014144255.22699-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 12:24:28 -03:00
Jiri Olsa 78b2c50c5d perf test: Add build id shell test
Add a test for the build id cache that adds a binary with sha1 and md5
build ids and verifies it's added properly.

The test updates build id cache with 'perf record' and 'perf buildid-cache -a'.

Committer testing:

  # perf test "build id"
  82: build id cache operations                                       : Ok
  #
  # perf test -v "build id"
  82: build id cache operations                                       :
  --- start ---
  test child forked, pid 447218
  test binaries: /tmp/perf.ex.SHA1.B8I /tmp/perf.ex.MD5.7Nv
  Adding d1abc1eb7568358cf23c959566f23462461834d1 /tmp/perf.ex.SHA1.B8I: Ok
  build id: d1abc1eb7568358cf23c959566f23462461834d1
  link: /tmp/perf.debug.sS2/.build-id/d1/abc1eb7568358cf23c959566f23462461834d1
  file: /tmp/perf.debug.sS2/.build-id/d1/../../tmp/perf.ex.SHA1.B8I/d1abc1eb7568358cf23c959566f23462461834d1/elf
  OK for /tmp/perf.ex.SHA1.B8I
  Adding a50e350e97c43b4708d09bcd85ebfff7 /tmp/perf.ex.MD5.7Nv: Ok
  build id: a50e350e97c43b4708d09bcd85ebfff7
  link: /tmp/perf.debug.IuW/.build-id/a5/0e350e97c43b4708d09bcd85ebfff7
  file: /tmp/perf.debug.IuW/.build-id/a5/../../tmp/perf.ex.MD5.7Nv/a50e350e97c43b4708d09bcd85ebfff7/elf
  OK for /tmp/perf.ex.MD5.7Nv
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.034 MB /tmp/perf.data.xrH ]
  build id: d1abc1eb7568358cf23c959566f23462461834d1
  link: /tmp/perf.debug.eGR/.build-id/d1/abc1eb7568358cf23c959566f23462461834d1
  file: /tmp/perf.debug.eGR/.build-id/d1/../../tmp/perf.ex.SHA1.B8I/d1abc1eb7568358cf23c959566f23462461834d1/elf
  OK for /tmp/perf.ex.SHA1.B8I
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.034 MB /tmp/perf.data.cbE ]
  build id: a50e350e97c43b4708d09bcd85ebfff7
  link: /tmp/perf.debug.82t/.build-id/a5/0e350e97c43b4708d09bcd85ebfff7
  file: /tmp/perf.debug.82t/.build-id/a5/../../tmp/perf.ex.MD5.7Nv/a50e350e97c43b4708d09bcd85ebfff7/elf
  OK for /tmp/perf.ex.MD5.7Nv
  test child finished with 0
  ---- end ----
  build id cache operations: Ok
  #

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 11:28:52 -03:00
Jiri Olsa e9ad94381c perf tools: Align buildid list output for short build ids
With shorter md5 build ids we need to align their paths properly with
other build ids:

  $ perf buildid-list
  17f4e448cc746582ea1881528deb549f7fdb3fd5 [kernel.kallsyms]
  a50e350e97c43b4708d09bcd85ebfff7         .../tools/perf/buildid-ex-md5
  1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 11:28:52 -03:00
Jiri Olsa b0a323c7f0 perf tools: Add size to 'struct perf_record_header_build_id'
We do not store size with build ids in perf data, but there's enough
space to do it. Adding misc bit PERF_RECORD_MISC_BUILD_ID_SIZE to mark
build id event with size.

With this fix the dso with md5 build id will have correct build id data
and will be usable for debuginfod processing if needed (coming in
following patches).

Committer notes:

Use %zu with size_t to fix this error on 32-bit arches:

  util/header.c: In function '__event_process_build_id':
  util/header.c:2105:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Werror=format=]
     pr_debug("build id event received for %s: %s [%lu]\n",
     ^

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 11:28:12 -03:00
Jiri Olsa 39be8d0115 perf tools: Pass build_id object to dso__build_id_equal()
Passing build_id object to dso__build_id_equal(), so we can properly
check build id with different size than sha1.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 09:25:36 -03:00
Jiri Olsa 8dfdf440d3 perf tools: Pass build_id object to dso__set_build_id()
Passing build_id object to dso__set_build_id(), so it's easier
to initialize dos's build id object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:46:42 -03:00
Jiri Olsa bf5411695a perf tools: Pass build_id object to build_id__sprintf()
Passing build_id object to build_id__sprintf function, so it can operate
with the proper size of build id.

This will create proper md5 build id readable names,
like following:

  a50e350e97c43b4708d09bcd85ebfff7

instead of:

  a50e350e97c43b4708d09bcd85ebfff700000000

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:46:22 -03:00
Jiri Olsa 3ff1b8c8cc perf tools: Pass build id object to sysfs__read_build_id()
Passing build id object to sysfs__read_build_id function, so it can
populate the size of the build_id object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:46:02 -03:00