Commit Graph

7709 Commits

Author SHA1 Message Date
Adrian Hunter ab66fdace8 perf build-id: Fix caching files with a wrong build ID
Build ID events associate a file name with a build ID.  However, when
using perf inject, there is no guarantee that the file on the current
machine at the current time has that build ID. Fix by comparing the
build IDs and skip adding to the cache if they are different.

Example:

  $ echo "int main() {return 0;}" > prog.c
  $ gcc -o prog prog.c
  $ perf record --buildid-all ./prog
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.019 MB perf.data ]
  $ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
  $ file-buildid prog
  444ad9be165d8058a48ce2ffb4e9f55854a3293e
  $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
  444ad9be165d8058a48ce2ffb4e9f55854a3293e
  $ echo "int main() {return 1;}" > prog.c
  $ gcc -o prog prog.c
  $ file-buildid prog
  885524d5aaa24008a3e2b06caa3ea95d013c0fc5

Before:

  $ perf buildid-cache --purge $(pwd)/prog
  $ perf inject -i perf.data -o junk
  $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
  885524d5aaa24008a3e2b06caa3ea95d013c0fc5
  $

After:

  $ perf buildid-cache --purge $(pwd)/prog
  $ perf inject -i perf.data -o junk
  $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf

  $

Fixes: 454c407ec1 ("perf: add perf-inject builtin")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-26 12:32:55 -03:00
Ian Rogers c788ef61ef perf metrics: Ensure at least 1 id per metric
We may have no events for a metric evaluated to a constant. In such a
case ensure a tool event is at least evaluated for metric parsing and
displaying.

Fixes: 8586d2744f ("perf metrics: Don't add all tool events for sharing")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220618013957.999321-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 11:24:05 -03:00
Leo Yan 51ba539f5b perf arm-spe: Don't set data source if it's not a memory operation
Except for memory load and store operations, ARM SPE records also can
support other operation types, bug when set the data source field the
current code assumes a record is a either load operation or store
operation, this leads to wrongly synthesize memory samples.

This patch strictly checks the record operation type, it only sets data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST,
otherwise, returns zero for data source.  Therefore, we can synthesize
memory samples only when data source is a non-zero value, the function
arm_spe__is_memory_event() is useless and removed.

Fixes: e55ed3423c ("perf arm-spe: Synthesize memory event")
Reviewed-by: Ali Saidi <alisaidi@amazon.com>
Reviewed-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: alisaidi@amazon.com
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Forrington <nick.forrington@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20220517020326.18580-5-alisaidi@amazon.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Ian Rogers e5287e6dd3 perf expr: Allow exponents on floating point values
Pass the optional exponent component through to strtod that already
supports it. We already have exponents in ScaleUnit and so this adds
uniformity.

Reported-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Reviewed-By: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20220527020653.4160884-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Ian Rogers 1d98cdf7fa perf unwind: Fix uninitialized variable
The 'ret' variable may be uninitialized on error goto paths.

Fixes: dc2cf4ca86 ("perf unwind: Fix segbase for ld.lld linked objects")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
Cc: Fangrui Song <maskray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: llvm@lists.linux.dev
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Ullrich <sebasti@nullri.ch>
Link: https://lore.kernel.org/r/20220607000851.39798-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19 10:41:43 -03:00
Fangrui Song dc2cf4ca86 perf unwind: Fix segbase for ld.lld linked objects
segbase is the address of .eh_frame_hdr and table_data is segbase plus
the header size. find_proc_info computes segbase as `map->start +
segbase - map->pgoff` which is wrong when

* .eh_frame_hdr and .text are in different PT_LOAD program headers
* and their p_vaddr difference does not equal their p_offset difference

Since 10.0, ld.lld's default --rosegment -z noseparate-code layout has
such R and RX PT_LOAD program headers.

    ld.lld (default) => perf report fails to unwind `perf record
    --call-graph dwarf` recorded data
    ld.lld --no-rosegment => ok (trivial, no R PT_LOAD)
    ld.lld -z separate-code => ok but by luck: there are two PT_LOAD but
    their p_vaddr difference equals p_offset difference

    ld.bfd -z noseparate-code => ok (trivial, no R PT_LOAD)
    ld.bfd -z separate-code (default for Linux/x86) => ok but by luck:
    there are two PT_LOAD but their p_vaddr difference equals p_offset
    difference

To fix the issue, compute segbase as dso's base address plus
PT_GNU_EH_FRAME's p_vaddr. The base address is computed by iterating
over all dso-associated maps and then subtract the first PT_LOAD p_vaddr
(the minimum guaranteed by generic ABI) from the minimum address.

In libunwind, find_proc_info transitively called by unw_step is cached,
so the iteration overhead is acceptable.

Reported-by: Sebastian Ullrich <sebasti@nullri.ch>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: llvm@lists.linux.dev
Link: https://github.com/ClangBuiltLinux/linux/issues/1646
Link: https://lore.kernel.org/r/20220527182039.673248-1-maskray@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-03 21:20:25 +02:00
Leo Yan c4f462235c perf scripting python: Expose dso and map information
This change adds dso build_id and corresponding map's start and end
address.  The info of dso build_id can be used to find dso file path,
and we can validate if a branch address falls into the range of map's
start and end addresses.

In addition, the map's start address can be used as an offset for
disassembly.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Tanmay Jagdale <tanmay@marvell.com>
Cc: coresight@lists.linaro.org
Cc: zengshun . wu <zengshun.wu@outlook.com>
Link: https://lore.kernel.org/r/20220521130446.4163597-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27 13:22:13 -03:00
James Clark 2be00431c5 perf tools arm64: Add support for VG register
Add the name of the VG register so it can be used in --user-regs

The event will fail to open if the register is requested but not
available so only add it to the mask if the kernel supports sve and also
if it supports that specific register.

Committer notes:

Add conditional definition of HWCAP_SVE, as suggested by Leo Yan, to
build on older systems where this is not available in the system
headers.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220525154114.718321-6-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27 13:21:33 -03:00
James Clark 721052048b perf unwind: Use dynamic register set for DWARF unwind
Architectures can detect availability of extra registers at runtime so
use this more complete set for unwinding. This will include the VG
register on arm64 in a later commit.

If the function isn't implemented then PERF_REGS_MASK is returned and
there is no change.

Committer notes:

Added util/perf_regs.c to tools/perf/util/python-ext-sources so that
'perf test python' passes, i.e. the perf python binding has all the
symbols it needs, addressing:

  $ perf test -v python
   19: 'import perf' in python                                         :
  --- start ---
  test child forked, pid 2037817
  python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python3' "
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ImportError: /tmp/build/perf/python/perf.cpython-310-x86_64-linux-gnu.so: undefined symbol: arch__user_reg_mask
  test child finished with -1
  ---- end ----
  'import perf' in python: FAILED!
  $

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220525154114.718321-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:41:36 -03:00
James Clark 8803880f7d perf unwind arm64: Use perf's copy of kernel headers
Fix this include path to use perf's copy of the kernel header rather
than the one from the root of the repo.

This fixes build errors when only applying the perf tools part of a
patchset rather than both sides.

Reported-by: German Gomez <german.gomez@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: German Gomez <german.gomez@arm.com>
Cc: <broonie@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220525154114.718321-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:58 -03:00
Namhyung Kim 685439a7a0 perf record: Add cgroup support for off-cpu profiling
This covers two different use cases.  The first one is cgroup
filtering given by -G/--cgroup option which controls the off-cpu
profiling for tasks in the given cgroups only.

The other use case is cgroup sampling which is enabled by
--all-cgroups option and it adds PERF_SAMPLE_CGROUP to the sample_type
to set the cgroup id of the task in the sample data.

Example output.

  $ sudo perf record -a --off-cpu --all-cgroups sleep 1

  $ sudo perf report --stdio -s comm,cgroup --call-graph=no
  ...
  # Samples: 144  of event 'offcpu-time'
  # Event count (approx.): 48452045427
  #
  # Children      Self  Command          Cgroup
  # ........  ........  ...............  ..........................................
  #
      61.57%     5.60%  Chrome_ChildIOT  /user.slice/user-657345.slice/user@657345.service/app.slice/...
      29.51%     7.38%  Web Content      /user.slice/user-657345.slice/user@657345.service/app.slice/...
      17.48%     1.59%  Chrome_IOThread  /user.slice/user-657345.slice/user@657345.service/app.slice/...
      16.48%     4.12%  pipewire-pulse   /user.slice/user-657345.slice/user@657345.service/session.slice/...
      14.48%     2.07%  perf             /user.slice/user-657345.slice/user@657345.service/app.slice/...
      14.30%     7.15%  CompositorTileW  /user.slice/user-657345.slice/user@657345.service/app.slice/...
      13.33%     6.67%  Timer            /user.slice/user-657345.slice/user@657345.service/app.slice/...
  ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:58 -03:00
Namhyung Kim b36888f71c perf record: Handle argument change in sched_switch
Recently sched_switch tracepoint added a new argument for prev_state,
but it's hard to handle the change in a BPF program.  Instead, we can
check the function prototype in BTF before loading the program.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Namhyung Kim 10742d0c07 perf record: Implement basic filtering for off-cpu
It should honor cpu and task filtering with -a, -C or -p, -t options.

Committer testing:

  # perf record --off-cpu --cpu 1 perf bench sched messaging -l 1000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 1.722 [sec]
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1.446 MB perf.data (7248 samples) ]
  #
  # perf script | head -20
              perf 97164 [001] 38287.696761:          1      cycles:  ffffffffb6070174 native_write_msr+0x4 (vmlinux)
              perf 97164 [001] 38287.696764:          1      cycles:  ffffffffb6070174 native_write_msr+0x4 (vmlinux)
              perf 97164 [001] 38287.696765:          9      cycles:  ffffffffb6070174 native_write_msr+0x4 (vmlinux)
              perf 97164 [001] 38287.696767:        212      cycles:  ffffffffb6070176 native_write_msr+0x6 (vmlinux)
              perf 97164 [001] 38287.696768:       5130      cycles:  ffffffffb6070176 native_write_msr+0x6 (vmlinux)
              perf 97164 [001] 38287.696770:     123063      cycles:  ffffffffb6e0011e syscall_return_via_sysret+0x38 (vmlinux)
              perf 97164 [001] 38287.696803:    2292748      cycles:  ffffffffb636c82d __fput+0xad (vmlinux)
           swapper     0 [001] 38287.702852:    1927474      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
            :97513 97513 [001] 38287.767207:    1172536      cycles:  ffffffffb612ff65 newidle_balance+0x5 (vmlinux)
           swapper     0 [001] 38287.769567:    1073081      cycles:  ffffffffb618216d ktime_get_mono_fast_ns+0xd (vmlinux)
            :97533 97533 [001] 38287.770962:     984460      cycles:  ffffffffb65b2900 selinux_socket_sendmsg+0x0 (vmlinux)
            :97540 97540 [001] 38287.772242:     883462      cycles:  ffffffffb6d0bf59 irqentry_exit_to_user_mode+0x9 (vmlinux)
           swapper     0 [001] 38287.773633:     741963      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
            :97552 97552 [001] 38287.774539:     606680      cycles:  ffffffffb62eda0a page_add_file_rmap+0x7a (vmlinux)
            :97556 97556 [001] 38287.775333:     502254      cycles:  ffffffffb634f964 get_obj_cgroup_from_current+0xc4 (vmlinux)
            :97561 97561 [001] 38287.776163:     427891      cycles:  ffffffffb61b1522 cgroup_rstat_updated+0x22 (vmlinux)
           swapper     0 [001] 38287.776854:     359030      cycles:  ffffffffb612fc5e load_balance+0x9ce (vmlinux)
            :97567 97567 [001] 38287.777312:     330371      cycles:  ffffffffb6a8d8d0 skb_set_owner_w+0x0 (vmlinux)
            :97566 97566 [001] 38287.777589:     311622      cycles:  ffffffffb614a7a8 native_queued_spin_lock_slowpath+0x148 (vmlinux)
            :97512 97512 [001] 38287.777671:     307851      cycles:  ffffffffb62e0f35 find_vma+0x55 (vmlinux)
  #
  # perf record --off-cpu --cpu 4 perf bench sched messaging -l 1000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 1.613 [sec]
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1.415 MB perf.data (6729 samples) ]
  # perf script | head -20
              perf 97650 [004] 38323.728036:          1      cycles:  ffffffffb6070174 native_write_msr+0x4 (vmlinux)
              perf 97650 [004] 38323.728040:          1      cycles:  ffffffffb6070174 native_write_msr+0x4 (vmlinux)
              perf 97650 [004] 38323.728041:          9      cycles:  ffffffffb6070174 native_write_msr+0x4 (vmlinux)
              perf 97650 [004] 38323.728042:        208      cycles:  ffffffffb6070176 native_write_msr+0x6 (vmlinux)
              perf 97650 [004] 38323.728044:       5026      cycles:  ffffffffb6070176 native_write_msr+0x6 (vmlinux)
              perf 97650 [004] 38323.728046:     119970      cycles:  ffffffffb6d0bebc syscall_exit_to_user_mode+0x1c (vmlinux)
              perf 97650 [004] 38323.728078:    2190103      cycles:            54b756 perf_tool__process_synth_event+0x16 (/home/acme/bin/perf)
           swapper     0 [004] 38323.783357:    1593139      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
           swapper     0 [004] 38323.785352:    1593139      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
           swapper     0 [004] 38323.797330:    1418936      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
           swapper     0 [004] 38323.802350:    1418936      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
           swapper     0 [004] 38323.806333:    1418936      cycles:  ffffffffb6761378 mwait_idle_with_hints.constprop.0+0x48 (vmlinux)
            :97996 97996 [004] 38323.807145:    1418936      cycles:      7f5db9be6917 [unknown] ([unknown])
            :97959 97959 [004] 38323.807730:    1445074      cycles:  ffffffffb6329d36 memcg_slab_post_alloc_hook+0x146 (vmlinux)
            :97959 97959 [004] 38323.808103:    1341584      cycles:  ffffffffb62fd90f get_page_from_freelist+0x112f (vmlinux)
            :97959 97959 [004] 38323.808451:    1227537      cycles:  ffffffffb65b2905 selinux_socket_sendmsg+0x5 (vmlinux)
            :97959 97959 [004] 38323.808768:    1184321      cycles:  ffffffffb6d1ba35 _raw_spin_lock_irqsave+0x15 (vmlinux)
            :97959 97959 [004] 38323.809073:    1153017      cycles:  ffffffffb6a8d92d skb_set_owner_w+0x5d (vmlinux)
            :97959 97959 [004] 38323.809402:    1126875      cycles:  ffffffffb6329c64 memcg_slab_post_alloc_hook+0x74 (vmlinux)
            :97959 97959 [004] 38323.809695:    1073248      cycles:  ffffffffb6e0001d entry_SYSCALL_64+0x1d (vmlinux)
  #

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Namhyung Kim edc41a1099 perf record: Enable off-cpu analysis with BPF
Add --off-cpu option to enable the off-cpu profiling with BPF.  It'd
use a bpf_output event and rename it to "offcpu-time".  Samples will
be synthesized at the end of the record session using data from a BPF
map which contains the aggregated off-cpu time at context switches.
So it needs root privilege to get the off-cpu profiling.

Each sample will have a separate user stacktrace so it will skip
kernel threads.  The sample ip will be set from the stacktrace and
other sample data will be updated accordingly.  Currently it only
handles some basic sample types.

The sample timestamp is set to a dummy value just not to bother with
other events during the sorting.  So it has a very big initial value
and increase it on processing each samples.

Good thing is that it can be used together with regular profiling like
cpu cycles.  If you don't want to that, you can use a dummy event to
enable off-cpu profiling only.

Example output:
  $ sudo perf record --off-cpu perf bench sched messaging -l 1000

  $ sudo perf report --stdio --call-graph=no
  # Total Lost Samples: 0
  #
  # Samples: 41K of event 'cycles'
  # Event count (approx.): 42137343851
  ...

  # Samples: 1K of event 'offcpu-time'
  # Event count (approx.): 587990831640
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................  .........................
  #
      81.66%     0.00%  sched-messaging  libc-2.33.so        [.] __libc_start_main
      81.66%     0.00%  sched-messaging  perf                [.] cmd_bench
      81.66%     0.00%  sched-messaging  perf                [.] main
      81.66%     0.00%  sched-messaging  perf                [.] run_builtin
      81.43%     0.00%  sched-messaging  perf                [.] bench_sched_messaging
      40.86%    40.86%  sched-messaging  libpthread-2.33.so  [.] __read
      37.66%    37.66%  sched-messaging  libpthread-2.33.so  [.] __write
       2.91%     2.91%  sched-messaging  libc-2.33.so        [.] __poll
  ...

As you can see it spent most of off-cpu time in read and write in
bench_sched_messaging().  The --call-graph=no was added just to make
the output concise here.

It uses perf hooks facility to control BPF program during the record
session rather than adding new BPF/off-cpu specific calls.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Namhyung Kim 303ead45c4 perf report: Do not extend sample type of bpf-output event
Currently evsel__new_idx() sets more sample_type bits when it finds a
BPF-output event.  But it should honor what's recorded in the perf
data file rather than blindly sets the bits.  Otherwise it could lead
to a parse error when it recorded with a modified sample_type.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220518224725.742882-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Adrian Hunter d3345fecf9 perf stat: Add requires_cpu flag for uncore
Uncore events require a CPU i.e. it cannot be -1.

The evsel system_wide flag is intended for events that should be on every
CPU, which does not make sense for uncore events because uncore events do
not map one-to-one with CPUs.

These 2 requirements are not exactly the same, so introduce a new flag
'requires_cpu' for the uncore case.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220524075436.29144-13-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Adrian Hunter 7be1fedd2a perf tools: Allow all_cpus to be a superset of user_requested_cpus
To support collection of system-wide events with user requested CPUs,
all_cpus must be a superset of user_requested_cpus.

In order to support all_cpus to be a superset of user_requested_cpus,
all_cpus must be used instead of user_requested_cpus when dealing with CPUs
of all events instead of CPUs of requested events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220524075436.29144-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Adrian Hunter 126d68fdca perf evlist: Add evlist__add_dummy_on_all_cpus()
Add evlist__add_dummy_on_all_cpus() to enable creating a system-wide dummy
event that sets up the system-wide maps before map propagation.

For convenience, add evlist__add_aux_dummy() so that the logic can be used
whether or not the event needs to be system-wide.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220524075436.29144-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Adrian Hunter 8294489914 perf evlist: Factor out evlist__dummy_event()
Factor out evlist__dummy_event() so it can be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220524075436.29144-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Adrian Hunter 84bd5aba88 perf auxtrace: Remove auxtrace_mmap_params__set_idx() per_cpu parameter
Remove auxtrace_mmap_params__set_idx() per_cpu parameter because it isn't
needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220524075436.29144-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:57 -03:00
Adrian Hunter d01508f2df perf auxtrace: Add mmap_needed to auxtrace_mmap_params
Add mmap_needed to auxtrace_mmap_params.

Currently an auxtrace mmap is always attempted even if the event is not an
auxtrace event. That works because, when AUX area tracing, there is always
an auxtrace event first for every mmap. Prepare for that not being the
case, which it won't be when sideband tracking events are allowed on
all CPUs even when auxtrace is limited to selected CPUs.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20220524075436.29144-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:56 -03:00
Arnaldo Carvalho de Melo df76e00383 perf build: Stop using __weak bpf_map_create() to handle older libbpf versions
By adding a feature test for bpf_map_create() and providing a fallback if
it isn't present in older versions of libbpf.

This also fixes the build with torvalds/master at this point:

  $ git log --oneline -5 torvalds/master
  babf0bb978 (torvalds/master) Merge tag 'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
  e375780b63 Merge tag 'fsnotify_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
  8b728edc5b Merge tag 'fs_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
  3f306ea2e1 Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping
  fbe86daca0 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
  $

Coping with:

  $ git log --oneline -2 d16495a982
  d16495a982 libbpf: remove bpf_create_map*() APIs
  e2371b1632 libbpf: start 1.0 development cycle
  $

As the __weak function fails to build as it calls the now removed
bpf_create_map() API.

Testing:

  $ rpm -q libbpf-devel
  libbpf-devel-0.4.0-2.fc35.x86_64
  $
  $ make -C tools/perf BUILD_BPF_SKEL=1 LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin
  $ cat /tmp/build/perf/feature/test-libbpf-bpf_map_create.make.output
  test-libbpf-bpf_map_create.c: In function ‘main’:
  test-libbpf-bpf_map_create.c:6:16: error: implicit declaration of function ‘bpf_map_create’; did you mean ‘bpf_map_freeze’? [-Werror=implicit-function-declaration]
      6 |         return bpf_map_create(0 /* map_type */, NULL /* map_name */, 0, /* key_size */,
        |                ^~~~~~~~~~~~~~
        |                bpf_map_freeze
  test-libbpf-bpf_map_create.c:6:87: error: expected expression before ‘,’ token
      6 |         return bpf_map_create(0 /* map_type */, NULL /* map_name */, 0, /* key_size */,
        |                                                                                       ^
  cc1: all warnings being treated as errors
  $
  $ objdump -dS /tmp/build/perf/perf | grep '<bpf_map_create>:' -A20
  000000000058b290 <bpf_map_create>:
  {
    58b290:	55                   	push   %rbp
    58b291:	48 89 e5             	mov    %rsp,%rbp
    58b294:	48 83 ec 10          	sub    $0x10,%rsp
    58b298:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    58b29f:	00 00
    58b2a1:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    58b2a5:	31 c0                	xor    %eax,%eax
  	return bpf_create_map(map_type, key_size, value_size, max_entries, 0);
    58b2a7:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
    58b2ab:	64 48 2b 04 25 28 00 	sub    %fs:0x28,%rax
    58b2b2:	00 00
    58b2b4:	75 10                	jne    58b2c6 <bpf_map_create+0x36>
  }
    58b2b6:	c9                   	leave
    58b2b7:	89 d6                	mov    %edx,%esi
    58b2b9:	89 ca                	mov    %ecx,%edx
    58b2bb:	44 89 c1             	mov    %r8d,%ecx
  	return bpf_create_map(map_type, key_size, value_size, max_entries, 0);
    58b2be:	45 31 c0             	xor    %r8d,%r8d
  $

Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/linux-perf-users/Yo+XvQNKL4K5khl2@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 12:36:56 -03:00
Jiri Olsa 982be47751 perf build: Stop using __weak btf__raw_data() to handle older libbpf versions
By adding a feature test for btf__raw_data() and providing a fallback if
it isn't present in older versions of libbpf.

Committer testing:

  $ rpm -q libbpf-devel
  libbpf-devel-0.4.0-2.fc35.x86_64
  $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin
  $ cat /tmp/build/perf/feature/test-libbpf-btf__raw_data.make.output
  test-libbpf-btf__raw_data.c: In function ‘main’:
  test-libbpf-btf__raw_data.c:6:9: error: implicit declaration of function ‘btf__raw_data’; did you mean ‘btf__get_raw_data’? [-Werror=implicit-function-declaration]
      6 |         btf__raw_data(NULL /* btf_ro */, NULL /* size */);
        |         ^~~~~~~~~~~~~
        |         btf__get_raw_data
  cc1: all warnings being treated as errors
  $ objdump -dS /tmp/build/perf/perf | grep '<btf__raw_data>:' -A20
  00000000005b3050 <btf__raw_data>:
  {
    5b3050:	55                   	push   %rbp
    5b3051:	48 89 e5             	mov    %rsp,%rbp
    5b3054:	48 83 ec 10          	sub    $0x10,%rsp
    5b3058:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    5b305f:	00 00
    5b3061:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    5b3065:	31 c0                	xor    %eax,%eax
	  return btf__get_raw_data(btf_ro, size);
    5b3067:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
    5b306b:	64 48 2b 04 25 28 00 	sub    %fs:0x28,%rax
    5b3072:	00 00
    5b3074:	75 06                	jne    5b307c <btf__raw_data+0x2c>
  }
    5b3076:	c9                   	leave
	  return btf__get_raw_data(btf_ro, size);
    5b3077:	e9 14 99 e5 ff       	jmp    40c990 <btf__get_raw_data@plt>
    5b307c:	e8 af a7 e5 ff       	call   40d830 <__stack_chk_fail@plt>
    5b3081:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
    5b3088:	00 00 00 00
    $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 11:02:02 -03:00
Jiri Olsa 739c9180cf perf build: Stop using __weak bpf_object__next_map() to handle older libbpf versions
By adding a feature test for bpf_object__next_map() and providing a fallback if
it isn't present in older versions of libbpf.

Committer testing:

  $ rpm -q libbpf-devel
  libbpf-devel-0.4.0-2.fc35.x86_64
  $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin
  $ cat /tmp/build/perf/feature/test-libbpf-bpf_object__next_map.make.output
  test-libbpf-bpf_object__next_map.c: In function ‘main’:
  test-libbpf-bpf_object__next_map.c:6:9: error: implicit declaration of function ‘bpf_object__next_map’; did you mean ‘bpf_object__next’? [-Werror=implicit-function-declaration]
      6 |         bpf_object__next_map(NULL /* obj */, NULL /* prev */);
        |         ^~~~~~~~~~~~~~~~~~~~
        |         bpf_object__next
    cc1: all warnings being treated as errors
  $
  $ objdump -dS /tmp/build/perf/perf | grep '<bpf_object__next_map>:' -A20
  00000000005b2e00 <bpf_object__next_map>:
  {
    5b2e00:	55                   	push   %rbp
    5b2e01:	48 89 e5             	mov    %rsp,%rbp
    5b2e04:	48 83 ec 10          	sub    $0x10,%rsp
    5b2e08:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    5b2e0f:	00 00
    5b2e11:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    5b2e15:	31 c0                	xor    %eax,%eax
	  return bpf_map__next(prev, obj);
    5b2e17:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
    5b2e1b:	64 48 2b 04 25 28 00 	sub    %fs:0x28,%rax
    5b2e22:	00 00
    5b2e24:	75 0f                	jne    5b2e35 <bpf_object__next_map+0x35>
  }
    5b2e26:	c9                   	leave
    5b2e27:	49 89 f8             	mov    %rdi,%r8
    5b2e2a:	48 89 f7             	mov    %rsi,%rdi
	  return bpf_map__next(prev, obj);
    5b2e2d:	4c 89 c6             	mov    %r8,%rsi
    5b2e30:	e9 cb b1 e5 ff       	jmp    40e000 <bpf_map__next@plt>
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 11:02:02 -03:00
Jiri Olsa 8916d72554 perf build: Stop using __weak bpf_object__next_program() to handle older libbpf versions
By adding a feature test for bpf_object__next_program() and providing a fallback if
it isn't present in older versions of libbpf.

Committer testing:

  $ rpm -q libbpf-devel
  libbpf-devel-0.4.0-2.fc35.x86_64
  $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin
  $ cat /tmp/build/perf/feature/test-libbpf-bpf_object__next_program.make.output
  test-libbpf-bpf_object__next_program.c: In function ‘main’:
  test-libbpf-bpf_object__next_program.c:6:9: error: implicit declaration of function ‘bpf_object__next_program’; did you mean ‘bpf_object__unpin_programs’? [-Werror=implicit-function-declaration]
      6 |         bpf_object__next_program(NULL /* obj */, NULL /* prev */);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~
        |         bpf_object__unpin_programs
  cc1: all warnings being treated as errors
  $
  $ objdump -dS /tmp/build/perf/perf | grep '<bpf_object__next_program>:' -A20
  00000000005b2dc0 <bpf_object__next_program>:
  {
    5b2dc0:	55                   	push   %rbp
    5b2dc1:	48 89 e5             	mov    %rsp,%rbp
    5b2dc4:	48 83 ec 10          	sub    $0x10,%rsp
    5b2dc8:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    5b2dcf:	00 00
    5b2dd1:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    5b2dd5:	31 c0                	xor    %eax,%eax
	  return bpf_program__next(prev, obj);
    5b2dd7:	48 8b 45 f8          	mov    -0x8(%rbp),%rax
    5b2ddb:	64 48 2b 04 25 28 00 	sub    %fs:0x28,%rax
    5b2de2:	00 00
    5b2de4:	75 0f                	jne    5b2df5 <bpf_object__next_program+0x35>
  }
    5b2de6:	c9                   	leave
    5b2de7:	49 89 f8             	mov    %rdi,%r8
    5b2dea:	48 89 f7             	mov    %rsi,%rdi
	  return bpf_program__next(prev, obj);
    5b2ded:	4c 89 c6             	mov    %r8,%rsi
    5b2df0:	e9 3b b4 e5 ff       	jmp    40e230 <bpf_program__next@plt>
    $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 11:02:02 -03:00
Jiri Olsa 5c83eff381 perf build: Stop using __weak bpf_prog_load() to handle older libbpf versions
By adding a feature test for bpf_prog_load() and providing a fallback if
it isn't present in older versions of libbpf.

Committer testing:

  $ rpm -q libbpf-devel
  libbpf-devel-0.4.0-2.fc35.x86_64
  $ make -C tools/perf LIBBPF_DYNAMIC=1 O=/tmp/build/perf install-bin
  $ cat /tmp/build/perf/feature/test-libbpf-bpf_prog_load.make.output
  test-libbpf-bpf_prog_load.c: In function ‘main’:
  test-libbpf-bpf_prog_load.c:6:16: error: implicit declaration of function ‘bpf_prog_load’ [-Werror=implicit-function-declaration]
      6 |         return bpf_prog_load(0 /* prog_type */, NULL /* prog_name */,
        |                ^~~~~~~~~~~~~
  cc1: all warnings being treated as errors
  $

  $ objdump -dS /tmp/build/perf/perf | grep '<bpf_prog_load>:' -A20
  00000000005b2d70 <bpf_prog_load>:
  {
    5b2d70:	55                   	push   %rbp
    5b2d71:	48 89 ce             	mov    %rcx,%rsi
    5b2d74:	4c 89 c8             	mov    %r9,%rax
    5b2d77:	49 89 d2             	mov    %rdx,%r10
    5b2d7a:	4c 89 c2             	mov    %r8,%rdx
    5b2d7d:	48 89 e5             	mov    %rsp,%rbp
    5b2d80:	48 83 ec 18          	sub    $0x18,%rsp
    5b2d84:	64 48 8b 0c 25 28 00 	mov    %fs:0x28,%rcx
    5b2d8b:	00 00
    5b2d8d:	48 89 4d f8          	mov    %rcx,-0x8(%rbp)
    5b2d91:	31 c9                	xor    %ecx,%ecx
  	return bpf_load_program(prog_type, insns, insn_cnt, license,
    5b2d93:	41 8b 49 5c          	mov    0x5c(%r9),%ecx
    5b2d97:	51                   	push   %rcx
    5b2d98:	4d 8b 49 60          	mov    0x60(%r9),%r9
    5b2d9c:	4c 89 d1             	mov    %r10,%rcx
    5b2d9f:	44 8b 40 1c          	mov    0x1c(%rax),%r8d
    5b2da3:	e8 f8 aa e5 ff       	call   40d8a0 <bpf_load_program@plt>
  }
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/linux-perf-users/YozLKby7ITEtchC9@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-26 11:02:02 -03:00
Adrian Hunter 5d2b6bc3a6 perf intel-pt: Add guest_code support
A common case for KVM test programs is that the test program acts as the
hypervisor, creating, running and destroying the virtual machine, and
providing the guest object code from its own object code. In this case,
the VM is not running an OS, but only the functions loaded into it by the
hypervisor test program, and conveniently, loaded at the same virtual
addresses.

To support that, a new option "--guest-code" has been added in
previous patches.

In this patch, add support also to Intel PT.

In particular, ensure guest_code thread is set up before attempting to
walk object code or synthesize samples.

Example:

 # perf record --kcore -e intel_pt/cyc/ -- tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.280 MB perf.data ]
 # perf script --guest-code --itrace=bep --ns -F-period,+addr,+flags
 [SNIP]
   tsc_msrs_test 18436 [007] 10897.962087733:      branches:   call                   ffffffffc13b2ff5 __vmx_vcpu_run+0x15 (vmlinux) => ffffffffc13b2f50 vmx_update_host_rsp+0x0 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962087733:      branches:   return                 ffffffffc13b2f5d vmx_update_host_rsp+0xd (vmlinux) => ffffffffc13b2ffa __vmx_vcpu_run+0x1a (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962087733:      branches:   call                   ffffffffc13b303b __vmx_vcpu_run+0x5b (vmlinux) => ffffffffc13b2f80 vmx_vmenter+0x0 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962087836:      branches:   vmentry                ffffffffc13b2f82 vmx_vmenter+0x2 (vmlinux) =>                0 [unknown] ([unknown])
   [guest/18436] 18436 [007] 10897.962087836:      branches:   vmentry                               0 [unknown] ([unknown]) =>           402c81 guest_code+0x131 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962087836:      branches:   call                             402c81 guest_code+0x131 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>           40dba0 ucall+0x0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962088248:      branches:   vmexit                           40dba0 ucall+0x0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>                0 [unknown] ([unknown])
   tsc_msrs_test 18436 [007] 10897.962088248:      branches:   vmexit                                0 [unknown] ([unknown]) => ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962088248:      branches:   jmp                    ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux) => ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962088256:      branches:   return                 ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux) => ffffffffc13b3040 __vmx_vcpu_run+0x60 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962088270:      branches:   return                 ffffffffc13b30b6 __vmx_vcpu_run+0xd6 (vmlinux) => ffffffffc13b2f2e vmx_vcpu_enter_exit+0x4e (vmlinux)
 [SNIP]
   tsc_msrs_test 18436 [007] 10897.962089321:      branches:   call                   ffffffffc13b2ff5 __vmx_vcpu_run+0x15 (vmlinux) => ffffffffc13b2f50 vmx_update_host_rsp+0x0 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962089321:      branches:   return                 ffffffffc13b2f5d vmx_update_host_rsp+0xd (vmlinux) => ffffffffc13b2ffa __vmx_vcpu_run+0x1a (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962089321:      branches:   call                   ffffffffc13b303b __vmx_vcpu_run+0x5b (vmlinux) => ffffffffc13b2f80 vmx_vmenter+0x0 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962089424:      branches:   vmentry                ffffffffc13b2f82 vmx_vmenter+0x2 (vmlinux) =>                0 [unknown] ([unknown])
   [guest/18436] 18436 [007] 10897.962089424:      branches:   vmentry                               0 [unknown] ([unknown]) =>           40dba0 ucall+0x0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962089701:      branches:   jmp                              40dc1b ucall+0x7b (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>           40dc39 ucall+0x99 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962089701:      branches:   jcc                              40dc3c ucall+0x9c (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>           40dc20 ucall+0x80 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962089701:      branches:   jcc                              40dc3c ucall+0x9c (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>           40dc20 ucall+0x80 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962089701:      branches:   jcc                              40dc37 ucall+0x97 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>           40dc50 ucall+0xb0 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test)
   [guest/18436] 18436 [007] 10897.962089878:      branches:   vmexit                           40dc55 ucall+0xb5 (/home/ahunter/git/work/tools/testing/selftests/kselftest_install/kvm/tsc_msrs_test) =>                0 [unknown] ([unknown])
   tsc_msrs_test 18436 [007] 10897.962089878:      branches:   vmexit                                0 [unknown] ([unknown]) => ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962089878:      branches:   jmp                    ffffffffc13b2fa0 vmx_vmexit+0x0 (vmlinux) => ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962089887:      branches:   return                 ffffffffc13b2fd2 vmx_vmexit+0x32 (vmlinux) => ffffffffc13b3040 __vmx_vcpu_run+0x60 (vmlinux)
   tsc_msrs_test 18436 [007] 10897.962089901:      branches:   return                 ffffffffc13b30b6 __vmx_vcpu_run+0xd6 (vmlinux) => ffffffffc13b2f2e vmx_vcpu_enter_exit+0x4e (vmlinux)
 [SNIP]

 # perf kvm --guest-code --guest --host report -i perf.data --stdio | head -20

 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 12  of event 'instructions'
 # Event count (approx.): 2274583
 #
 # Children      Self  Command        Shared Object         Symbol
 # ........  ........  .............  ....................  ...........................................
 #
    54.70%     0.00%  tsc_msrs_test  [kernel.vmlinux]      [k] entry_SYSCALL_64_after_hwframe
            |
            ---entry_SYSCALL_64_after_hwframe
               do_syscall_64
               |
               |--29.44%--syscall_exit_to_user_mode
               |          exit_to_user_mode_prepare
               |          task_work_run
               |          __fput

For more information about Perf tools support for Intel® Processor Trace
refer:

  https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220517131011.6117-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:19:24 -03:00
Adrian Hunter 096fc36180 perf tools: Add guest_code support
A common case for KVM test programs is that the test program acts as the
hypervisor, creating, running and destroying the virtual machine, and
providing the guest object code from its own object code. In this case,
the VM is not running an OS, but only the functions loaded into it by the
hypervisor test program, and conveniently, loaded at the same virtual
addresses.

Normally to resolve addresses, MMAP events are needed to map addresses
back to the object code and debug symbols for that object code.

Currently, there is no way to get such mapping information from guests
but, in the scenario described above, the guest has the same mappings
as the hypervisor, so support for that scenario can be achieved.

To support that, copy the host thread's maps to the guest thread's maps.
Note, we do not discover the guest until we encounter a guest event,
which works well because it is not until then that we know that the host
thread's maps have been set up.

Typically the main function for the guest object code is called
"guest_code", hence the name chosen for this feature. Note, that is just a
convention, the function could be named anything, and the tools do not
care.

This is primarily aimed at supporting Intel PT, or similar, where trace
data can be recorded for a guest. Refer to the final patch in this series
"perf intel-pt: Add guest_code support" for an example.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220517131011.6117-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:18:38 -03:00
Adrian Hunter c98e064d54 perf tools: Factor out thread__set_guest_comm()
Factor out thread__set_guest_comm() so it can be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220517131011.6117-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:18:27 -03:00
Adrian Hunter a088031c49 perf tools: Add machine to machines back pointer
When dealing with guest machines, it can be necessary to get a reference
to the host machine. Add a machines pointer to struct machine to make that
possible.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220517131011.6117-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:18:06 -03:00
Adrian Hunter a4455e0053 perf data: Add has_kcore_dir()
Add a helper function has_kcore_dir(), so that perf inject can determine if
it needs to keep the kcore_dir.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220520132404.25853-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:11:39 -03:00
Adrian Hunter 180b3d0626 perf inject: Keep some features sections from input file
perf inject overwrites feature sections with information from the current
machine. It makes more sense to keep original information that describes
the machine or software when perf record was run.

Example: perf.data from "Desktop" injected on "nuc11"

 Before:

  $ perf script --header-only -i perf.data-from-desktop | head -15
  # ========
  # captured on    : Thu May 19 09:55:50 2022
  # header version : 1
  # data offset    : 1208
  # data size      : 837480
  # feat offset    : 838688
  # hostname : Desktop
  # os release : 5.13.0-41-generic
  # perf version : 5.18.rc5.gac837f7ca7ed
  # arch : x86_64
  # nrcpus online : 28
  # nrcpus avail : 28
  # cpudesc : Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
  # cpuid : GenuineIntel,6,85,4
  # total memory : 65548656 kB

  $ perf inject -i perf.data-from-desktop -o injected-perf.data

  $ perf script --header-only -i injected-perf.data | head -15
  # ========
  # captured on    : Fri May 20 15:06:55 2022
  # header version : 1
  # data offset    : 1208
  # data size      : 837480
  # feat offset    : 838688
  # hostname : nuc11
  # os release : 5.17.5-local
  # perf version : 5.18.rc5.g0f828fdeb9af
  # arch : x86_64
  # nrcpus online : 8
  # nrcpus avail : 8
  # cpudesc : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  # cpuid : GenuineIntel,6,140,1
  # total memory : 16012124 kB

 After:

  $ perf inject -i perf.data-from-desktop -o injected-perf.data

  $ perf script --header-only -i injected-perf.data | head -15
  # ========
  # captured on    : Fri May 20 15:08:54 2022
  # header version : 1
  # data offset    : 1208
  # data size      : 837480
  # feat offset    : 838688
  # hostname : Desktop
  # os release : 5.13.0-41-generic
  # perf version : 5.18.rc5.gac837f7ca7ed
  # arch : x86_64
  # nrcpus online : 28
  # nrcpus avail : 28
  # cpudesc : Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
  # cpuid : GenuineIntel,6,85,4
  # total memory : 65548656 kB

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220520132404.25853-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:11:25 -03:00
Adrian Hunter 237c96b8c1 perf header: Add ability to keep feature sections
Many feature sections should not be re-written during perf inject. In
preparation to support that, add callbacks that a tool can use to copy
a feature section from elsewhere. perf inject will use this facility to
copy features sections from the input file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220520132404.25853-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:11:01 -03:00
Ian Rogers 0b9462d0ac perf stat: Make use of index clearer with perf_counts
Try to disambiguate further when perf_counts is being accessed it is
with a cpu map index rather than a CPU.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220519032005.1273691-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:54:02 -03:00
Ian Rogers 54668a4ea0 perf bpf_counter: Tidy use of CPU map index
BPF counters are typically running across all CPUs and so the CPU map
index and CPU number are the same. There may be cases with offline CPUs
where this isn't the case and so ensure the cpu map index for
perf_counts is going to be a valid index by explicitly iterating over
the CPU map. This also makes it clearer that users of perf_counts are
using an index. Collapse some multiple uses of perf_counts into single
uses.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220519032005.1273691-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:53:06 -03:00
Leo Yan 9845063710 perf mem: Add stats for store operation with no available memory level
Sometimes we don't know memory store operations happen on exactly which
memory (or cache) level, the memory level flag is set to PERF_MEM_LVL_NA
in this case; a practical example is Arm SPE AUX trace sets this flag
for all store operations due to absent info for cache level.

This patch is to add a new item "st_na" in structure c2c_stats to add
statistics for store operations with no available cache level.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adam Li <adamli@amperemail.onmicrosoft.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Alyssa Ross <hi@alyssa.is>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220518055729.1869566-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:36:12 -03:00
Arnaldo Carvalho de Melo 0869331fba Merge remote-tracking branch 'torvalds/master' into perf/core
To get the rest of 5.18.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:32:49 -03:00
Chengdong Li 51d0bf99b8 perf session: Fix Intel LBR callstack entries and nr print message
When generating callstack information from branch_stack(Intel LBR), the
actual number of callstack entry should be bigger than the number of
branch_stack, for example:

	branch_stack records:
		B() -> C()
		A() -> B()
	converted callstack records should be:
		C()
		B()
		A()
though, the number of callstack equals
to the number of branch stack plus 1.

This patch fixes above issue in branch_stack__printf(). For example,

	# echo 'scale=2000; 4*a(1)' > cmd
	# perf record --call-graph lbr bc -l < cmd

Before applying this patch, `perf script -D` output:

	1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0
	... LBR call chain: nr:8
	.....  0: fffffffffffffe00
	.....  1: 000000000040a410
	.....  2: 000000000040573c
	.....  3: 0000000000408650
	.....  4: 00000000004022f2
	.....  5: 00000000004015f5
	.....  6: 00007f5ed6dcb553
	.....  7: 0000000000401698
	... FP chain: nr:2
	.....  0: fffffffffffffe00
	.....  1: 000000000040a6d8
	... branch callstack: nr:6    # which is not consistent with LBR records.
	.....  0: 000000000040a410
	.....  1: 0000000000408650    # ditto
	.....  2: 00000000004022f2
	.....  3: 00000000004015f5
	.....  4: 00007f5ed6dcb553
	.....  5: 0000000000401698
	 ... thread: bc:17990
	 ...... dso: /usr/bin/bc
	bc 17990 1220022.677386:     894172 cycles:
			  40a410 [unknown] (/usr/bin/bc)
			  40573c [unknown] (/usr/bin/bc)
			  408650 [unknown] (/usr/bin/bc)
			  4022f2 [unknown] (/usr/bin/bc)
			  4015f5 [unknown] (/usr/bin/bc)
		    7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so)
			  401698 [unknown] (/usr/bin/bc)

After applied:

	1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0
	... LBR call chain: nr:8
	.....  0: fffffffffffffe00
	.....  1: 000000000040a410
	.....  2: 000000000040573c
	.....  3: 0000000000408650
	.....  4: 00000000004022f2
	.....  5: 00000000004015f5
	.....  6: 00007f5ed6dcb553
	.....  7: 0000000000401698
	... FP chain: nr:2
	.....  0: fffffffffffffe00
	.....  1: 000000000040a6d8
	... branch callstack: nr:7
	.....  0: 000000000040a410
	.....  1: 000000000040573c
	.....  2: 0000000000408650
	.....  3: 00000000004022f2
	.....  4: 00000000004015f5
	.....  5: 00007f5ed6dcb553
	.....  6: 0000000000401698
	 ... thread: bc:17990
	 ...... dso: /usr/bin/bc
	bc 17990 1220022.677386:     894172 cycles:
			  40a410 [unknown] (/usr/bin/bc)
			  40573c [unknown] (/usr/bin/bc)
			  408650 [unknown] (/usr/bin/bc)
			  4022f2 [unknown] (/usr/bin/bc)
			  4015f5 [unknown] (/usr/bin/bc)
		    7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so)
			  401698 [unknown] (/usr/bin/bc)

Change from v1:
	- refined code style according to Jiri's review comments.

Signed-off-by: Chengdong Li <chengdongli@tencent.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: likexu@tencent.com
Link: https://lore.kernel.org/r/20220517015726.96131-1-chengdongli@tencent.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:56:24 -03:00
Kan Liang e8f4f794d7 perf stat: Always keep perf metrics topdown events in a group
If any member in a group has a different cpu mask than the other
members, the current perf stat disables group. when the perf metrics
topdown events are part of the group, the below <not supported> error
will be triggered.

  $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1
  WARNING: grouped events cpus do not match, disabling group:
    anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ }

   Performance counter stats for 'system wide':

         141,465,174      slots
     <not supported>      topdown-retiring
       1,605,330,334      uncore_imc_free_running_0/dclk/

The perf metrics topdown events must always be grouped with a slots
event as leader.

Factor out evsel__remove_from_group() to only remove the regular events
from the group.

Remove evsel__must_be_in_group(), since no one use it anymore.

With the patch, the topdown events aren't broken from the group for the
splitting.

  $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1
  WARNING: grouped events cpus do not match, disabling group:
    anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ }

   Performance counter stats for 'system wide':

         346,110,588      slots
         124,608,256      topdown-retiring
       1,606,869,976      uncore_imc_free_running_0/dclk/

         1.003877592 seconds time elapsed

Fixes: a9a1790247 ("perf stat: Ensure group is defined on top of the same cpu mask")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518143900.1493980-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 11:11:13 -03:00
Ian Rogers 92d579ea32 perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
Stat events can come from disk and so need a degree of validation. They
contain a CPU which needs looking up via CPU map to access a counter.

Add the CPU to index translation, alongside validity checking.

Discussion thread:

  https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/

Fixes: 7ac0089d13 ("perf evsel: Pass cpu not cpu map index to synthesize")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Suggested-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220519032005.1273691-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 10:54:06 -03:00
Arnaldo Carvalho de Melo 0ae065a5d2 perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
Avi Kivity reported a problem where the __weak
btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being
used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in
turn called back to btf__load_from_kernel_by_id(), resulting in an
endless loop.

Fix this by adding a feature test to check if
btf__load_from_kernel_by_id() is available when building perf with
LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old
btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id()
since at that time it didn't exist at all.

Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC
where we don't have btf__load_from_kernel_by_id() and thus its feature
test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID:

  $ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output
  test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’:
  test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration]
      6 |         return btf__load_from_kernel_by_id(20151128, NULL);
        |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors
  $

  $ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id
  00000000005ba180 T btf__load_from_kernel_by_id
  $

  $ objdump --disassemble=btf__load_from_kernel_by_id -S /tmp/build/perf-urgent/perf

  /tmp/build/perf-urgent/perf:     file format elf64-x86-64
  <SNIP>
  00000000005ba180 <btf__load_from_kernel_by_id>:
  #include "record.h"
  #include "util/synthetic-events.h"

  #ifndef HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID
  struct btf *btf__load_from_kernel_by_id(__u32 id)
  {
    5ba180:	55                   	push   %rbp
    5ba181:	48 89 e5             	mov    %rsp,%rbp
    5ba184:	48 83 ec 10          	sub    $0x10,%rsp
    5ba188:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    5ba18f:	00 00
    5ba191:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    5ba195:	31 c0                	xor    %eax,%eax
         struct btf *btf;
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
         int err = btf__get_from_id(id, &btf);
    5ba197:	48 8d 75 f0          	lea    -0x10(%rbp),%rsi
    5ba19b:	e8 a0 57 e5 ff       	call   40f940 <btf__get_from_id@plt>
    5ba1a0:	89 c2                	mov    %eax,%edx
  #pragma GCC diagnostic pop

         return err ? ERR_PTR(err) : btf;
    5ba1a2:	48 98                	cltq
    5ba1a4:	85 d2                	test   %edx,%edx
    5ba1a6:	48 0f 44 45 f0       	cmove  -0x10(%rbp),%rax
  }
  <SNIP>

Fixes: 218e7b775d ("perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions")
Reported-by: Avi Kivity <avi@scylladb.com>
Link: https://lore.kernel.org/linux-perf-users/f0add43b-3de5-20c5-22c4-70aff4af959f@scylladb.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/linux-perf-users/YobjjFOblY4Xvwo7@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 09:45:41 -03:00
Ian Rogers d98079c05b perf evlist: Keep topdown counters in weak group
On Intel Icelake, topdown events must always be grouped with a slots
event as leader. When a metric is parsed a weak group is formed and
retried if perf_event_open fails. The retried events aren't grouped
breaking the slots leader requirement. This change modifies the weak
group "reset" behavior so that topdown events aren't broken from the
group for the retry.

  $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1

   Performance counter stats for 'system wide':

    47,867,188,483      slots                                                         (92.27%)
   <not supported>      topdown-bad-spec
   <not supported>      topdown-be-bound
   <not supported>      topdown-fe-bound
   <not supported>      topdown-retiring
     2,173,346,937      branch-instructions                                           (92.27%)
        10,540,253      branch-misses             #    0.48% of all branches          (92.29%)
        96,291,140      bus-cycles                                                    (92.29%)
         6,214,202      cache-misses              #   20.120 % of all cache refs      (92.29%)
        30,886,082      cache-references                                              (76.91%)
    11,773,726,641      cpu-cycles                                                    (84.62%)
    11,807,585,307      instructions              #    1.00  insn per cycle           (92.31%)
                 0      mem-loads                                                     (92.32%)
     2,212,928,573      mem-stores                                                    (84.69%)
    10,024,403,118      ref-cycles                                                    (92.35%)
        16,232,978      baclears.any                                                  (92.35%)
        23,832,633      ARITH.DIVIDER_ACTIVE                                          (84.59%)

       0.981070734 seconds time elapsed

After:

  $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1

   Performance counter stats for 'system wide':

       31040189283      slots                                                         (92.27%)
        8997514811      topdown-bad-spec          #     28.2% bad speculation         (92.27%)
       10997536028      topdown-be-bound          #     34.5% backend bound           (92.27%)
        4778060526      topdown-fe-bound          #     15.0% frontend bound          (92.27%)
        7086628768      topdown-retiring          #     22.2% retiring                (92.27%)
        1417611942      branch-instructions                                           (92.26%)
           5285529      branch-misses             #    0.37% of all branches          (92.28%)
          62922469      bus-cycles                                                    (92.29%)
           1440708      cache-misses              #    8.292 % of all cache refs      (92.30%)
          17374098      cache-references                                              (76.94%)
        8040889520      cpu-cycles                                                    (84.63%)
        7709992319      instructions              #    0.96  insn per cycle           (92.32%)
                 0      mem-loads                                                     (92.32%)
        1515669558      mem-stores                                                    (84.68%)
        6542411177      ref-cycles                                                    (92.35%)
           4154149      baclears.any                                                  (92.35%)
          20556152      ARITH.DIVIDER_ACTIVE                                          (84.59%)

       1.010799593 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220517052724.283874-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 12:01:18 -03:00
Adrian Hunter d7015e50a9 perf intel-pt: Add support for emulated ptwrite
ptwrite is an Intel x86 instruction that writes arbitrary values into an
Intel PT trace. It is not supported on all hardware, so provide an
alternative that makes use of TNT packets to convey the payload data.
TNT packets encode Taken/Not-taken conditional branch information, so
taking branches based on the payload value will encode the value into
the TNT packet. Refer to the changes to the documentation file
perf-intel-pt.txt in this patch for an example.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 11:55:49 -03:00
Adrian Hunter 843e5ba75e perf tools: Remove unused machines__find_host()
machines__find_host() does not exist. Remove declaration.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220513084459.6581-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-13 11:11:27 -03:00
Adrian Hunter 7df319e5b3 perf auxtrace: Record whether an auxtrace mmap is needed
Add a flag needs_auxtrace_mmap to record whether an auxtrace mmap is
needed, in preparation for correctly determining whether or not an
auxtrace mmap is needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:27:19 -03:00
Adrian Hunter 8f111be643 libperf evlist: Add evsel as a parameter to ->idx()
Add evsel as a parameter to ->idx() in preparation for correctly
determining whether an auxtrace mmap is needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:26:48 -03:00
Adrian Hunter 6a7b8a5a30 libperf evlist: Remove ->idx() per_cpu parameter
Remove ->idx() per_cpu parameter because it isn't needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:25:57 -03:00
Adrian Hunter d205a3a665 perf auxtrace: Do not mix up mmap idx
The idx is with respect to evlist not evsel. That hasn't mattered because
they are the same at present. Prepare for that not being the case, which it
won't be when sideband tracking events are allowed on all CPUs even when
auxtrace is limited to selected CPUs.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:24:32 -03:00
Adrian Hunter 024b3b42ad perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
evlist__enable_event_idx() is used only by auxtrace. Move it to auxtrace.c
in preparation for making it even more auxtrace specific.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:23:34 -03:00
Adrian Hunter a40bb7518e perf evlist: Use libperf functions in evlist__enable_event_idx()
evlist__enable_event_idx() is used only for auxtrace events which are never
system_wide. Simplify by using libperf enable event functions.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:22:47 -03:00