Build:
- Allow to generate vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in the
debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZKb4mwAKCRCMstVUGiXM
g1QqAPwKZow/DhAzyN7KvzdNd+SojRGpUMl6RkVphY/9ntDqPAD+L3V5aXLTiC1L
8kUzdpRX5VMjqdR9U7TycUOi4QU40QA=
=dEF1
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
Arm has multiple PMU types for heterogeneous systems, but doesn't
currently support PERF_PMU_CAP_EXTENDED_HW_TYPE. Make the tests
support both scenarios so that they pass on Arm, and will still pass
once PERF_PMU_CAP_EXTENDED_HW_TYPE support is added.
Fixes: 27c9fcfc1e ("perf test: Update parse-events expectations to test for multiple events")
Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230705082653.23566-3-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The test looks for a PMU from sysfs with type = PERF_TYPE_RAW when
opening a raw event. Arm doesn't have a real raw PMU, only core PMUs
with unique types other than raw.
Instead of looking for a matching PMU, just test that the event type
was parsed as raw and skip the PMU search on Arm. The raw event type
test should also apply to all platforms so add it outside of the ifdef.
Fixes: aefde50a44 ("perf test: Fix parse-events tests for >1 core PMU")
Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230705082653.23566-2-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
AMD IBS can do per-process profiling[1] and is no longer restricted to
per-cpu or systemwide only. Remove stale error message. Also, checking
just exclude_kernel is not sufficient since IBS does not support any
privilege filters. So include all exclude_* checks. And finally, move
these checks under tools/perf/arch/x86/ from generic code.
Before:
$ sudo ./perf record -e ibs_op//k -C 0
Error:
AMD IBS may only be available in system-wide/per-cpu mode. Try
using -a, or -C and workload affinity
After:
$ sudo ./perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
[1] https://git.kernel.org/torvalds/c/30093056f7b2
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: ananth.narayan@amd.com
Cc: sandipan.das@amd.com
Cc: santosh.shukla@amd.com
Cc: irogers@google.com
Cc: peterz@infradead.org
Cc: adrian.hunter@intel.com
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: https://lore.kernel.org/r/20230630085230.437-1-ravi.bangoria@amd.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Pass the full path including the symfs (if any) to libdw. Without this
unwinding fails with errors like this when a symfs is used:
unwind: failed with 'No such file or directory'"
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kernel@axis.com
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230630-perf-libdw-symfs-v2-1-469760dd4d5b@axis.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
found_idx and s aren't initialized, so if no symbol is found then the
assert at the end will index off the end of the array causing a
segfault. The function also doesn't return NULL when the symbol isn't
found even if the assert passes. Fix it by initializing the values and
only setting them when something is found.
Fixes the following test failure:
$ perf test 1
1: vmlinux symtab matches kallsyms : FAILED!
Fixes: 259dce914e ("perf symbol: Remove symbol_name_rb_node")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230630153840.858668-1-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To verify CSV output, just check the number of separators (",") using
the tr and wc commands like this.
grep -v "^#" ${result} | tr -d -c | wc -c
Now it expects 6 columns (and 5 separators) in the output, but it may
be changed later so count the field in the header first and compare it
to the actual output lines.
$ cat ${result}
# output: contended, total wait, max wait, avg wait, type, caller
1, 28787, 28787, 28787, spinlock, raw_spin_rq_lock_nested+0x1b
The test looks like below now:
$ sudo ./perf test -v contention
86: kernel lock contention analysis test :
--- start ---
test child forked, pid 2705822
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
Testing perf lock contention --callstack-filter (w/ unix_stream)
Testing perf lock contention --callstack-filter with task aggregation
Testing perf lock contention CSV output
test child finished with 0
---- end ----
kernel lock contention analysis test: Ok
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
To avoid formatting failures for example in CSV output due to debug
messages, add --output option to put the result in a file.
Unfortunately the short -o option was taken by the --owner already.
$ sudo ./perf lock con -ab --output lock-out.txt -v sleep 1
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
$ head lock-out.txt
contended total wait max wait avg wait type caller
3 76.79 us 26.89 us 25.60 us rwlock:R ep_poll_callback+0x2d
0xffffffff9a23f4b5 _raw_read_lock_irqsave+0x45
0xffffffff99bbd4dd ep_poll_callback+0x2d
0xffffffff999029f3 __wake_up_common+0x73
0xffffffff99902b82 __wake_up_common_lock+0x82
0xffffffff99fa5b1c sock_def_readable+0x3c
0xffffffff9a11521d unix_stream_sendmsg+0x18d
0xffffffff99f9fc9c sock_sendmsg+0x5c
Suggested-by: Ian Rogers <irogers@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Sometimes we want to process the output by external programs. Let's add
the -x option to specify the field separator like perf stat.
$ sudo ./perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
The first line is a comment that shows the output format. Each line is
separated by the given string ("," in this case). The time is printed
in nsec without the unit so that it can be parsed easily.
The characters can be used in the output like (":", "+" and ".") are not
allowed for the -x option.
$ ./perf lock con -x:
Cannot use the separator that is already used
Usage: perf lock contention [<options>]
-x, --field-separator <separator>
print result in CSV format with custom separator
The stacktraces are printed in the same line separated by ":". The
header is updated to show the stacktrace. Also the debug output is
added at the end as a comment.
$ sudo ./perf lock con -abv -x, -F wait_total sleep 1
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
# output: total wait, type, caller, stacktrace
37134, spinlock, rcu_core+0xd4, 0xffffffff9d0401e4 _raw_spin_lock_irqsave+0x44: 0xffffffff9c738114 rcu_core+0xd4: ...
21213, spinlock, raw_spin_rq_lock_nested+0x1b, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9c6d9cfb raw_spin_rq_lock_nested+0x1b: ...
20506, rwlock:W, ep_done_scan+0x2d, 0xffffffff9c9bc4dd ep_done_scan+0x2d: 0xffffffff9c9bd5f1 do_epoll_wait+0x6d1: ...
18044, rwlock:R, ep_poll_callback+0x2d, 0xffffffff9d040555 _raw_read_lock_irqsave+0x45: 0xffffffff9c9bc81d ep_poll_callback+0x2d: ...
17890, rwlock:W, do_epoll_wait+0x47b, 0xffffffff9c9bd39b do_epoll_wait+0x47b: 0xffffffff9c9be9ef __x64_sys_epoll_wait+0x6d1: ...
12114, spinlock, futex_wait_queue+0x60, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9d037cae __schedule+0xbe: ...
# debug: total=7, bad=0, bad_task=0, bad_stack=0, bad_time=0, bad_data=0
Also note that some field (like lock symbols) can be empty.
$ sudo ./perf lock con -abl -x, -E 10 sleep 1
# output: contended, total wait, max wait, avg wait, address, symbol, type
6, 275025, 61764, 45837, ffff9dcc9f7d60d0, , spinlock
18, 87716, 11196, 4873, ffff9dc540059000, , spinlock
2, 6472, 5499, 3236, ffff9dcc7f730e00, rq_lock, spinlock
3, 4429, 2341, 1476, ffff9dcc7f7b0e00, rq_lock, spinlock
3, 3974, 1635, 1324, ffff9dcc7f7f0e00, rq_lock, spinlock
4, 3290, 1326, 822, ffff9dc5f4e2cde0, , rwlock
3, 2894, 1023, 964, ffffffff9e0d7700, rcu_state, spinlock
1, 2567, 2567, 2567, ffff9dcc7f6b0e00, rq_lock, spinlock
4, 1259, 596, 314, ffff9dc69c2adde0, , rwlock
1, 934, 934, 934, ffff9dcc7f670e00, rq_lock, spinlock
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The comment was for symbol_conf.sort_by_name which was deleted already.
Let's get rid of the stale comments as well.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic way.
Do more work in the lexer so that legacy event types parse more easily.
A side-effect of this is that if a PMU is specified, scanning sysfs is
avoided improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both performance
and efficiency cores on Intel machines. To support this, sort and regroup
events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN and
Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce shared_mutex for annotate data to reduce memory footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within correct
value ranges.
- Add perf stat stdio output test to check if event and metric names match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a preparation to
enable shellcheck by default.
- Make the large x86 new instructions test optional at build time using
EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' outpuf field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel output
before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is needed
for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache level
like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so that
events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output value not to added when it runs multiple times with -r option.
perf list:
- Show metricgroup description from JSON file called metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each event is
supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The topdown
events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And use
"DefaultMetricgroupName" to indicate the real metric group name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZJxT3gAKCRCMstVUGiXM
g3//AQDyH3tbAVxU6JkvEOjjDvK7MWeXef7GQh8MP8D9Wkxk1AD9HgyxZWXn+mer
wxzBMntnxlr9+mkBerrVwUzYMd/IJQk=
=hPh8
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull perf tools updates from Namhyung Kim:
"Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic
way.
Do more work in the lexer so that legacy event types parse more
easily. A side-effect of this is that if a PMU is specified,
scanning sysfs is avoided improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both
performance and efficiency cores on Intel machines. To support
this, sort and regroup events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN
and Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce shared_mutex for annotate data to reduce memory
footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within
correct value ranges.
- Add perf stat stdio output test to check if event and metric names
match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a
preparation to enable shellcheck by default.
- Make the large x86 new instructions test optional at build time
using EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' outpuf field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel
output before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is
needed for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache
level like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so
that events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output value not to added when it runs multiple times with -r
option.
perf list:
- Show metricgroup description from JSON file called
metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each
event is supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
topdown events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And
use "DefaultMetricgroupName" to indicate the real metric group
name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names
from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch"
* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
perf test: Skip metrics w/o event name in stat STD output linter
perf test: Reorder event name checks in stat STD output linter
perf pmu: Remove a hard coded cpu PMU assumption
perf pmus: Add notion of default PMU for JSON events
perf unwind: Fix map reference counts
perf test: Set PERF_EXEC_PATH for script execution
perf script: Initialize buffer for regs_map()
perf tests: Fix test_arm_callgraph_fp variable expansion
perf symbol: Add LoongArch case in get_plt_sizes()
perf test: Remove x permission from lib/stat_output.sh
perf test: Rerun failed metrics with longer workload
perf test: Add skip list for metrics known would fail
perf test: Add metric value validation test
perf jit: Fix incorrect file name in DWARF line table
perf annotate: Fix instruction association and parsing for LoongArch
perf annotation: Switch lock from a mutex to a sharded_mutex
perf sharded_mutex: Introduce sharded_mutex
tools: Fix incorrect calculation of object size by sizeof
perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
perf parse-events: Remove unneeded semicolon
...
Updates were released in:
9a3cd5ad68
Adds the events ICACHE_DATA.STALLS, ICACHE_TAG.STALLS and
DECODE.LCP. Descriptions are also updated.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-13-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Updates were released in:
78d47cbbae
Adds the events ICACHE_DATA.STALLS, ICACHE_TAG.STALLS and
DECODE.LCP. Descriptions are also updated.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Updates were released in:
f3d841189f
A number of events are deprecated and event descriptions updated. Adds
events ICACHE_DATA.STALLS, ICACHE_TAG.STALLS and DECODE.LCP.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
1.03 events were released in:
501a29e88b
It added a lot of events and all uncore events.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Make the transaction metrics conditional on the cycles-tx event being
present. This event may not be present when TSX extensions have been
disabled.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Support for the new has_event function in metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Some events are dependent on firmware/kernel enablement. Allow such
events to be detected when the metric is parsed so that the metric's
event parsing doesn't fail.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Sohom Datta <sohomdatta1@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623151016.4193660-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The thread__find_map() is to find a map for a given address in the
given thread's address space. It searches maps based on the cpu mode
and fills various information in the addr_location data structure.
It might change al->maps and al->map, but not al->thread. Then I
think no reason to not set the al->thread at the beginning.
Also get rid of the duplicate 'al->map = NULL' part.
Fixes: 0dd5041c9a ("perf addr_location: Add init/exit/copy functions")
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: James Clark <james.clark@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Core
----
- Rework the sendpage & splice implementations. Instead of feeding
data into sockets page by page extend sendmsg handlers to support
taking a reference on the data, controlled by a new flag called
MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file
to invoke an additional callback instead of trying to predict what
the right combination of MORE/NOTLAST flags is.
Remove the MSG_SENDPAGE_NOTLAST flag completely.
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid.
- Enable socket busy polling with CONFIG_RT.
- Improve reliability and efficiency of reporting for ref_tracker.
- Auto-generate a user space C library for various Netlink families.
Protocols
---------
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2].
- Use per-VMA locking for "page-flipping" TCP receive zerocopy.
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags.
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative.
- Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO).
- Avoid waking up applications using TLS sockets until we have
a full record.
- Allow using kernel memory for protocol ioctl callbacks, paving
the way to issuing ioctls over io_uring.
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address.
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch.
- PPPoE: make number of hash bits configurable.
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig).
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge).
- Support matching on Connectivity Fault Management (CFM) packets.
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug.
- HSR: don't enable promiscuous mode if device offloads the proto.
- Support active scanning in IEEE 802.15.4.
- Continue work on Multi-Link Operation for WiFi 7.
BPF
---
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used,
or in fact passing the verifier at all for complex programs,
especially those using open-coded iterators.
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what
the output buffer *should* be, without writing anything.
- Accept dynptr memory as memory arguments passed to helpers.
- Add routing table ID to bpf_fib_lookup BPF helper.
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands.
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only).
- Show target_{obj,btf}_id in tracing link fdinfo.
- Addition of several new kfuncs (most of the names are self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter
---------
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value.
- Increase ip_vs_conn_tab_bits range for 64BIT builds.
- Allow updating size of a set.
- Improve NAT tuple selection when connection is closing.
Driver API
----------
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out).
- Support configuring rate selection pins of SFP modules.
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines.
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer.
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio).
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message.
- Split devlink instance and devlink port operations.
New hardware / drivers
----------------------
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers
-------
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is
a variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split
the different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and
Enhanced MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S
IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3
MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07
IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ
CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f
mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/
fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp
J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY
dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7
yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/
JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv
tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4=
=9i4I
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Jakub Kicinski:
"WiFi 7 and sendpage changes are the biggest pieces of work for this
release. The latter will definitely require fixes but I think that we
got it to a reasonable point.
Core:
- Rework the sendpage & splice implementations
Instead of feeding data into sockets page by page extend sendmsg
handlers to support taking a reference on the data, controlled by a
new flag called MSG_SPLICE_PAGES
Rework the handling of unexpected-end-of-file to invoke an
additional callback instead of trying to predict what the right
combination of MORE/NOTLAST flags is
Remove the MSG_SENDPAGE_NOTLAST flag completely
- Implement SCM_PIDFD, a new type of CMSG type analogous to
SCM_CREDENTIALS, but it contains pidfd instead of plain pid
- Enable socket busy polling with CONFIG_RT
- Improve reliability and efficiency of reporting for ref_tracker
- Auto-generate a user space C library for various Netlink families
Protocols:
- Allow TCP to shrink the advertised window when necessary, prevent
sk_rcvbuf auto-tuning from growing the window all the way up to
tcp_rmem[2]
- Use per-VMA locking for "page-flipping" TCP receive zerocopy
- Prepare TCP for device-to-device data transfers, by making sure
that payloads are always attached to skbs as page frags
- Make the backoff time for the first N TCP SYN retransmissions
linear. Exponential backoff is unnecessarily conservative
- Create a new MPTCP getsockopt to retrieve all info
(MPTCP_FULL_INFO)
- Avoid waking up applications using TLS sockets until we have a full
record
- Allow using kernel memory for protocol ioctl callbacks, paving the
way to issuing ioctls over io_uring
- Add nolocalbypass option to VxLAN, forcing packets to be fully
encapsulated even if they are destined for a local IP address
- Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
in-kernel ECMP implementation (e.g. Open vSwitch) select the same
link for all packets. Support L4 symmetric hashing in Open vSwitch
- PPPoE: make number of hash bits configurable
- Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
(ipconfig)
- Add layer 2 miss indication and filtering, allowing higher layers
(e.g. ACL filters) to make forwarding decisions based on whether
packet matched forwarding state in lower devices (bridge)
- Support matching on Connectivity Fault Management (CFM) packets
- Hide the "link becomes ready" IPv6 messages by demoting their
printk level to debug
- HSR: don't enable promiscuous mode if device offloads the proto
- Support active scanning in IEEE 802.15.4
- Continue work on Multi-Link Operation for WiFi 7
BPF:
- Add precision propagation for subprogs and callbacks. This allows
maintaining verification efficiency when subprograms are used, or
in fact passing the verifier at all for complex programs,
especially those using open-coded iterators
- Improve BPF's {g,s}setsockopt() length handling. Previously BPF
assumed the length is always equal to the amount of written data.
But some protos allow passing a NULL buffer to discover what the
output buffer *should* be, without writing anything
- Accept dynptr memory as memory arguments passed to helpers
- Add routing table ID to bpf_fib_lookup BPF helper
- Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
- Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
maps as read-only)
- Show target_{obj,btf}_id in tracing link fdinfo
- Addition of several new kfuncs (most of the names are
self-explanatory):
- Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
and bpf_dynptr_clone().
- bpf_task_under_cgroup()
- bpf_sock_destroy() - force closing sockets
- bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
Netfilter:
- Relax set/map validation checks in nf_tables. Allow checking
presence of an entry in a map without using the value
- Increase ip_vs_conn_tab_bits range for 64BIT builds
- Allow updating size of a set
- Improve NAT tuple selection when connection is closing
Driver API:
- Integrate netdev with LED subsystem, to allow configuring HW
"offloaded" blinking of LEDs based on link state and activity
(i.e. packets coming in and out)
- Support configuring rate selection pins of SFP modules
- Factor Clause 73 auto-negotiation code out of the drivers, provide
common helper routines
- Add more fool-proof helpers for managing lifetime of MDIO devices
associated with the PCS layer
- Allow drivers to report advanced statistics related to Time Aware
scheduler offload (taprio)
- Allow opting out of VF statistics in link dump, to allow more VFs
to fit into the message
- Split devlink instance and devlink port operations
New hardware / drivers:
- Ethernet:
- Synopsys EMAC4 IP support (stmmac)
- Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
- Marvell 88E6250 7 port switches
- Microchip LAN8650/1 Rev.B0 PHYs
- MediaTek MT7981/MT7988 built-in 1GE PHY driver
- WiFi:
- Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
- Realtek RTL8723DS (SDIO variant)
- Realtek RTL8851BE
- CAN:
- Fintek F81604
Drivers:
- Ethernet NICs:
- Intel (100G, ice):
- support dynamic interrupt allocation
- use meta data match instead of VF MAC addr on slow-path
- nVidia/Mellanox:
- extend link aggregation to handle 4, rather than just 2 ports
- spawn sub-functions without any features by default
- OcteonTX2:
- support HTB (Tx scheduling/QoS) offload
- make RSS hash generation configurable
- support selecting Rx queue using TC filters
- Wangxun (ngbe/txgbe):
- add basic Tx/Rx packet offloads
- add phylink support (SFP/PCS control)
- Freescale/NXP (enetc):
- report TAPRIO packet statistics
- Solarflare/AMD:
- support matching on IP ToS and UDP source port of outer
header
- VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
- add devlink dev info support for EF10
- Virtual NICs:
- Microsoft vNIC:
- size the Rx indirection table based on requested
configuration
- support VLAN tagging
- Amazon vNIC:
- try to reuse Rx buffers if not fully consumed, useful for ARM
servers running with 16kB pages
- Google vNIC:
- support TCP segmentation of >64kB frames
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- enable USXGMII (88E6191X)
- Microchip:
- lan966x: add support for Egress Stage 0 ACL engine
- lan966x: support mapping packet priority to internal switch
priority (based on PCP or DSCP)
- Ethernet PHYs:
- Broadcom PHYs:
- support for Wake-on-LAN for BCM54210E/B50212E
- report LPI counter
- Microsemi PHYs: support RGMII delay configuration (VSC85xx)
- Micrel PHYs: receive timestamp in the frame (LAN8841)
- Realtek PHYs: support optional external PHY clock
- Altera TSE PCS: merge the driver into Lynx PCS which it is a
variant of
- CAN: Kvaser PCIEcan:
- support packet timestamping
- WiFi:
- Intel (iwlwifi):
- major update for new firmware and Multi-Link Operation (MLO)
- configuration rework to drop test devices and split the
different families
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
- Qualcomm 802.11ax (ath11k):
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced
MBSSID Advertisement (EMA) support in AP mode
- support factory test mode
- RealTek (rtw89):
- add RSSI based antenna diversity
- support U-NII-4 channels on 5 GHz band
- RealTek (rtl8xxxu):
- AP mode support for 8188f
- support USB RX aggregation for the newer chips"
* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
net: scm: introduce and use scm_recv_unix helper
af_unix: Skip SCM_PIDFD if scm->pid is NULL.
net: lan743x: Simplify comparison
netlink: Add __sock_i_ino() for __netlink_diag_dump().
net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
Revert "af_unix: Call scm_recv() only after scm_set_cred()."
phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
libceph: Partially revert changes to support MSG_SPLICE_PAGES
net: phy: mscc: fix packet loss due to RGMII delays
net: mana: use vmalloc_array and vcalloc
net: enetc: use vmalloc_array and vcalloc
ionic: use vmalloc_array and vcalloc
pds_core: use vmalloc_array and vcalloc
gve: use vmalloc_array and vcalloc
octeon_ep: use vmalloc_array and vcalloc
net: usb: qmi_wwan: add u-blox 0x1312 composition
perf trace: fix MSG_SPLICE_PAGES build error
ipvlan: Fix return value of ipvlan_queue_xmit()
netfilter: nf_tables: fix underflow in chain reference counter
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
...
If loading a core PMU fails, legacy hardware/cache events may segv due
to there being no PMU. Create a placeholder empty PMU for this
case. This was discussed in:
https://lore.kernel.org/lkml/20230614151625.2077-1-yangjihong1@huawei.com/
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230627182834.117565-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
brings some order to the documentation directory, declutters the top-level
directory, and makes the documentation organization more closely match that
of the source.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmSbDsIPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y+ksH/2Xqun1ipPvu66+bBdPIf8N9AVFatl2q3mt4
tgX3A4RH3Ejklb4GbRLOIP23PmCxt7LRv4P05ttw8VpTP3A+Cw1d1s2RxiXGvfDE
j7IW6hrpUmVoDdiDCRGtjdIa7MVI5aAsj8CCTjEFywGi5CQe0Uzq4aTUKoxJDEnu
GYVy2CwDNEt4GTQ6ClPpFx2rc4UZf/H2XqXsnod9ef8A5Nkt3EtgoS1hh3o1QZGA
Mqx2HAOVS1tb6GUVUbVLCdj40+YjBLjXFlsH4dA+wsFFdUlZLKuTesdiAMg2X6eT
E8C/6oRT+OiWbrnXUTJEn8z98Ds8VHn7D4n97O9bIQ+R9AFtmPI=
=H/+D
-----END PGP SIGNATURE-----
Merge tag 'docs-arm64-move' of git://git.lwn.net/linux
Pull arm64 documentation move from Jonathan Corbet:
"Move the arm64 architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm64-move' of git://git.lwn.net/linux:
perf arm-spe: Fix a dangling Documentation/arm64 reference
mm: Fix a dangling Documentation/arm64 reference
arm64: Fix dangling references to Documentation/arm64
dt-bindings: fix dangling Documentation/arm64 reference
docs: arm64: Move arm64 documentation under Documentation/arch/
- Rework & fix the event forwarding logic by extending the
core interface. This fixes AMD PMU events that have to
be forwarded from the core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSayC0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jlWxAAqUPtfst1p6H5sSyCBPYo5Y/Rh0SyyqJj
w0YZ8p2nbB/+EmIN3WS1uYhx1/AalTP254q2BgVF4DlDFQA1MlJCmSNJ9HhtzOgt
mbpNKzy50cQCR/iH+s3ldcFsLGhSG07j6w8xeb6BGiABm2JoiZeg6iVU76zRe5A1
iPnjC7qoqjKH+sq8pu32fBClMjzf05/LGMd0MqFuYfl5950xRW61olstjo93XWgK
O5z+5wm5H3MhJ2mzU6x+0C/xurIEQ0zRf6AqLbFp41BbJJJORgTCK746flghiqd5
DiADc7oj9eOqL1X9jFPHgE07T/6QPrMC8BoH64pOcM3PoZ6Iq3zTkUHxAw3qK5j+
kqduxzlVaFLFnf7R/vxUvjMg1PM+qP3pqgCrT+NFUdqsdLgSPxRzt5pAM6aAUwmU
1lhuapESH44RUFZGWrfOwzQE5q/FDmUc2yGyGW2aYDmwkclNjVpnvHEJrQMugI3M
M3/y9a+ErcPDUJfHcodutBDGw9l7VhsxJFMt4ydOTkNbEfZLbi2TzNapui6SKFja
G2efrB/HhrV9nE+21Wfa3uxoKMuJ/UPiGrVr2qyGOnShQpK7sdyGDshO1s6TTPye
OoVf9I0LhewMPap52SU/KDP7GJVPW1BhL/C7w6OSnXxlS5k4lOji7z4Dj2hqXHib
19Jm7BhqZwE=
=xn05
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Rework & fix the event forwarding logic by extending the core
interface.
This fixes AMD PMU events that have to be forwarded from the
core PMU to the IBS PMU.
- Add self-tests to test AMD IBS invocation via core PMU events
- Clean up Intel FixCntrCtl MSR encoding & handling
* tag 'perf-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Re-instate the linear PMU search
perf/x86/intel: Define bit macros for FixCntrCtl MSR
perf test: Add selftest to test IBS invocation via core pmu events
perf/core: Remove pmu linear searching code
perf/ibs: Fix interface via core pmu events
perf/core: Rework forwarding of {task|cpu}-clock events
The dso__find_symbol_by_name() should be have idx pointer argument.
Found during the build-test.
$ make build-test
...
CC /tmp/tmp.6JwPK1xbWG/tests/pe-file-parsing.o
tests/pe-file-parsing.c: In function ‘run_dir’:
tests/pe-file-parsing.c:64:15: error: too few arguments to function ‘dso__find_symbol_by_name’
64 | sym = dso__find_symbol_by_name(dso, "main");
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from tests/pe-file-parsing.c:16:
/usr/local/google/home/namhyung/project/linux/tools/perf/util/symbol.h:135:16: note: declared here
135 | struct symbol *dso__find_symbol_by_name(struct dso *dso, const char *name, size_t *idx);
| ^~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 259dce914e ("perf symbol: Remove symbol_name_rb_node")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230627063257.549005-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-target has been deprecated since Clang 3.4 in 2013. Use the preferred
--target=bpf form instead. This matches how we use --target= in
scripts/Makefile.clang.
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: 274b6f0c87
Link: https://lore.kernel.org/r/20230624002708.1907962-1-maskray@google.com
[ resolved a conflict with GEN_VMLINUX_H changes ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The original logic was to check is_pmu_hybrid() like in the below.
It just checks the name of PMU specifically for Intel hybrid systems
which means uncore PMU events should return false.
https://lore.kernel.org/all/20230527072210.2900565-35-irogers@google.com/
The is_pmu_hybrid() was replaced by arch-agnostic way but with the
incorrect condition which was fixed for core PMUs but not uncore.
This change fixes both.
Fixes: e23421426e ("perf pmu: Correct perf_pmu__auto_merge_stats() affecting hybrid")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/all/CAP-5=fXOi=xQ4=j5xAq+jWLR9n7uvfsWK+PzXkY1MZ3Fz-xccw@mail.gmail.com/
Link: https://lore.kernel.org/r/20230626053048.257959-1-irogers@google.com
[ rephrase the commit log a bit ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Namhyung reported segfault in perf daemon start command.
It's caused by extra check on argv[0] which is set to NULL by previous
__cmd_start call. Adding missing else to skip the extra check.
Fixes: 92294b906e ("perf daemon: Dynamically allocate path to perf")
Reported-and-Tested-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20230626201606.2514679-1-jolsa@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Our MPTCP CI and Stephen got this error:
In file included from builtin-trace.c:907:
trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags':
trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function)
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
The fix is similar to what was done with MSG_FASTOPEN: the new macro is
defined if it is not defined in the system headers.
Fixes: b848b26c66 ("net: Kill MSG_SENDPAGE_NOTLAST")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/r/20230626112847.2ef3d422@canb.auug.org.au/
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230626090239.899672-1-matthieu.baerts@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Previously used to specify symbol_name_rb_node was in use.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Most perf commands want to sort symbols by name and this is done via
an invasive rbtree that on 64-bit systems costs 24 bytes. Sorting the
symbols in a DSO by name is optional and not done by default, however,
if sorting is requested the 24 bytes is allocated for every
symbol.
This change removes the rbtree and uses a sorted array of symbol
pointers instead (costing 8 bytes per symbol). As the array is created
on demand then there are further memory savings. The complexity of
sorting the array and using the rbtree are the same.
To support going to the next symbol, the index of the current symbol
needs to be passed around as a pair with the current symbol. This
requires some API changes.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-3-irogers@google.com
[ minimize change in symbols__sort_by_name() ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Determine if symbols are sorted, set the sorted flag and sort under
the dso lock. Done in the interest of thread safety.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-2-irogers@google.com
[ handle the similar code in util/probe-event.c ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
If generating vmlinux.h, make the code to generate it more tolerant by
filtering out paths to kernels that lack a .BTF section.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add tests with and without generating vmlinux.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
struct rq is defined in vmlinux.h when the vmlinux.h is generated,
this causes a redefinition failure if it is declared in
lock_contention.bpf.c. Move the definition to vmlinux.h for
consistency with the generated version.
Fixes: 760ebc4574 ("perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This test checks if the output of perf stat to match event names and
metrics. So it wants the output lines to have both event name and
metric. Otherwise it should skip the line.
On AMD machines, the instruction event has two metrics and they are printed
in separate lines. It makes the line without event name like below:
# perf stat -a sleep 1
Performance counter stats for 'system wide':
64,383.34 msec cpu-clock # 64.048 CPUs utilized
14,526 context-switches # 225.617 /sec
112 cpu-migrations # 1.740 /sec
190 page-faults # 2.951 /sec
807,558,652 cycles # 0.013 GHz (83.30%)
69,809,799 stalled-cycles-frontend # 8.64% frontend cycles idle (83.30%)
196,983,266 stalled-cycles-backend # 24.39% backend cycles idle (83.30%)
424,876,008 instructions # 0.53 insn per cycle
(here) ---> # 0.46 stalled cycles per insn (83.30%)
97,788,321 branches # 1.519 M/sec (83.34%)
4,147,377 branch-misses # 4.24% of all branches (83.46%)
1.005241409 seconds time elapsed
Also modern Intel machines have TopDown metrics which also don't have
event names.
# perf stat -a sleep 1
Performance counter stats for 'system wide':
8,015.39 msec cpu-clock # 7.996 CPUs utilized
5,823 context-switches # 726.477 /sec
189 cpu-migrations # 23.580 /sec
139 page-faults # 17.342 /sec
435,139,308 cycles # 0.054 GHz
193,891,345 instructions # 0.45 insn per cycle
42,773,028 branches # 5.336 M/sec
2,298,113 branch-misses # 5.37% of all branches
TopdownL1 # 25.5 % tma_backend_bound
/--> # 7.9 % tma_bad_speculation
(here) --+ # 55.7 % tma_frontend_bound
\--> # 10.9 % tma_retiring
1.002395924 seconds time elapsed
There is a check to skip TopdownL1 and TopdownL2 specifically but it
does not cover every affected lines.
So there is another check to skip the line if it has nothing on the left
side of # sign. Well.. it seems ok but that's not enough too.
When aggregation mode (like --per-socket or --per-thread) is used, it
adds some prefix (e.g. CPU socket, task name and PID) in the output
line. So the test code ignores them to normalize result.
A problem can happen for per-thread mode when task name contains one or
more spaces. It'd only ignore the first part of the task name, and it
thinks there's something more in the line so it would not skip.
# perf stat -a --perf-thread sleep 1
...
perf-21276 # 70.2 % tma_backend_bound
perf-21276 # 3.9 % tma_bad_speculation
perf-21276 # 10.5 % tma_frontend_bound
perf-21276 # 15.3 % tma_retiring
^^^^^^^^^^
(ignored)
my task-21328 # 70.2 % tma_backend_bound
my task-21328 # 3.9 % tma_bad_speculation
my task-21328 # 10.5 % tma_frontend_bound
my task-21328 # 15.3 % tma_retiring
^^
(ignored)
So I think it should look at the metric names instead. Add skip_metric
to hold the list of names to skip. It would contain 'stalled cycles per
insn' and metrics started by 'tma_'.
Fixes: 99a04a48f2 ("perf test: Add test case for the standard 'perf stat' output")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230623230139.985594-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Commit a887466562 ("perf bpf skels: Stop using vmlinux.h generated
from BTF, use subset of used structs + CO-RE") made it so that
vmlinux.h was uncondtionally included from
tools/perf/util/vmlinux.h. This change reverts part of that change (so
that vmlinux.h is once again generated) and makes it so that the
vmlinux.h used at build time is selected from the VMLINUX_H
variable. By default the VMLINUX_H variable is set to the vmlinux.h
added in change a887466562, but if GEN_VMLINUX_H=1 is passed on the
build command line then the previous generation behavior kicks in.
The build with GEN_VMLINUX_H=1 currently fails with:
util/bpf_skel/lock_contention.bpf.c:419:8: error: redefinition of 'rq'
struct rq {};
^
/tmp/perf/util/bpf_skel/.tmp/../vmlinux.h:45630:8: note: previous definition is here
struct rq {
^
1 error generated.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-2-irogers@google.com
[ Format the error message and add a comment for GEN_VMLINUX_H ]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
On AMD machines, the perf stat STD output test failed like below:
$ sudo ./perf test -v 98
98: perf stat STD output linter :
--- start ---
test child forked, pid 1841901
Checking STD output: no argswrong event metric.
expected 'GHz' in 108,121 stalled-cycles-frontend # 10.88% frontend cycles idle
test child finished with -1
---- end ----
perf stat STD output linter: FAILED!
This is because there are stalled-cycles-{frontend,backend} events are
used by default. The current logic checks the event_name array to find
which event it's running. But 'cycles' event comes before those stalled
cycles event and it matches first. So it tries to find 'GHz' metric
in the output (which is for the 'cycles') and fails.
Move the stalled-cycles-{frontend,backend} events before 'cycles' so
that it can find the stalled cycles events first.
Also add a space after 'no args' test name for consistency.
Fixes: 99a04a48f2 ("perf test: Add test case for the standard 'perf stat' output")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230623230139.985594-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The property of "cpu" when it has no cpu map is true on S390 with the
PMU cpum_cf. Rather than maintain a list of such PMUs, reuse the
is_core test result from the caller.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20230623043843.4080180-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
JSON events created in pmu-events.c by jevents.py may not specify a
PMU they are associated with, in which case it is implied that it is
the first core PMU. Care is needed to select this for regular 'cpu',
s390 'cpum_cf' and ARMs many names as at the point the name is first
needed the core PMUs list hasn't been initialized. Add a helper in
perf_pmus to create this value, in the worst case by scanning sysfs.
v2. Add missing close if fdopendir fails.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623043843.4080180-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The result of thread__find_map is the map in the passed in
addr_location. Calling addr_location__exit puts that map and so copies
need to do a map__get. Add in the corresponding map__puts.
v2. Add missing map__put when dso is missing.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230623043107.4077510-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>