For memcpy, the source pages are memset to zero only when --cycles is
used. This leads to wildly different results with or without --cycles,
since all sources pages are likely to be mapped to the same zero page
without explicit writes.
Before this fix:
$ export cmd="./perf stat -e LLC-loads -- ./perf bench \
mem memcpy -s 1024MB -l 100 -f default"
$ $cmd
2,935,826 LLC-loads
3.821677452 seconds time elapsed
$ $cmd --cycles
217,533,436 LLC-loads
8.616725985 seconds time elapsed
After this fix:
$ $cmd
214,459,686 LLC-loads
8.674301124 seconds time elapsed
$ $cmd --cycles
214,758,651 LLC-loads
8.644480006 seconds time elapsed
Fixes: 47b5757bac ("perf bench mem: Move boilerplate memory allocation to the infrastructure")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel@axis.com
Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit fbd705a0c6 ("sched: Introduce the 'trace_sched_waking'
tracepoint") added sched_waking tracepoint which should be preferred
over sched_wakeup when analyzing scheduling delays.
Update 'perf sched record' to collect sched_waking events if it exists
and fallback to sched_wakeup if it does not. Similarly, update timehist
command to skip sched_wakeup events if the session includes sched_waking
(ie., sched_waking is preferred over sched_wakeup).
Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20200807164844.44870-1-dsahern@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are a couple of spelling mistakes in the text. Fix these.
Signed-off-by: Colin King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200812064647.200132-1-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Standard benchmark names let users know the tests specifics. For
example "2x1-bw-process" name tells that two processes one thread each
are run and the RAM bandwidth is measured.
Several benchmarks names do not correspond to their actual running
configuration. Fix that and also some whitespace and comment
inconsistencies.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/6b6f2084f132ee8e9203dc7c32f9deb209b87a68.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/d949f5f48e17fc816f3beecf8479f1b2480345e4.1597004831.git.agordeev@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That helps us not to lose new protocol families when they are
introduced, replacing that hardcoded, dated family->string table.
To recap what this allows us to do:
# perf trace -e syscalls:sys_enter_socket/max-stack=10/ --filter=family==INET --max-events=1
0.000 fetchmail/41097 syscalls:sys_enter_socket(family: INET, type: DGRAM|CLOEXEC|NONBLOCK, protocol: IP)
__GI___socket (inlined)
reopen (/usr/lib64/libresolv-2.31.so)
send_dg (/usr/lib64/libresolv-2.31.so)
__res_context_send (/usr/lib64/libresolv-2.31.so)
__GI___res_context_query (inlined)
__GI___res_context_search (inlined)
_nss_dns_gethostbyname4_r (/usr/lib64/libnss_dns-2.31.so)
gaih_inet.constprop.0 (/usr/lib64/libc-2.31.so)
__GI_getaddrinfo (inlined)
[0x15cb2] (/usr/bin/fetchmail)
#
More work is still needed to allow for the more natura strace-like
syscall name usage instead of the trace event name:
# perf trace -e socket/max-stack=10,family==INET/ --max-events=1
I.e. to allow for modifiers to follow the syscall name and for logical
expressions to be accepted as filters to use with that syscall, be it as
trace event filters or BPF based ones.
Using -v we can see how the trace event filter is built:
# perf trace -v -e syscalls:sys_enter_socket/call-graph=dwarf/ --filter=family==INET --max-events=2
<SNIP>
New filter for syscalls:sys_enter_socket: (family==0x2) && (common_pid != 41384 && common_pid != 2836)
<SNIP>
$ tools/perf/trace/beauty/socket.sh | grep -w 2
[2] = "INET",
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To use with 'perf trace', to convert the protocol families to strings,
e.g:
$ tools/perf/trace/beauty/socket.sh
static const char *socket_families[] = {
[0] = "UNSPEC",
[1] = "LOCAL",
[2] = "INET",
[3] = "AX25",
[4] = "IPX",
[5] = "APPLETALK",
[6] = "NETROM",
[7] = "BRIDGE",
[8] = "ATMPVC",
[9] = "X25",
[10] = "INET6",
[11] = "ROSE",
[12] = "DECnet",
[13] = "NETBEUI",
[14] = "SECURITY",
[15] = "KEY",
[16] = "NETLINK",
[17] = "PACKET",
[18] = "ASH",
[19] = "ECONET",
[20] = "ATMSVC",
[21] = "RDS",
[22] = "SNA",
[23] = "IRDA",
[24] = "PPPOX",
[25] = "WANPIPE",
[26] = "LLC",
[27] = "IB",
[28] = "MPLS",
[29] = "CAN",
[30] = "TIPC",
[31] = "BLUETOOTH",
[32] = "IUCV",
[33] = "RXRPC",
[34] = "ISDN",
[35] = "PHONET",
[36] = "IEEE802154",
[37] = "CAIF",
[38] = "ALG",
[39] = "NFC",
[40] = "VSOCK",
[41] = "KCM",
[42] = "QIPCRTR",
[43] = "SMC",
[44] = "XDP",
};
$
This uses a copy of include/linux/socket.h that is kept in a directory
to be used just for these table generation scripts and for checking if
the kernel has a new file that maybe gets something new for these
tables.
This allows us to:
- Avoid accessing files outside tools/, in the kernel sources, that may
be changed in unexpected ways and thus break these scripts.
- Notice when those files change and thus check if the changes don't
break those scripts, update them to automatically get the new
definitions, a new socket family, for instance.
- Not add then to the tools/include/ where it may end up used while
building the tools and end up requiring dragging yet more stuff from
the kernel or plain break the build in some of the myriad environments
where perf may be built.
This will replace the previous static array in tools/perf/ that was
dated and was already missing the AF_KCM, AF_QIPCRTR, AF_SMC and AF_XDP
families.
The next cset will wire this up to the perf build process.
At some point this must be made into a library to be used in places such
as libtraceevent, bpftrace, etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
New features:
- Introduce controlling how 'perf stat' and 'perf record' works via a
control file descriptor, allowing starting with events configured but
disabled until commands are received via the control file descriptor.
This allows, for instance for tools such as Intel VTune to make further
use of perf as its Linux platform driver.
- Improve 'perf record' to to register in a perf.data file header the clockid
used to help later correlate things like syslog files and perf events
recorded.
- Add basic syscall and find_next_bit benchmarks to 'perf bench'.
- Allow using computed metrics in calculating other metrics. For instance:
{
.metric_expr = "l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit",
.metric_name = "DCache_L2_All_Hits",
},
{
.metric_expr = "max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss",
.metric_name = "DCache_L2_All_Miss",
},
{
.metric_expr = "dcache_l2_all_hits + dcache_l2_all_miss",
.metric_name = "DCache_L2_All",
}
- Add suport for 'd_ratio', '>' and '<' operators to the expression resolver used
in calculating metrics in 'perf stat'.
Support for new kernel features:
- Support TEXT_POKE and KSYMBOL_TYPE_OOL perf metadata events to cope with
things like ftrace, trampolines, i.e. changes in the kernel text that gets
in the way of properly decoding Intel PT hardware traces, for instance.
Intel PT:
- Add various knobs to reduce the volume of Intel PT traces by reducing the
level of details such as decoding just some types of packets (e.g., FUP/TIP,
PSB+), also filtering by time range.
- Add new itrace options (log flags to the 'd' option, error flags to the 'e'
one, etc), controlling how Intel PT is transformed into perf events, document
some missing options (e.g., how to synthesize callchains).
BPF:
- Properly report BPF errors when parsing events.
- Do not setup side-band events if LIBBPF is not linked, fixing a segfault.
Libraries:
- Improvements on the libtraceevent plugin mechanism.
- Improve libtracevent support for KVM trace events SVM exit reasons.
- Add a libtracevent plugins for decoding syscalls/sys_enter_futex and for tlb_flush.
- Ensure sample_period is set libpfm4 events in 'perf test'.
- Fixup libperf namespacing, to make sure what is in libperf has the perf_
namespace while what is now only in tools/perf/ doesn't use that prefix.
Arch specific:
- Improve the testing of vendor events and metrics in 'perf test'.
- Allow no ARM CoreSight hardware tracer sink to be specified on command line.
- Fix arm_spe_x recording when mixed with other perf events.
- Add s390 idle functions 'psw_idle' and 'psw_idle_exit' to list of idle symbols.
- List kernel supplied event aliases for arm64 in 'perf list'.
- Add support for extended register capability in PowerPC 9 and 10.
- Added nest IMC power9 metric events.
Miscellaneous:
- No need to setup sample_regs_intr/sample_regs_user for dummy events.
- Update various copies of kernel headers, some causing perf to handle new
syscalls, MSRs, etc.
- Improve usage of flex and yacc, enabling warnings and addressing the fallout.
- Add missing '--output' option to 'perf kmem' so that it can pass it along to 'perf record'.
- 'perf probe' fixes related to adding multiple probes on the same address for
the same event.
- Make 'perf probe' warn if the target function is a GNU indirect function.
- Remove //anon mmap events from 'perf inject jit' to fix supporting both using
ELF files for generated functions and the perf-PID.map approaches.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test results:
The first ones are container based builds of tools/perf with and without libelf
support. Where clang is available, it is also used to build perf with/without
libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
when clang and its devel libraries are installed.
The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container cluster.
Those will come back later.
Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.
The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.
Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.
fedora:rawhide with python3 and gcc 10.1.1-2 is failing (10.1.1-1 on fedora:32
works), fixes will be provided soon.
clearlinux:latest is failing on libbpf, there is a fix already in the bpf tree.
The ones failing when linking with libllvm, not the default build, were
restricted to clang-9/llvm-9, working with anything before or after, e.g.,
using clang-8 on ubuntu:19.10 and clang-11 on debian:experimental fixed the
build in those environments.
# export PERF_TARBALL=http://192.168.124.1/perf/perf-5.8.0.tar.xz
# dm
1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
6 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
7 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
8 alpine:3.11 : Ok gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
9 alpine:3.12 : Ok gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
10 alpine:edge : Ok gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 10.0.0 (git://git.alpinelinux.org/aports 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
11 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
12 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.4.1 20200305 (ALT p9 8.4.1-alt0.p9.1), clang version 7.0.1
13 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.2.1 20200123 (ALT Sisyphus 9.2.1-alt3), clang version 10.0.0
14 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
15 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
16 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
17 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
18 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
19 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
20 centos:8 : Ok gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), clang version 9.0.1 (Red Hat 9.0.1-2.module_el8.2.0+309+0c7b6b03)
21 clearlinux:latest : FAIL gcc (Clear Linux OS for Intel Architecture) 10.2.1 20200723 releases/gcc-10.2.0-3-g677b80db41, clang version 10.0.1
gcc (Clear Linux OS for Intel Architecture) 10.2.1 20200723 releases/gcc-10.2.0-3-g677b80db41
btf.c: In function 'btf__parse_raw':
btf.c:625:28: error: 'btf' may be used uninitialized in this function [-Werror=maybe-uninitialized]
625 | return err ? ERR_PTR(err) : btf;
| ~~~~~~~~~~~~~~~~~~~^~~~~
22 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
23 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final)
24 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8 (tags/RELEASE_701/final)
25 debian:experimental : Ok gcc (Debian 10.2.0-3) 10.2.0, Debian clang version 11.0.0-+rc1-1
26 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0
27 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.3.0-19) 8.3.0
28 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.3.0-8) 9.3.0
29 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909
30 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
31 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final)
32 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final)
33 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final)
34 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
35 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final)
36 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final)
37 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final)
38 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final)
39 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29)
40 fedora:30 : Ok gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc30)
41 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
42 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
43 fedora:31 : Ok gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 9.0.1 (Fedora 9.0.1-2.fc31)
44 fedora:32 : Ok gcc (GCC) 10.1.1 20200507 (Red Hat 10.1.1-1), clang version 10.0.0 (Fedora 10.0.0-2.fc32)
45 fedora:rawhide : FAIL gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), clang version 10.0.0 (Fedora 10.0.0-10.fc33)
gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
util/scripting-engines/trace-event-python.c: In function 'python_start_script':
util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes]
1595 | PyMODINIT_FUNC (*initfunc)(void);
| ^~~~~~~~~~~~~~
46 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.3.0-r1 p3) 9.3.0
47 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final)
48 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final)
49 mageia:7 : Ok gcc (Mageia 8.3.1-0.20190524.1.mga7) 8.3.1 20190524, clang version 8.0.0 (Mageia 8.0.0-1.mga7)
50 manjaro:latest : Ok gcc (GCC) 9.2.0, clang version 9.0.0 (tags/RELEASE_900/final)
51 openmandriva:cooker : Ok gcc (GCC) 10.0.0 20200502 (OpenMandriva), clang version 10.0.1
52 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190424 [gcc-7-branch revision 270538], clang version 5.0.1 (tags/RELEASE_501/final 312548)
53 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238)
54 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0, clang version 9.0.1
55 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5, clang version 3.8.0 (tags/RELEASE_380/final 262553)
56 opensuse:tumbleweed : Ok gcc (SUSE Linux) 10.2.1 20200728 [revision c0438ced53bcf57e4ebb1c38c226e41571aca892], clang version 10.0.1
57 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
58 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.5)
59 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5.0.3), clang version 9.0.1 (Red Hat 9.0.1-2.0.1.module+el8.2.0+5599+9ed9ef6d)
60 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
61 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
62 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
63 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
64 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
65 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
66 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
67 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
68 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
69 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
70 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
71 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
72 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
73 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
74 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
75 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
76 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
77 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
78 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
79 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
80 ubuntu:18.10 : Ok gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0, clang version 7.0.0-3 (tags/RELEASE_700/final)
81 ubuntu:19.04 : Ok gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, clang version 8.0.0-3 (tags/RELEASE_800/final)
82 ubuntu:19.04-x-alpha : Ok alpha-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
83 ubuntu:19.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 8.3.0-6ubuntu1) 8.3.0
84 ubuntu:19.04-x-hppa : Ok hppa-linux-gnu-gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
85 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 8.0.1-3build1 (tags/RELEASE_801/final)
86 219.74 ubuntu:20.04 : Ok gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, clang version 10.0.0-4ubuntu1
#
# uname -a
Linux quaco 5.7.12-200.fc32.x86_64 #1 SMP Sat Aug 1 16:13:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# git log --oneline -1
1101c872c8 perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set
# perf version --build-options
perf version 5.8.g1101c872c8c7
dwarf: [ on ] # HAVE_DWARF_SUPPORT
dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
glibc: [ on ] # HAVE_GLIBC_SUPPORT
gtk2: [ on ] # HAVE_GTK2_SUPPORT
syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
libelf: [ on ] # HAVE_LIBELF_SUPPORT
libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
libperl: [ on ] # HAVE_LIBPERL_SUPPORT
libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
libslang: [ on ] # HAVE_SLANG_SUPPORT
libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
zlib: [ on ] # HAVE_ZLIB_SUPPORT
lzma: [ on ] # HAVE_LZMA_SUPPORT
get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
aio: [ on ] # HAVE_AIO_SUPPORT
zstd: [ on ] # HAVE_ZSTD_SUPPORT
# perf test
1: vmlinux symtab matches kallsyms : Ok
2: Detect openat syscall event : Ok
3: Detect openat syscall event on all cpus : Ok
4: Read samples using the mmap interface : Ok
5: Test data source output : Ok
6: Parse event definition strings : Ok
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : Ok
9: Parse perf pmu format : Ok
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Skip (some metrics failed)
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
11: DSO data read : Ok
12: DSO data cache : Ok
13: DSO data reopen : Ok
14: Roundtrip evsel->name : Ok
15: Parse sched tracepoints fields : Ok
16: syscalls:sys_enter_openat event fields : Ok
17: Setup struct perf_event_attr : Ok
18: Match and link multiple hists : Ok
19: 'import perf' in python : Ok
20: Breakpoint overflow signal handler : Ok
21: Breakpoint overflow sampling : Ok
22: Breakpoint accounting : Ok
23: Watchpoint :
23.1: Read Only Watchpoint : Skip
23.2: Write Only Watchpoint : Ok
23.3: Read / Write Watchpoint : Ok
23.4: Modify Watchpoint : Ok
24: Number of exit events of a simple workload : Ok
25: Software clock events period values : Ok
26: Object code reading : FAILED!
Fix being evaluated
27: Sample parsing : Ok
28: Use a dummy software event to keep tracking : Ok
29: Parse with no sample_id_all bit set : Ok
30: Filter hist entries : Ok
31: Lookup mmap thread : Ok
32: Share thread maps : Ok
33: Sort output of hist entries : Ok
34: Cumulate child hist entries : Ok
35: Track with sched_switch : Ok
36: Filter fds with revents mask in a fdarray : Ok
37: Add fd to a fdarray, making it autogrow : Ok
38: kmod_path__parse : Ok
39: Thread map : Ok
40: LLVM search and compile :
40.1: Basic BPF llvm compile : Ok
40.2: kbuild searching : Ok
40.3: Compile source for BPF prologue generation : Ok
40.4: Compile source for BPF relocation : Ok
41: Session topology : Ok
42: BPF filter :
42.1: Basic BPF filtering : Ok
42.2: BPF pinning : Ok
42.3: BPF prologue generation : Ok
42.4: BPF relocation checker : Ok
43: Synthesize thread map : Ok
44: Remove thread map : Ok
45: Synthesize cpu map : Ok
46: Synthesize stat config : Ok
47: Synthesize stat : Ok
48: Synthesize stat round : Ok
49: Synthesize attr update : Ok
50: Event times : Ok
51: Read backward ring buffer : Ok
52: Print cpu map : Ok
53: Merge cpu map : Ok
54: Probe SDT events : Ok
55: is_printable_array : Ok
56: Print bitmap : Ok
57: perf hooks : Ok
58: builtin clang support : Skip (not compiled in)
59: unit_number__scnprintf : Ok
60: mem2node : Ok
61: time utils : Ok
62: Test jit_write_elf : Ok
63: Test libpfm4 support : Skip (not compiled in)
64: Test api io : Ok
65: maps__merge_in : Ok
66: Demangle Java : Ok
67: Parse and process metrics : Ok
68: x86 rdpmc : Ok
69: Convert perf time to TSC : Ok
70: DWARF unwind : Ok
71: x86 instruction decoder - new instructions : Ok
72: Intel PT packet decoder : Ok
73: x86 bp modify : Ok
74: probe libc's inet_pton & backtrace it with ping : Ok
75: Use vfs_getname probe to get syscall args filenames : Ok
76: Add vfs_getname probe to get syscall args filenames : Ok
77: Check open filename arg using perf trace + vfs_getname: Ok
78: Zstd perf.data compression/decompression : Ok
#
$ cd ~acme/git/perf ; git log --oneline -1; time make -C tools/perf build-test
1101c872c8 (HEAD -> perf/core, quaco/perf/core) perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set
make: Entering directory '/home/acme/git/perf/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_no_libcrypto_O: make NO_LIBCRYPTO=1
make_no_sdt_O: make NO_SDT=1
make_no_libnuma_O: make NO_LIBNUMA=1
make_no_libaudit_O: make NO_LIBAUDIT=1
make_no_syscall_tbl_O: make NO_SYSCALL_TABLE=1
make_no_newt_O: make NO_NEWT=1
make_no_auxtrace_O: make NO_AUXTRACE=1
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_no_libbpf_DEBUG_O: make NO_LIBBPF=1 DEBUG=1
make_static_O: make LDFLAGS=-static NO_PERF_READ_VDSO32=1 NO_PERF_READ_VDSOX32=1 NO_JVMTI=1
make_pure_O: make
make_install_bin_O: make install-bin
make_no_libelf_O: make NO_LIBELF=1
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_with_babeltrace_O: make LIBBABELTRACE=1
make_debug_O: make DEBUG=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1 NO_LIBZSTD=1 NO_LIBCAP=1 NO_SYSCALL_TABLE=1
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_no_libbionic_O: make NO_LIBBIONIC=1
make_tags_O: make tags
make_doc_O: make doc
make_no_gtk2_O: make NO_GTK2=1
make_no_libbpf_O: make NO_LIBBPF=1
make_no_backtrace_O: make NO_BACKTRACE=1
make_install_prefix_O: make install prefix=/tmp/krava
make_no_slang_O: make NO_SLANG=1
make_no_demangle_O: make NO_DEMANGLE=1
make_no_libpython_O: make NO_LIBPYTHON=1
make_no_libperl_O: make NO_LIBPERL=1
make_clean_all_O: make clean all
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_with_libpfm4_O: make LIBPFM4=1
make_help_O: make help
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_no_libunwind_O: make NO_LIBUNWIND=1
make_util_map_o_O: make util/map.o
make_install_O: make install
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_perf_o_O: make perf.o
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXzFq5QAKCRCyPKLppCJ+
J46OAP40WV9uE1L+3NznUF5D+zh7++SquzEBoABZiYNAXNhrGQEA2QZqAspkbLoo
hCM/yo7lO1XixiTGlp533b14OvE5oQk=
=n4VQ
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"New features:
- Introduce controlling how 'perf stat' and 'perf record' works via a
control file descriptor, allowing starting with events configured
but disabled until commands are received via the control file
descriptor. This allows, for instance for tools such as Intel VTune
to make further use of perf as its Linux platform driver.
- Improve 'perf record' to to register in a perf.data file header the
clockid used to help later correlate things like syslog files and
perf events recorded.
- Add basic syscall and find_next_bit benchmarks to 'perf bench'.
- Allow using computed metrics in calculating other metrics. For
instance:
{
.metric_expr = "l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit",
.metric_name = "DCache_L2_All_Hits",
},
{
.metric_expr = "max(l2_rqsts.all_demand_data_rd - l2_rqsts.demand_data_rd_hit, 0) + l2_rqsts.pf_miss + l2_rqsts.rfo_miss",
.metric_name = "DCache_L2_All_Miss",
},
{
.metric_expr = "dcache_l2_all_hits + dcache_l2_all_miss",
.metric_name = "DCache_L2_All",
}
- Add suport for 'd_ratio', '>' and '<' operators to the expression
resolver used in calculating metrics in 'perf stat'.
Support for new kernel features:
- Support TEXT_POKE and KSYMBOL_TYPE_OOL perf metadata events to cope
with things like ftrace, trampolines, i.e. changes in the kernel
text that gets in the way of properly decoding Intel PT hardware
traces, for instance.
Intel PT:
- Add various knobs to reduce the volume of Intel PT traces by
reducing the level of details such as decoding just some types of
packets (e.g., FUP/TIP, PSB+), also filtering by time range.
- Add new itrace options (log flags to the 'd' option, error flags to
the 'e' one, etc), controlling how Intel PT is transformed into
perf events, document some missing options (e.g., how to synthesize
callchains).
BPF:
- Properly report BPF errors when parsing events.
- Do not setup side-band events if LIBBPF is not linked, fixing a
segfault.
Libraries:
- Improvements to the libtraceevent plugin mechanism.
- Improve libtracevent support for KVM trace events SVM exit reasons.
- Add a libtracevent plugins for decoding syscalls/sys_enter_futex
and for tlb_flush.
- Ensure sample_period is set libpfm4 events in 'perf test'.
- Fixup libperf namespacing, to make sure what is in libperf has the
perf_ namespace while what is now only in tools/perf/ doesn't use
that prefix.
Arch specific:
- Improve the testing of vendor events and metrics in 'perf test'.
- Allow no ARM CoreSight hardware tracer sink to be specified on
command line.
- Fix arm_spe_x recording when mixed with other perf events.
- Add s390 idle functions 'psw_idle' and 'psw_idle_exit' to list of
idle symbols.
- List kernel supplied event aliases for arm64 in 'perf list'.
- Add support for extended register capability in PowerPC 9 and 10.
- Added nest IMC power9 metric events.
Miscellaneous:
- No need to setup sample_regs_intr/sample_regs_user for dummy
events.
- Update various copies of kernel headers, some causing perf to
handle new syscalls, MSRs, etc.
- Improve usage of flex and yacc, enabling warnings and addressing
the fallout.
- Add missing '--output' option to 'perf kmem' so that it can pass it
along to 'perf record'.
- 'perf probe' fixes related to adding multiple probes on the same
address for the same event.
- Make 'perf probe' warn if the target function is a GNU indirect
function.
- Remove //anon mmap events from 'perf inject jit' to fix supporting
both using ELF files for generated functions and the perf-PID.map
approaches"
* tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (144 commits)
perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set
perf tools powerpc: Add support for extended regs in power10
perf tools powerpc: Add support for extended register capability
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers UAPI: update linux/in.h copy
tools headers API: Update close_range affected files
perf script: Add 'tod' field to display time of day
perf script: Change the 'enum perf_output_field' enumerators to be 64 bits
perf data: Add support to store time of day in CTF data conversion
perf tools: Move clockid_res_ns under clock struct
perf header: Store clock references for -k/--clockid option
perf tools: Add clockid_name function
perf clockid: Move parse_clockid() to new clockid object
tools lib traceevent: Handle possible strdup() error in tep_add_plugin_path() API
libtraceevent: Fixed description of tep_add_plugin_path() API
libtraceevent: Fixed type in PRINT_FMT_STING
libtraceevent: Fixed broken indentation in parse_ip4_print_args()
libtraceevent: Improve error handling of tep_plugin_add_option() API
...
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on Power9
or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on
Power10 and future processors, leaving us with no way to implement the
functionality it requests. This risks breaking userspace, though we believe
it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion checking.
We now allow stack expansion up to the rlimit, like other architectures.
- Remove the remnants of our (previously disabled) topology update code, which
tried to react to NUMA layout changes on virtualised systems, but was prone
to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link stack
(branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as usual.
Thanks to:
Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy,
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill
Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy,
Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A.
Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini,
Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe,
Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li
RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal
Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver
O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe
Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju,
Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung
Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong,
YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8tOxATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDQfEAClXHWf6hnxB84bEu39D51NkVotL1IG
BRWFvyix+xHuUkHIouBPAAMl6ngY5X6wkYd+Z+CY9zHNtdSDoVlJE30YXdMQA/dE
L/rYxR1884yGR/uU/3wusboO68ReXwcKQPmKOymUfh0zH7ujyJsSWLpXFK1YDC5d
2TVVTi0Q+P5ucMHDh0L+AHirIxZvtZSp43+J7xLtywsj+XAxJWCTGo5WCJbdgbCA
Qbv3aOkVyUa3EgsbdM/STPpv82ebqT+PHxeSIO4Jw6ZODtKRH0R5YsWCApuY9eZ+
ebY9RLmgv9ZAhJqB2fv9A5NDcMoGpZNmjM7HrWpXwULKQpkBGHCzJ9FcSdHVMOx8
nbVMFjt4uzLwV1w8lFYslQ2tNH/uH2o9BlryV1RLpiiKokDAJO/NOsWN9y0u/I4J
EmAM5DSX2LgVvvas96IlGK8KX4xkOkf8FLX/H5UDvvAfloH8J4CZXk/CWCab/nqY
KEHPnMmYvQZ1w9SzyZg9sO/1p6Bl1Gmm75Jv2F1lBiRW/42VcGBI/qLsJ4lC59Fc
KbwufYNYYG38wbxDLW1HAPJhRonxIcaZj3EEqk7aTiLZ55nNbu8e2k32CpNXTGqt
npOhzJHimcq7L6+878ZW+xpbZwogIEUdRSsmwb6aT8za3ShnYwSA2Q3LYxh9xyGH
j3GifvPq6Efp3Q==
=QMY1
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on
Power9 or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be
unsupported on Power10 and future processors, leaving us with no way
to implement the functionality it requests. This risks breaking
userspace, though we believe it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion
checking. We now allow stack expansion up to the rlimit, like other
architectures.
- Remove the remnants of our (previously disabled) topology update
code, which tried to react to NUMA layout changes on virtualised
systems, but was prone to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link
stack (branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as
usual.
Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.
* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
selftests/powerpc: Fix pkey syscall redefinitions
powerpc: Fix circular dependency between percpu.h and mmu.h
powerpc/powernv/sriov: Fix use of uninitialised variable
selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
powerpc/40x: Fix assembler warning about r0
powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
cpuidle: pseries: Fixup exit latency for CEDE(0)
cpuidle: pseries: Add function to parse extended CEDE records
cpuidle: pseries: Set the latency-hint before entering CEDE
selftests/powerpc: Fix online CPU selection
powerpc/perf: Consolidate perf_callchain_user_[64|32]()
powerpc/pseries/hotplug-cpu: Remove double free in error path
powerpc/pseries/mobility: Add pr_debug() for device tree changes
powerpc/pseries/mobility: Set pr_fmt()
powerpc/cacheinfo: Warn if cache object chain becomes unordered
powerpc/cacheinfo: Improve diagnostics about malformed cache lists
powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
powerpc/cacheinfo: Set pr_fmt()
powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
...
We received an error report that perf-record caused 'Segmentation fault'
on a newly system (e.g. on the new installed ubuntu).
(gdb) backtrace
#0 __read_once_size (size=4, res=<synthetic pointer>, p=0x14) at /root/0-jinyao/acme/tools/include/linux/compiler.h:139
#1 atomic_read (v=0x14) at /root/0-jinyao/acme/tools/include/asm/../../arch/x86/include/asm/atomic.h:28
#2 refcount_read (r=0x14) at /root/0-jinyao/acme/tools/include/linux/refcount.h:65
#3 perf_mmap__read_init (map=map@entry=0x0) at mmap.c:177
#4 0x0000561ce5c0de39 in perf_evlist__poll_thread (arg=0x561ce68584d0) at util/sideband_evlist.c:62
#5 0x00007fad78491609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fad7823c103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The root cause is, evlist__add_bpf_sb_event() just returns 0 if
HAVE_LIBBPF_SUPPORT is not defined (inline function path). So it will
not create a valid evsel for side-band event.
But perf-record still creates BPF side band thread to process the
side-band event, then the error happpens.
We can reproduce this issue by removing the libelf-dev. e.g.
1. apt-get remove libelf-dev
2. perf record -a -- sleep 1
root@test:~# ./perf record -a -- sleep 1
perf: Segmentation fault
Obtained 6 stack frames.
./perf(+0x28eee8) [0x5562d6ef6ee8]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fbfdc65f210]
./perf(+0x342e74) [0x5562d6faae74]
./perf(+0x257e39) [0x5562d6ebfe39]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fbfdc990609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbfdc73b103]
Segmentation fault (core dumped)
To fix this issue,
1. We either install the missing libraries to let HAVE_LIBBPF_SUPPORT
be defined.
e.g. apt-get install libelf-dev and install other related libraries.
2. Use this patch to skip the side-band event setup if HAVE_LIBBPF_SUPPORT
is not set.
Committer notes:
The side band thread is not used just with BPF, it is also used with
--switch-output-event, so narrow the ifdef to the BPF specific part.
Fixes: 23cbb41c93 ("perf record: Move side band evlist setup to separate routine")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200805022937.29184-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Added support for supported regs which are new in power10 ( MMCR3,
SIER2, SIER3 ) to sample_reg_mask in the tool side to use with `-I?`
option. Also added PVR check to send extended mask for power10 at kernel
while capturing extended regs in each sample.
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add extended regs to sample_reg_mask in the tool side to use with `-I?`
option. Perf tools side uses extended mask to display the platform
supported register names (with -I? option) to the user and also send
this mask to the kernel to capture the extended registers in each
sample. Hence decide the mask value based on the processor version.
Currently definitions for `mfspr`, `SPRN_PVR` are part of
`arch/powerpc/util/header.c`. Move this to a header file so that these
definitions can be re-used in other source files as well.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Reviewed--by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org> <mikey@neuling.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
[Decide extended mask at run time based on platform]
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
55db9c0e85 ("net: remove compat_sys_{get,set}sockopt")
9b4feb630e ("arch: wire-up close_range()")
That automagically add the 'close_range' syscall to tools such as 'perf
trace'.
Before:
# perf trace -e close_range
event syntax error: 'close_range'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
After, system wide strace like tracing for this syscall:
# perf trace -e close_range
^C#
No calls, I need some test proggie :-)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a 'tod' field to display time of day column with time of date
(wallclock) time.
# perf record -k CLOCK_MONOTONIC kill
kill: not enough arguments
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.033 MB perf.data (8 samples) ]
# perf script
perf 261340 152919.481538: 1 cycles: ffffffff8106d104 ...
perf 261340 152919.481543: 1 cycles: ffffffff8106d104 ...
perf 261340 152919.481545: 7 cycles: ffffffff8106d104 ...
...
# perf script --ns
perf 261340 152919.481538922: 1 cycles: ffffffff8106d ...
perf 261340 152919.481543286: 1 cycles: ffffffff8106d ...
perf 261340 152919.481545397: 7 cycles: ffffffff8106d ...
...
# perf script -F+tod
perf 261340 2020-07-13 18:26:55.620971 152919.481538: ...
perf 261340 2020-07-13 18:26:55.620975 152919.481543: ...
perf 261340 2020-07-13 18:26:55.620978 152919.481545: ...
...
# perf script -F+tod --ns
perf 261340 2020-07-13 18:26:55.620971621 152919.481538922: ...
perf 261340 2020-07-13 18:26:55.620975985 152919.481543286: ...
perf 261340 2020-07-13 18:26:55.620978096 152919.481545397: ...
...
It's available only for recording with clockid specified, because it's
the only case where we can get reference time to wallclock time. It's
can't do that with perf clock yet.
Error is display if you want to use --tod on data without clockid
specified:
# perf script -F+tod
Can't provide 'tod' time, missing clock data. Please record with -k/--clockid option.
Original-patch-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So it's possible to add new values. I did not find any place where the
enum values are passed through some number type, so it's safe to make
this change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adad support to convert and store time of day in CTF data conversion for
'perf data convert' subcommand.
The perf.data used for conversion needs to have clock data information -
must be recorded with -k/--clockid option).
New --tod option is added to 'perf data convert' subcommand to convert
data with timestamps converted to wall clock time.
Record data with clockid set:
# perf record -k CLOCK_MONOTONIC kill
kill: not enough arguments
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.033 MB perf.data (8 samples) ]
Convert data with TOD timestamps:
# perf data convert --tod --to-ctf ./ctf
[ perf data convert: Converted 'perf.data' into CTF data './ctf' ]
[ perf data convert: Converted and wrote 0.000 MB (8 samples) ]
Display data in perf script:
# perf script -F+tod --ns
perf 262150 2020-07-13 18:38:50.097678523 153633.958246159: 1 cycles: ...
perf 262150 2020-07-13 18:38:50.097682941 153633.958250577: 1 cycles: ...
perf 262150 2020-07-13 18:38:50.097684997 153633.958252633: 7 cycles: ...
...
Display data in babeltrace:
# babeltrace --clock-date ./ctf
[2020-07-13 18:38:50.097678523] (+?.?????????) cycles: { cpu_id = 0 }, { perf_ip = 0xFFF ...
[2020-07-13 18:38:50.097682941] (+0.000004418) cycles: { cpu_id = 0 }, { perf_ip = 0xFFF ...
[2020-07-13 18:38:50.097684997] (+0.000002056) cycles: { cpu_id = 0 }, { perf_ip = 0xFFF ...
...
It's available only for recording with clockid specified, because it's
the only case where we can get reference time to wallclock time. It's
can't do that with perf clock yet.
Error is display if you want to use --tod on data without clockid
specified:
# perf data convert --tod --to-ctf ./ctf
Can't provide --tod time, missing clock data. Please record with -k/--clockid option.
Failed to setup CTF writer.
Error during conversion setup.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the clockid_res_ns struct member to the clock struct, so we have
the clock related stuff in one place.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new CLOCK_DATA feature that stores reference times when
-k/--clockid option is specified.
It contains the clock id and its reference time together with wall clock
time taken at the 'same time', both values are in nanoseconds.
The format of data is as below:
struct {
u32 version; /* version = 1 */
u32 clockid;
u64 wall_clock_ns;
u64 clockid_time_ns;
};
This clock reference times will be used in following changes to display
wall clock for perf events.
It's available only for recording with clockid specified, because it's
the only case where we can get reference time to wallclock time. It's
can't do that with perf clock yet.
Committer testing:
$ perf record -h -k
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-k, --clockid <clockid>
clockid to use for events, see clock_gettime()
$ perf record -k monotonic sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (8 samples) ]
$ perf report --header-only | grep clockid -A1
# event : name = cycles:u, , id = { 88815, 88816, 88817, 88818, 88819, 88820, 88821, 88822 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, exclude_kernel = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, ksymbol = 1, bpf_event = 1, clockid = 1
# CPU_TOPOLOGY info available, use -I to display
--
# clockid frequency: 1000 MHz
# cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
# clockid: monotonic (1)
# reference time: 2020-08-06 09:40:21.619290 = 1596717621.619290 (TOD) = 21931.077673635 (monotonic)
$
Original-patch-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add the clockid_name() function to get the clock name based on its
clockid. It will be used in the following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move parse_clockid and all needed clcckid related stuff into clockid
object. We are going to add clockid_name function in following change,
so it's better it's placed in separated object and not in
builtin-record.c.
No functional change is intended.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Geneviève Bastien <gbastien@versatic.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200805093444.314999-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adjust limited access message to mention CAP_SYS_PTRACE capability for
processes of unprivileged users. Add link to perf security document in
the end of the section about capabilities.
The change has been inspired by this discussion:
https://lore.kernel.org/lkml/20200722113007.GI77866@kernel.org/
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/6f8a7425-6e7d-19aa-1605-e59836b9e2a6@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A single q option decodes ip from only FUP/TIP packets. Make it so that
repeating the q option (i.e. qq) decodes only PSB+, getting ip if there
is a FUP packet within PSB+ (i.e. between PSB and PSBEND).
Example:
$ perf record -e intel_pt//u grep -rI pudding drivers
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 57.870 MB perf.data ]
$ time perf script --itrace=bi | wc -l
58948289
real 1m23.863s
user 1m23.251s
sys 0m7.452s
$ time perf script --itrace=biq | wc -l
3385694
real 0m4.453s
user 0m4.455s
sys 0m0.328s
$ time perf script --itrace=biqq | wc -l
1883
real 0m0.047s
user 0m0.043s
sys 0m0.009s
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-13-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the new itrace 'q' option to add support for a mode of decoding that
ignores TNT, does not walk object code, but gets the ip from FUP and TIP
packets.
Example:
$ perf record -e intel_pt//u grep -rI pudding drivers
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 57.870 MB perf.data ]
$ time perf script --itrace=bi | wc -l
58948289
real 1m23.863s
user 1m23.251s
sys 0m7.452s
$ time perf script --itrace=biq | wc -l
3385694
real 0m4.453s
user 0m4.455s
sys 0m0.328s
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-12-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'q' option is for modes of decoding that are quicker because they
skip or omit decoding some aspects of trace data.
If supported, the 'q' option may be repeated to increase the effect.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Change the debug logging (when used with the --time option) to time
filter logged perf events, but allow that to be overridden by using
"d+a" instead of plain "d".
That can reduce the size of the log file.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "d" option may be followed by flags which affect what debug messages
will or will not be logged. Each flag must be preceded by either '+' or
'-'. The flags support by Intel PT are:
-a Suppress logging of perf events
Suppressing perf events is useful for decreasing the size of the log.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow the 'd' option to be followed by flags which will affect what debug
messages will or will not be reported. Each flag must be preceded by either
'+' or '-'. The flags are:
a all perf events
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The itrace "e" option may be followed by flags which affect what errors
will or will not be reported. Each flag must be preceded by either '+' or '-'.
The flags supported by Intel PT are:
-o Suppress overflow errors
-l Suppress trace data lost errors
For example, for errors but not overflow or data lost errors:
--itrace=e-o-l
Suppressing those errors can be useful for testing and debugging because
they are not due to decoding.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow the 'e' option to be followed by flags which will affect what errors
will or will not be reported. Each flag must be preceded by either '+' or
'-'. The flags are:
o overflow
l trace data lost
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add missing itrace options o, G and L.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200710151104.15137-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While walking code towards a FUP ip, the packet state is
INTEL_PT_STATE_FUP or INTEL_PT_STATE_FUP_NO_TIP. That was mishandled
resulting in the state becoming INTEL_PT_STATE_IN_SYNC prematurely. The
result was an occasional lost EXSTOP event.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200710151104.15137-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To sync headers, for instance, in this case tools/perf was ahead of
upstream till Linus merged tip/perf/core to get the
PERF_RECORD_TEXT_POKE changes:
Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit 0a892c1c94 ("perf record: Add dummy event during system wide synthesis"),
a dummy event is added to capture mmaps.
But if we run perf-record as,
# perf record -e cycles:p -IXMM0 -a -- sleep 1
Error:
dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
The issue is, if we enable the extended regs (-IXMM0), but the
pmu->capabilities is not set with PERF_PMU_CAP_EXTENDED_REGS, the kernel
will return -EOPNOTSUPP error.
See following code:
/* in kernel/events/core.c */
static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
{
....
if (!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS) &&
has_extended_regs(event))
ret = -EOPNOTSUPP;
....
}
For software dummy event, the PMU should not be set with
PERF_PMU_CAP_EXTENDED_REGS. But unfortunately now, the dummy
event has possibility to be set with PERF_REG_EXTENDED_MASK bit.
In evsel__config, /* tools/perf/util/evsel.c */
if (opts->sample_intr_regs) {
attr->sample_regs_intr = opts->sample_intr_regs;
}
If we use -IXMM0, the attr>sample_regs_intr will be set with
PERF_REG_EXTENDED_MASK bit.
It doesn't make sense to set attr->sample_regs_intr for a
software dummy event.
This patch adds dummy event checking before setting
attr->sample_regs_intr and attr->sample_regs_user.
After:
# ./perf record -e cycles:p -IXMM0 -a -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.413 MB perf.data (45 samples) ]
Committer notes:
Adrian said this when providing his Acked-by:
"
This is fine. It will not break PT.
no_aux_samples is useful for evsels that have been added by the code rather
than requested by the user. For old kernels PT adds sched_switch tracepoint
to track context switches (before the current context switch event was
added) and having auxiliary sample information unnecessarily uses up space
in the perf buffer.
"
Fixes: 0a892c1c94 ("perf record: Add dummy event during system wide synthesis")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200720010013.18238-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce --control fd:ctl-fd[,ack-fd] options to pass open file
descriptors numbers from command line.
Extend perf-record.txt file with --control fd:ctl-fd[,ack-fd] options
description.
Document possible usage model introduced by --control fd:ctl-fd[,ack-fd]
options by providing example bash shell script.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/8dc01e1a-3a80-3f67-5385-4bc7112b0dd3@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement handling of 'enable' and 'disable' control commands coming
from control file descriptor.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/f0fde590-1320-dca1-39ff-da3322704d3b@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extend -D,--delay option with -1 to start collection with events
disabled to be enabled later by 'enable' command provided via control
file descriptor.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/3e7d362c-7973-ee5d-e81e-c60ea22432c3@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce --control fd:ctl-fd[,ack-fd] options to pass open file
descriptors numbers from command line. Extend perf-stat.txt file with
--control fd:ctl-fd[,ack-fd] options description. Document possible
usage model introduced by --control fd:ctl-fd[,ack-fd] options by
providing example bash shell script.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/feabd5cf-0155-fb0a-4587-c71571f2d517@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Minor conflict in tools/perf/arch/arm/util/auxtrace.c as one fix there
was cherry-picked for the last perf/urgent pull req to Linus, so was
already there.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When recording with cache-misses and arm_spe_x event, I found that it
will just fail without showing any error info if i put cache-misses
after 'arm_spe_x' event.
[root@localhost 0620]# perf record -e cache-misses \
-e arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.067 MB perf.data ]
[root@localhost 0620]#
[root@localhost 0620]# perf record -e arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ \
-e cache-misses sleep 1
[root@localhost 0620]#
The current code can only work if the only event to be traced is an
'arm_spe_x', or if it is the last event to be specified. Otherwise the
last event type will be checked against all the arm_spe_pmus[i]->types,
none will match and an out of bound 'i' index will be used in
arm_spe_recording_init().
We don't support concurrent multiple arm_spe_x events currently, that
is checked in arm_spe_recording_options(), and it will show the relevant
info. So add the check and record of the first found 'arm_spe_pmu' to
fix this issue here.
Fixes: ffd3d18c20 ("perf tools: Add ARM Statistical Profiling Extensions (SPE) support")
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200724071111.35593-2-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 5aa98879ef ("s390/cpum_sf: prohibit callchain data collection")
prohibits call graph sampling for hardware events on s390. The
information recorded is out of context and does not match.
On s390 this commit now breaks test case 68 Zstd perf.data
compression/decompression.
Therefore omit call graph sampling on s390 in this test.
Output before:
[root@t35lp46 perf]# ./perf test -Fv 68
68: Zstd perf.data compression/decompression :
--- start ---
Collecting compressed record file:
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
---- end ----
Zstd perf.data compression/decompression: FAILED!
[root@t35lp46 perf]#
Output after:
[root@t35lp46 perf]# ./perf test -Fv 68
68: Zstd perf.data compression/decompression :
--- start ---
Collecting compressed record file:
500+0 records in
500+0 records out
256000 bytes (256 kB, 250 KiB) copied, 0.00615638 s, 41.6 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB /tmp/perf.data.X3M,
compressed (original 0.002 MB, ratio is 3.609) ]
Checking compressed events stats:
# compressed : Zstd, level = 1, ratio = 4
COMPRESSED events: 1
2ELIFREPh---- end ----
Zstd perf.data compression/decompression: Ok
[root@t35lp46 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200729135314.91281-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Following the previous change that rename egroup to metric, there's no
reason to call the list 'group_list' anymore, renaming it to
metric_list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Renaming struct egroup to metric, because it seems to make more sense.
Plus renaming all the variables that hold egroup to appropriate names.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-19-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding test for metric group plus compute_metric_group function to get
metrics values within the group.
Committer notes:
Fixed this;
tests/parse-metric.c:327:7: error: missing field 'val' initializer [-Werror,-Wmissing-field-initializers]
{ 0 },
^
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-18-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So far compute_single function relies on the fact, that there's only
single metric defined within evlist in all tests. In following patch we
will add test for metric group, so we need to be able to compute metric
by given name.
Adding the name argument to compute_single and iterating evlist and
evsel's expression to find the given metric.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-17-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Keeping the stack of nested metrics via 'struct expr_id' objects
and checking if we are in recursion via already processed metric.
The stack is implemented as static array within the struct egroup
with 100 entries, which should be enough nesting depth for any
metric we have or plan to have at the moment.
Adding test that simulates the recursion and checks we can
detect it.
Committer notes:
Bumped RECURSION_ID_MAX to 1000 as per Jiri's reply to Paul Clark on the
patch series e-mail discussion.
Fixed these:
tests/parse-metric.c:308:7: error: missing field 'val' initializer [-Werror,-Wmissing-field-initializers]
{ 0 },
^
util/metricgroup.c:924:28: error: missing field 'parent' initializer [-Werror,-Wmissing-field-initializers]
struct expr_ids ids = { 0 };
^
util/metricgroup.c:924:26: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct expr_ids ids = { 0 };
^
{}
util/metricgroup.c:924:26: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct expr_ids ids = { 0 };
^
{}
util/metricgroup.c:924:28: error: missing field 'cnt' initializer [-Werror,-Wmissing-field-initializers]
struct expr_ids ids = { 0 };
^
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding test that compute DCache_L2 metrics with other related metrics in it.
Committer notes:
Fixed up this:
tests/parse-metric.c:285:7: error: missing field 'val' initializer [-Werror,-Wmissing-field-initializers]
{ 0 },
^
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-15-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding test that compute metric with other metrics in it.
cache_miss_cycles = metric:dcache_miss_cpi + metric:icache_miss_cycles
Committer notes:
Fixed up initializer to cope with:
tests/parse-metric.c:242:7: error: missing field 'val' initializer [-Werror,-Wmissing-field-initializers]
{ 0 },
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-14-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no need to iterate the whole list of groups, when adding new
events. The currently created groups are the ones we want to add.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-13-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding computation (expr__parse call) of referenced metric at
the point when it needs to be resolved during the parent metric
computation.
Once the inner metric is computed, the result is stored and
used if there's another usage of that metric.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding referenced metrics to the parsing context so they can be resolved
during the metric processing.
Adding expr__add_ref function to store referenced metrics into parse
context.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add referenced metrics into struct metric_expr object, so they are
accessible when computing the metric.
Storing just name and expression itself, so the metric can be resolved
and computed.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Collecting referenced metrics in struct metric_ref_node object,
so we can process them later on.
The change will parse nested metric names out of expression and
'resolve' them.
All referenced metrics are dissolved into one context, meaning all
nested metrics events and added to the parent context.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Renaming __metricgroup__add_metric to __add_metric to fit in the current
function names.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Decouple metric adding logging into add_metric function,
so it can be used from other places in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding following macros to iterate events and metric:
map_for_each_event(__pe, __idx, __map)
- iterates over all pmu_events_map events
map_for_each_metric(__pe, __idx, __map, __metric)
- iterates over all metrics that match __metric argument
and use it in metricgroup__add_metric function. Macros will be be used
from other places in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding expr__del_id function to remove ID from hashmap. It will save us
few lines in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Changing expr__get_id to use and return struct expr_id_data
pointer as value for the ID. This way we can access data other
than value for given ID in following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add the expr__add_id() function to data for ID with zero value, which is
used when scanning the expression for IDs.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo found that we don't release value data in case the hashmap__set
fails. Releasing it in case of an error.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200719181320.785305-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Test that a command line option doesn't override the period set on a
libpfm4 event.
Without libpfm4 test passes as unsupported.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200728085734.609930-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao reported issue with possible conflict between raw events and
term values in pmu event syntax.
Currently following syntax is resolved as raw event with 0xead value:
uncore_imc_free_running/read/
instead of using 'read' term from uncore_imc_free_running pmu, because
'read' is correct raw event syntax with 0xead value.
To solve this issue we do following:
- check existing terms during rXXXX syntax processing
and make them priority in case of conflict
- allow pmu/r0x1234/ syntax to be able to specify conflicting
raw event (implemented in previous patch)
Also add automated tests for this and perf_pmu__parse_cleanup call to
parse_events_terms, so the test gets properly cleaned up.
Fixes: 3a6c51e4d6 ("perf parser: Add support to specify rXXX event with pmu")
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200726075244.1191481-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support to specify raw event with 'r0<HEX>' syntax within pmu term
syntax like:
-e cpu/r0xdead/
It will be used to specify raw events in cases where they conflict with
real pmu terms, like 'read', which is valid raw event syntax, but also a
possible pmu term name as reported by Jin Yao.
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200725121959.1181869-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- auxtrace_record__init() is called only once, so there is no point in
using a static variable to cache the results of
find_all_arm_spe_pmus(), make it local and free the results after use.
- Another reason is, even though SPE is micro-architecture dependent,
but so far it only supports "statistical-profiling-extension-v1" and
we have no chance to use multiple SPE's PMU events in Perf command.
So remove the useless check code to make it clear.
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200724071111.35593-3-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When recording with cache-misses and arm_spe_x event, I found that it
will just fail without showing any error info if i put cache-misses
after 'arm_spe_x' event.
[root@localhost 0620]# perf record -e cache-misses \
-e arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.067 MB perf.data ]
[root@localhost 0620]#
[root@localhost 0620]# perf record -e arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ \
-e cache-misses sleep 1
[root@localhost 0620]#
The current code can only work if the only event to be traced is an
'arm_spe_x', or if it is the last event to be specified. Otherwise the
last event type will be checked against all the arm_spe_pmus[i]->types,
none will match and an out of bound 'i' index will be used in
arm_spe_recording_init().
We don't support concurrent multiple arm_spe_x events currently, that
is checked in arm_spe_recording_options(), and it will show the relevant
info. So add the check and record of the first found 'arm_spe_pmu' to
fix this issue here.
Fixes: ffd3d18c20 ("perf tools: Add ARM Statistical Profiling Extensions (SPE) support")
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200724071111.35593-2-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The usefulness of having a standard way of testing syscall performance
has come up from time to time[0]. Furthermore, some of our testing
machinery (such as 'mmtests') already makes use of a simplified version
of the microbenchmark. This patch mainly takes the same idea to measure
syscall throughput compatible with 'perf-bench' via getppid(2), yet
without any of the additional template stuff from Ingo's version (based
on numa.c). The code is identical to what mmtests uses.
[0] https://lore.kernel.org/lkml/20160201074156.GA27156@gmail.com/
Committer notes:
Add mising stdlib.h and unistd.h to get the prototypes for exit() and
getppid().
Committer testing:
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
$
$ perf bench syscall
# List of available benchmarks for collection 'syscall':
basic: Benchmark for basic getppid(2) calls
all: Run all syscall benchmarks
$ perf bench syscall basic
# Running 'syscall/basic' benchmark:
# Executed 10000000 getppid() calls
Total time: 3.679 [sec]
0.367957 usecs/op
2717708 ops/sec
$ perf bench syscall all
# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 3.644 [sec]
0.364456 usecs/op
2743815 ops/sec
$
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20190308181747.l36zqz2avtivrr3c@linux-r8p5
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got
defined as H_DISABLE_AND_GETC instead.
This define was introduced with a typo in commit <b13a96cfb055>
("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was
later used without having the typo noticed.
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200707004812.190765-1-leobras.c@gmail.com
Implement handling of 'enable' and 'disable' control commands coming
from control file descriptor. If poll event splits initiated timeout
interval then the reminder is calculated and still waited in the
following evlist__poll() call.
Committer testing:
The testing instructions came in the cover letter, here I'll extract the
parts that are needed to test this specific patch, so that we don't
introduce bisection regressions by testing only the patch series as a
whole:
<FILL IN THE TEST INSTRUCTIONS>
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/3cb8a826-145f-81f4-fcb2-fa20045c6957@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extend -D,--delay option with -1 value to start monitoring with
events disabled to be enabled later by enable command provided
via control file descriptor.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/81ac633c-a844-5cfb-931c-820f6e6cbd12@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Consolidate event dispatching loops for fork, attach and system wide
monitoring use cases into common dispatch_events() function.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/8a900bd5-200a-9b0f-7154-80a2343bfd1a@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Factor out body of event handling loop for fork case reusing
handle_interval() function.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/a8ae3f8d-a30e-fd40-998a-f5ca3e98cd45@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Check for target existence in loop control statement jointly external
asynchronous 'done' signal.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/79037528-578c-af64-f06c-a644b7f5ba6a@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce handle_interval() function that factors out body of event
handling loop for attach and system wide monitoring use cases.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/73130f9e-0d0f-7391-da50-41b4bf4bf54d@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement functions of initialization, finalization and processing of
control command messages coming from control file descriptors.
Allocate control file descriptor as descriptor at struct pollfd object
of evsel_list for atomic poll() operation.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/62518ceb-1cc9-2aba-593b-55408d07c1bf@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Store flags per struct pollfd *entries object in a bitmap of int size.
Implement fdarray_flag__nonfilterable flag to skip object from counting
by fdarray__filter().
Fixed fdarray test issue reported by kernel test robot.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/6b7d43ff-0801-d5dd-4e90-fcd86b17c1c8@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Change the counter name DLFT_CCERROR to DLFT_CCFINISH on IBM z15.
This counter counts completed DEFLATE instructions with exit code
0, 1 or 2. Since exit code 0 means success and exit code 1 or 2
indicate errors, change the counter name to avoid confusion.
This counter is incremented each time the DEFLATE instruction
completed regardless if an error was detected or not.
Fixes: d68d5d51dc ("s390/cpum_cf: Add new extended counters for IBM z15")
Fixes: e7950166e4 ("perf vendor events s390: Add new deflate counters for IBM z15")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Avoid moving of fds by fdarray__filter() so fds indices returned by
fdarray__add() can be used for access and processing of objects at
struct pollfd *entries.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/676844f8-55d3-c628-23db-aa163a81519e@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that the ->compat_{get,set}sockopt proto_ops methods are gone
there is no good reason left to keep the compat syscalls separate.
This fixes the odd use of unsigned int for the compat_setsockopt
optlen and the missing sock_use_custom_sol_socket.
It would also easily allow running the eBPF hooks for the compat
syscalls, but such a large change in behavior does not belong into
a consolidation patch like this one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
To pick up the changes in:
b2f9f1535b ("libbpf: Fix libbpf hashmap on (I)LP32 architectures")
Silencing this warning:
Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
I'll eventually update the warning to remove the "Kernel ABI" part
and instead state libbpf when noticing that the original is at
"tools/lib/something".
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Jakub Bogusz <qboosh@pld-linux.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'struct expr_id_data' to keep an expr value instead of just a simple
double pointer, so we can store more data for ID in the following
changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200712132634.138901-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename expr__add_id() to expr__add_val() so we can use expr__add_id() to
actually add just the id without any value in following changes.
There's no functional change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200712132634.138901-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Warn if the probe target function is a GNU indirect function (GNU_IFUNC)
because it may not be what the user wants to probe.
The GNU indirect function ( https://sourceware.org/glibc/wiki/GNU_IFUNC )
is the dynamic symbol solved at runtime. An IFUNC function is a selector
which is invoked from the ELF loader, but the symbol address of the
function which will be modified by the IFUNC is the same as the IFUNC in
the symbol table. This can confuse users trying to probe such functions.
For example, memcpy is an IFUNC.
probe_libc:memcpy (on __new_memcpy_ifunc@x86_64/multiarch/memcpy.c in /usr/lib64/libc-2.30.so)
the probe is put on an IFUNC.
perf 1742 [000] 26201.715632: probe_libc:memcpy: (7fdaa53824c0)
7fdaa53824c0 __new_memcpy_ifunc+0x0 (inlined)
7fdaa5d4a980 elf_machine_rela+0x6c0 (inlined)
7fdaa5d4a980 elf_dynamic_do_Rela+0x6c0 (inlined)
7fdaa5d4a980 _dl_relocate_object+0x6c0 (/usr/lib64/ld-2.30.so)
7fdaa5d42155 dl_main+0x1cc5 (/usr/lib64/ld-2.30.so)
7fdaa5d5831a _dl_sysdep_start+0x54a (/usr/lib64/ld-2.30.so)
7fdaa5d3ffeb _dl_start_final+0x25b (inlined)
7fdaa5d3ffeb _dl_start+0x25b (/usr/lib64/ld-2.30.so)
7fdaa5d3f117 .annobin_rtld.c+0x7 (inlined)
And the event is invoked from the ELF loader instead of the target
program's main code.
Moreover, at this moment, we can not probe on the function which will
be selected by the IFUNC, because it is determined at runtime. But
uprobe will be prepared before running the target binary.
Thus, I decided to warn user when 'perf probe' detects that the probe
point is on an GNU IFUNC symbol. Someone who wants to probe an IFUNC
symbol to debug the IFUNC function can ignore this warning.
Committer notes:
I.e., this warning will be emitted if the probe point is an IFUNC:
"Warning: The probe function (%s) is a GNU indirect function.\n"
"Consider identifying the final function used at run time and set the probe directly on that.\n"
Complete set of steps:
# readelf -sW /lib64/libc-2.29.so | grep IFUNC | tail
22196: 0000000000109a80 183 IFUNC GLOBAL DEFAULT 14 __memcpy_chk
22214: 00000000000b7d90 191 IFUNC GLOBAL DEFAULT 14 __gettimeofday
22336: 000000000008b690 60 IFUNC GLOBAL DEFAULT 14 memchr
22350: 000000000008b9b0 89 IFUNC GLOBAL DEFAULT 14 __stpcpy
22420: 000000000008bb10 76 IFUNC GLOBAL DEFAULT 14 __strcasecmp_l
22582: 000000000008a970 60 IFUNC GLOBAL DEFAULT 14 strlen
22585: 00000000000a54d0 92 IFUNC WEAK DEFAULT 14 wmemset
22600: 000000000010b030 92 IFUNC GLOBAL DEFAULT 14 __wmemset_chk
22618: 000000000008b8a0 183 IFUNC GLOBAL DEFAULT 14 __mempcpy
22675: 000000000008ba70 76 IFUNC WEAK DEFAULT 14 strcasecmp
#
# perf probe -x /lib64/libc-2.29.so strlen
Warning: The probe function (strlen) is a GNU indirect function.
Consider identifying the final function used at run time and set the probe directly on that.
Added new event:
probe_libc:strlen (on strlen in /usr/lib64/libc-2.29.so)
You can now use it in all perf tools, such as:
perf record -e probe_libc:strlen -aR sleep 1
#
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/159438669349.62703.5978345670436126948.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix the memory leakage in debuginfo__find_trace_events() when the probe
point is not found in the debuginfo. If there is no probe point found in
the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0.
Thus the caller of debuginfo__find_probes() must check the tf.ntevs and
release the allocated memory for the array of struct probe_trace_event.
The current code releases the memory only if the debuginfo__find_probes()
hits an error but not checks tf.ntevs. In the result, the memory allocated
on *tevs are not released if tf.ntevs == 0.
This fixes the memory leakage by checking tf.ntevs == 0 in addition to
ret < 0.
Fixes: ff74178350 ("perf probe: Introduce debuginfo to encapsulate dwarf information")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/159438668346.62703.10887420400718492503.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix a wrong "variable not found" warning when the probe point is not
found in the debuginfo.
Since the debuginfo__find_probes() can return 0 even if it does not find
given probe point in the debuginfo, fill_empty_trace_arg() can be called
with tf.ntevs == 0 and it can emit a wrong warning. To fix this, reject
ntevs == 0 in fill_empty_trace_arg().
E.g. without this patch;
# perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
Failed to find the location of the '%di' variable at this address.
Perhaps it has been optimized out.
Use -V with the --range option to show '%di' location range.
Added new events:
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
You can now use it in all perf tools, such as:
perf record -e probe_libc:memcpy -aR sleep 1
With this;
# perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
Added new events:
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
You can now use it in all perf tools, such as:
perf record -e probe_libc:memcpy -aR sleep 1
Fixes: cb40273085 ("perf probe: Trace a magic number if variable is not found")
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/159438667364.62703.2200642186798763202.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is a case that several same-name symbols points to the same
address. In that case, 'perf probe' returns an error.
E.g.
# perf probe -x /lib64/libc-2.30.so -v -a "memcpy arg1=%di"
probe-definition(0): memcpy arg1=%di
symbol:memcpy file:(null) line:0 offset:0 return:0 lazy:(null)
parsing arg: arg1=%di into name:arg1 %di
1 arguments
symbol:setjmp file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:longjmp file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:longjmp_target file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:lll_lock_wait_private file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_arena_max file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_arena_test file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_tunable_tcache_max_bytes file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_tunable_tcache_count file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_tunable_tcache_unsorted_limit file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_trim_threshold file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_top_pad file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_mmap_threshold file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_mmap_max file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_perturb file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_mxfast file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_heap_new file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_arena_reuse_free_list file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_arena_reuse file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_arena_reuse_wait file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_arena_new file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_arena_retry file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_sbrk_less file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_heap_free file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_heap_less file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_tcache_double_free file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_heap_more file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_sbrk_more file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_malloc_retry file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_memalign_retry file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt_free_dyn_thresholds file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_realloc_retry file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_calloc_retry file:(null) line:0 offset:0 return:0 lazy:(null)
symbol:memory_mallopt file:(null) line:0 offset:0 return:0 lazy:(null)
Open Debuginfo file: /usr/lib/debug/usr/lib64/libc-2.30.so.debug
Try to find probe point from debuginfo.
Opening /sys/kernel/debug/tracing//README write=0
Failed to find the location of the '%di' variable at this address.
Perhaps it has been optimized out.
Use -V with the --range option to show '%di' location range.
An error occurred in debuginfo analysis (-2).
Trying to use symbols.
Opening /sys/kernel/debug/tracing//uprobe_events write=1
Writing event: p:probe_libc/memcpy /usr/lib64/libc-2.30.so:0x914c0 arg1=%di
Writing event: p:probe_libc/memcpy /usr/lib64/libc-2.30.so:0x914c0 arg1=%di
Failed to write event: File exists
Error: Failed to add events. Reason: File exists (Code: -17)
You can see that perf tried to write completely the same probe
definition twice, which caused an error.
To fix this issue, check the symbol list and drop duplicated symbols
(which has the same symbol name and address) from it.
With this patch:
# perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
Failed to find the location of the '%di' variable at this address.
Perhaps it has been optimized out.
Use -V with the --range option to show '%di' location range.
Added new events:
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
You can now use it in all perf tools, such as:
perf record -e probe_libc:memcpy -aR sleep 1
Committer notes:
Fix this build error on 32-bit arches by using PRIx64 for symbol->start,
that is an u64:
In file included from util/probe-event.c:27:
util/probe-event.c: In function 'find_probe_trace_events_from_map':
util/probe-event.c:2978:14: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
pr_debug("Found duplicated symbol %s @ %lx\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
util/debug.h:17:21: note: in definition of macro 'pr_fmt'
#define pr_fmt(fmt) fmt
^~~
util/probe-event.c:2978:5: note: in expansion of macro 'pr_debug'
pr_debug("Found duplicated symbol %s @ %lx\n",
^~~~~~~~
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lore.kernel.org/lkml/159438666401.62703.15196394835032087840.stgit@devnote2
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf kmem' has an input file option but current an output file option
fails:
$ sudo perf kmem record -o /tmp/p.data sleep 1
Error: unknown switch `o'
Usage: perf kmem [<options>] {record|stat}
-f, --force don't complain, do it
-i, --input <file> input file name
-l, --line <num> show n lines
-s, --sort <key[,key2...]>
sort by keys: ptr, callsite, bytes, hit, pingpong, frag, page, order, mig>
-v, --verbose be more verbose (show symbol address, etc)
--alloc show per-allocation statistics
--caller show per-callsite statistics
--live Show live page stat
--page Analyze page allocator
--raw-ip show raw ip instead of symbol
--slab Analyze slab allocator
--time <str> Time span of interest (start,stop)
'perf sched' is similar in implementation and avoids the problem by
passing additional arguments to 'perf record'.
This change makes 'perf kmem' parse command line options consistently
with 'perf sched', although neither actually list that -o is a supported
option.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200708183919.4141023-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Setting the parse_events_error directly doesn't increment num_errors
causing the error message not to be displayed. Use the
parse_events__handle_error function that sets num_errors and handle
multiple errors.
Committer notes:
Ian provided a before/after upon request:
Before:
$ /tmp/perf/perf record -e /tmp/perf/util/parse-events.o
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available event
After:
$ /tmp/perf/perf record -e /tmp/perf/util/parse-events.o
event syntax error: '/tmp/perf/util/parse-events.o'
\___ Failed to load /tmp/perf/util/parse-events.o: BPF object format invalid
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: bpf@vger.kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20200707211449.3868944-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is generally more useful to show the symbol with an address. In this
case, the print function requires the 'machine' which means changing
callers to provide it as a parameter. It is optional because most events
do not need it and the callers that matter can provide it.
Committer notes:
Made 'union perf_event' continue to be the first parameter to the
perf_event__fprintf() and perf_event__fprintf_text_poke() events.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20200512121922.8997-16-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Consistent with other new events, add an option to perf script to
display text poke events and ksymbol events. Both text poke events and
ksymbol events are displayed because some text pokes (e.g. ftrace
trampolines) have corresponding ksymbol events.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20200512121922.8997-15-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Select text poke events when available and the kernel is being traced.
Process text poke events to invalidate entries in Intel PT's instruction
cache.
Example:
The example requires kernel config:
CONFIG_PROC_SYSCTL=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
Before:
# perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
# cat /proc/sys/kernel/sched_schedstats
0
# echo 1 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
1
# echo 0 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
0
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.341 MB perf.data.before ]
[1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
# perf script -i perf.data.before --itrace=e >/dev/null
Warning:
474 instruction trace errors
After:
# perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
# cat /proc/sys/kernel/sched_schedstats
0
# echo 1 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
1
# echo 0 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
0
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.646 MB perf.data.after ]
[1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
# perf script -i perf.data.after --itrace=e >/dev/null
Example:
The example requires kernel config:
# CONFIG_FUNCTION_TRACER is not set
Before:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __schedule
Added new event:
probe:__schedule (on __schedule)
You can now use it in all perf tools, such as:
perf record -e probe:__schedule -aR sleep 1
# perf record -e probe:__schedule -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ]
# perf probe -d probe:__schedule
Removed event: probe:__schedule
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 41.268 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
Warning:
207 instruction trace errors
After:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __schedule
Added new event:
probe:__schedule (on __schedule)
You can now use it in all perf tools, such as:
perf record -e probe:__schedule -aR sleep 1
# perf record -e probe:__schedule -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ]
# perf probe -d probe:__schedule
Removed event: probe:__schedule
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 39.978 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
# perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page
6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6
6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page
6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106
6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0
6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0
Example:
The example requires kernel config:
CONFIG_FUNCTION_TRACER=y
Before:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __kmalloc
Added new event:
probe:__kmalloc (on __kmalloc)
You can now use it in all perf tools, such as:
perf record -e probe:__kmalloc -aR sleep 1
# perf record -e probe:__kmalloc -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ]
# perf probe -d probe:__kmalloc
Removed event: probe:__kmalloc
# kill %1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 43.850 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
Warning:
8 instruction trace errors
After:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __kmalloc
Added new event:
probe:__kmalloc (on __kmalloc)
You can now use it in all perf tools, such as:
perf record -e probe:__kmalloc -aR sleep 1
# perf record -e probe:__kmalloc -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ]
# perf probe -d probe:__kmalloc
Removed event: probe:__kmalloc
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 41.442 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
# perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline
5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415
5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20200512121922.8997-14-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map
backed only by memory, which will be populated as necessary by text poke
events.
Committer notes:
From the patch:
OOL stands for "Out of line" code such as kprobe-replaced instructions
or optimized kprobes or ftrace trampolines.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20200512121922.8997-13-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event
is processed, then the kernel dso data cache is updated with the poked
bytes.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/20200512121922.8997-12-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Our local MSAN (Memory Sanitizer) build of perf throws a warning that
comes from the "dso__disassemble_filename" function in
"tools/perf/util/annotate.c" when running perf record.
The warning stems from the call to readlink, in which "build_id_path"
was being read into "linkname". Since readlink does not null terminate,
an uninitialized memory access would later occur when "linkname" is
passed into the strstr function. This is simply fixed by
null-terminating "linkname" after the call to readlink.
To reproduce this warning, build perf by running:
$ make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins"
(Additionally, llvm might have to be installed and clang might have to
be specified as the compiler - export CC=/usr/bin/clang)
Then running:
tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate -i - --stdio
Please see the cover letter for why false positive warnings may be
generated.
Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20190729205750.193289-1-nums@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
**perf-<pid>.map and jit-<pid>.dump designs:
When a JIT generates code to be executed, it must allocate memory and
mark it executable using an mmap call.
*** perf-<pid>.map design
The perf-<pid>.map assumes that any sample recorded in an anonymous
memory page is JIT code. It then tries to resolve the symbol name by
looking at the process' perf-<pid>.map.
*** jit-<pid>.dump design
The jit-<pid>.dump mechanism takes a different approach. It requires a
JIT to write a `<path>/jit-<pid>.dump` file. This file must also be
mmapped so that perf inject -jit can find the file. The JIT must also
add JIT_CODE_LOAD records for any functions it generates. The records
are timestamped using a clock which can be correlated to the perf record
clock.
After perf record, the `perf inject -jit` pass parses the recording
looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it
parses it and for each JIT_CODE_LOAD record:
* creates an elf file `<path>/jitted-<pid>-<code_index>.so
* injects a new mmap record mapping the new elf file into the process.
*** Coexistence design
The kernel and perf support both of these mechanisms. We need to make
sure perf works on an app supporting either or both of these mechanisms.
Both designs rely on mmap records to determine how to resolve an ip
address.
The mmap records of both techniques by definition overlap. When the JIT
compiles a method, it must:
* allocate memory (mmap)
* add execution privilege (mprotect or mmap. either will
generate an mmap event form the kernel to perf)
* compile code into memory
* add a function record to perf-<pid>.map and/or jit-<pid>.dump
Because the jit-<pid>.dump mechanism supports greater capabilities, perf
prefers the symbols from jit-<pid>.dump. It implements this based on
timestamp ordering of events. There is an implicit ASSUMPTION that the
JIT_CODE_LOAD record timestamp will be after the // anon mmap event that
was generated during memory allocation or adding the execution privilege setting.
*** Problems with the ASSUMPTION
The ASSUMPTION made in the Coexistence design section above is violated
in the following scenario.
*** Scenario
While a JIT is jitting code it will eventually need to commit more
pages and change these pages to executable permissions. Typically the
JIT will want these collocated to minimize branch displacements.
The kernel will coalesce these anonymous mapping with identical
permissions before sending an MMAP event for the new pages. The address
range of the new mmap will not be just the most recently mmap pages.
It will include the entire coalesced mmap region.
See mm/mmap.c
unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
{
...
/*
* Can we just expand an old mapping?
*/
...
perf_event_mmap(vma);
...
}
*** Symptoms
The coalesced // anon mmap event will be timestamped after the
JIT_CODE_LOAD records. This means it will be used as the most recent
mapping for that entire address range. For remaining events it will look
at the inferior perf-<pid>.map for symbols.
If both mechanisms are supported, the symbol will appear twice with
different module names. This causes weird behavior in reporting.
If only jit-<pid>.dump is supported, the symbol will no longer be resolved.
** Implemented solution
This patch solves the issue by removing // anon mmap events for any
process which has a valid jit-<pid>.dump file.
It tracks on a per process basis to handle the case where some running
apps support jit-<pid>.dump, but some only support perf-<pid>.map.
It adds new assumptions:
* // anon mmap events are only required for perf-<pid>.map support.
* An app that uses jit-<pid>.dump, no longer needs
perf-<pid>.map support. It assumes that any perf-<pid>.map info is
inferior.
*** Details
Use thread->priv to store whether a jitdump file has been processed
During "perf inject --jit", discard "//anon*" mmap events for any pid which
has sucessfully processed a jitdump file.
** Testing:
// jitdump case
perf record <app with jitdump>
perf inject --jit --input perf.data --output perfjit.data
// verify mmap "//anon" events present initially
perf script --input perf.data --show-mmap-events | grep '//anon'
// verify mmap "//anon" events removed
perf script --input perfjit.data --show-mmap-events | grep '//anon'
// no jitdump case
perf record <app without jitdump>
perf inject --jit --input perf.data --output perfjit.data
// verify mmap "//anon" events present initially
perf script --input perf.data --show-mmap-events | grep '//anon'
// verify mmap "//anon" events not removed
perf script --input perfjit.data --show-mmap-events | grep '//anon'
** Repro:
This issue was discovered while testing the initial CoreCLR jitdump
implementation. https://github.com/dotnet/coreclr/pull/26897.
** Alternate solutions considered
These were also briefly considered:
* Change kernel to not coalesce mmap regions.
* Change kernel reporting of coalesced mmap regions to perf. Only
include newly mapped memory.
* Only strip parts of // anon mmap events overlapping existing
jitted-<pid>-<code_index>.so mmap events.
Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1590544271-125795-1-git-send-email-steve.maclean@linux.microsoft.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up fixes and move perf/core forward, minor conflict as
perf_evlist__add_dummy() lost its 'perf_' prefix as it operates on a
'struct evlist', not on a 'struct perf_evlist', i.e. its tools/perf/
specific, it is not in libperf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>