We have to check if the values are >= *_MAX, not just >, fix it.
From the bugzilla report:
''In file /tools/perf/util/evsel.c function __perf_evsel__hw_cache_name
it appears that there is a bug that reads beyond the end of the buffer.
The statement "if (type > PERF_COUNT_HW_CACHE_MAX)" allows type to be
equal to the maximum value. Later, when statement "if
(!perf_evsel__is_cache_op_valid(type, op))" is executed, the function
can access array perf_evsel__hw_cache_stat[type] beyond the end of the
buffer.
It appears to me that the statement "if (type > PERF_COUNT_HW_CACHE_MAX)"
should be "if (type >= PERF_COUNT_HW_CACHE_MAX)"
Bug found with Coverity and manual code review. No attempts were made to
execute the code with a maximum type value.''
Committer note:
Testing it:
$ perf record -e $(echo $(perf list cache | cut -d \[ -f1) | sed 's/ /,/g') usleep 1
[ perf record: Woken up 16 times to write data ]
[ perf record: Captured and wrote 0.023 MB perf.data (34 samples) ]
$ perf evlist
L1-dcache-load-misses
L1-dcache-loads
L1-dcache-stores
L1-icache-load-misses
LLC-load-misses
LLC-loads
LLC-store-misses
LLC-stores
branch-load-misses
branch-loads
dTLB-load-misses
dTLB-loads
dTLB-store-misses
dTLB-stores
iTLB-load-misses
iTLB-loads
node-load-misses
node-loads
node-store-misses
node-stores
$ perf list cache
List of pre-defined events (to be used in -e):
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
$
Reported-by: Brian Sweeney <bsweeney@lgsinnovations.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=153351
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Colin King <colin.king@canonical.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-zh2j4iqimralugke5qq7dn6d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
dup and fdopen can potentially fail, so add some extra
error handling checks rather than assuming they always work.
Signed-off-by: Colin King <colin.king@canonical.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1471038296-12956-1-git-send-email-colin.king@canonical.com
[ Free resources when those functions (now being verified) fail ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 73cdf0c6ea ("perf symbols: Record text offset in dso
to calculate objdump address") started storing the offset of
the text section for all DSOs:
if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;
Unfortunately this breaks debuginfo files, because we need to calculate
the offset of the text section in the associated executable file. As a
result perf annotate returns junk for all debuginfo files.
Fix this by using runtime_ss->elf which should point at the executable
when parsing a debuginfo file.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.6+
Fixes: 73cdf0c6ea ("perf symbols: Record text offset in dso to calculate objdump address")
Link: http://lkml.kernel.org/r/20160813115533.6de17912@kryten
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Noticed on Fedora Rawhide:
$ gcc --version
gcc (GCC) 6.1.1 20160721 (Red Hat 6.1.1-4)
$ rpm -q glibc
glibc-2.24.90-1.fc26.x86_64
$
CC /tmp/build/perf/util/jitdump.o
util/jitdump.c: In function 'jit_repipe_code_load':
util/jitdump.c:428:2: error: '__major_from_sys_types' is deprecated:
In the GNU C Library, `major' is defined by <sys/sysmacros.h>.
For historical compatibility, it is currently defined by
<sys/types.h> as well, but we plan to remove this soon.
To use `major', include <sys/sysmacros.h> directly.
If you did not intend to use a system-defined macro `major',
you should #undef it after including <sys/types.h>.
[-Werror=deprecated-declarations]
event->mmap2.maj = major(st.st_dev);
^~~~~
In file included from /usr/include/features.h:397:0,
from /usr/include/sys/types.h:25,
from util/jitdump.c:1:
/usr/include/sys/sysmacros.h:87:1: note: declared here
__SYSMACROS_DEFINE_MAJOR (__SYSMACROS_FST_IMPL_TEMPL)
Fix it following that recomendation.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3majvd0adhfr25rvx4v5e9te@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The June 2015 Intel SDM introduced IP Compression types 4 and 6. Refer
to section 36.4.2.2 Target IP (TIP) Packet - IP Compression.
Existing Intel PT packet decoder did not support type 4, and got type 6
wrong. Because type 3 and type 4 have the same number of bytes, the
packet 'count' has been changed from being the number of ip bytes to
being the type code. That allows the Intel PT decoder to correctly
decide whether to sign-extend or use the last ip. However that also
meant the code had to be adjusted in a number of places.
Currently hardware is not using the new compression types, so this fix
has no effect on existing hardware.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1469005206-3049-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Powerpc has Global Entry Point and Local Entry Point for functions. LEP
catches call from both the GEP and the LEP. Symbol table of ELF contains
GEP and Offset from which we can calculate LEP, but debuginfo does not
have LEP info.
Currently, perf prioritize symbol table over dwarf to probe on LEP for
ppc64le. But when user tries to probe with function parameter, we fall
back to using dwarf(i.e. GEP) and when function called via LEP, probe
will never hit.
For example:
$ objdump -d vmlinux
...
do_sys_open():
c0000000002eb4a0: e8 00 4c 3c addis r2,r12,232
c0000000002eb4a4: 60 00 42 38 addi r2,r2,96
c0000000002eb4a8: a6 02 08 7c mflr r0
c0000000002eb4ac: d0 ff 41 fb std r26,-48(r1)
$ sudo ./perf probe do_sys_open
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_sys_open _text+3060904
$ sudo ./perf probe 'do_sys_open filename:string'
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string
For second case, perf probed on GEP. So when function will be called via
LEP, probe won't hit.
$ sudo ./perf record -a -e probe:do_sys_open ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.195 MB perf.data ]
To resolve this issue, let's not prioritize symbol table, let perf
decide what it wants to use. Perf is already converting GEP to LEP when
it uses symbol table. When perf uses debuginfo, let it find LEP offset
form symbol table. This way we fall back to probe on LEP for all cases.
After patch:
$ sudo ./perf probe 'do_sys_open filename:string'
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string
$ sudo ./perf record -a -e probe:do_sys_open ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of inline code, introduce function to post process kernel
probe trace events.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf probe' tool detects a variable's type and use the detected
type to add a new probe. Then, kprobes prints its variable in
hexadecimal format if the variable is unsigned and prints in decimal if
it is signed.
We sometimes want to see unsigned variable in decimal format (i.e.
sector_t or size_t). In that case, we need to investigate the variable's
size manually to specify just signedness.
This patch add signedness casting support. By specifying "s" or "u" as a
type, perf-probe will investigate variable size as usual and use the
specified signedness.
E.g. without this:
$ perf probe -a 'submit_bio bio->bi_iter.bi_sector'
Added new event:
probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector)
You can now use it in all perf tools, such as:
perf record -e probe:submit_bio -aR sleep 1
$ cat trace_pipe|head
dbench-9692 [003] d..1 971.096633: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d00
dbench-9692 [003] d..1 971.096685: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x1a3d80
dbench-9692 [003] d..1 971.096687: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d80
...
// need to investigate the variable size
$ perf probe -a 'submit_bio bio->bi_iter.bi_sector:s64'
Added new event:
probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s64)
You can now use it in all perf tools, such as:
perf record -e probe:submit_bio -aR sleep 1
With this:
// just use "s" to cast its signedness
$ perf probe -v -a 'submit_bio bio->bi_iter.bi_sector:s'
Added new event:
probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)
You can now use it in all perf tools, such as:
perf record -e probe:submit_bio -aR sleep 1
$ cat trace_pipe|head
dbench-9689 [001] d..1 1212.391237: submit_bio: (submit_bio+0x0/0x140) bi_sector=128
dbench-9689 [001] d..1 1212.391252: submit_bio: (submit_bio+0x0/0x140) bi_sector=131072
dbench-9697 [006] d..1 1212.398611: submit_bio: (submit_bio+0x0/0x140) bi_sector=30208
This commit also update perf-probe.txt to describe "types". Most parts
are based on existing documentation: Documentation/trace/kprobetrace.txt
Committer note:
Testing using 'perf trace':
# perf probe -a 'submit_bio bio->bi_iter.bi_sector'
Added new event:
probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector)
You can now use it in all perf tools, such as:
perf record -e probe:submit_bio -aR sleep 1
# trace --no-syscalls --ev probe:submit_bio
0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=0xc133c0)
3181.861 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffb8)
3181.881 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc0)
3184.488 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc8)
<SNIP>
4717.927 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7a88)
4717.970 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7880)
^C[root@jouet ~]#
Now, using this new feature:
[root@jouet ~]# perf probe -a 'submit_bio bio->bi_iter.bi_sector:s'
Added new event:
probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)
You can now use it in all perf tools, such as:
perf record -e probe:submit_bio -aR sleep 1
[root@jouet ~]# trace --no-syscalls --ev probe:submit_bio
0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145704)
0.017 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145712)
0.019 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145720)
2.567 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145728)
5631.919 probe:submit_bio:(ffffffffac3aee00) bi_sector=0)
5631.941 probe:submit_bio:(ffffffffac3aee00) bi_sector=8)
5631.945 probe:submit_bio:(ffffffffac3aee00) bi_sector=16)
5631.948 probe:submit_bio:(ffffffffac3aee00) bi_sector=24)
^C#
With callchains:
# trace --no-syscalls --ev probe:submit_bio/max-stack=10/
0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662544)
submit_bio+0xa8200001 ([kernel.kallsyms])
submit_bh+0xa8200013 ([kernel.kallsyms])
jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
kjournald2+0xa82000ca ([kernel.kallsyms])
kthread+0xa82000d8 ([kernel.kallsyms])
ret_from_fork+0xa820001f ([kernel.kallsyms])
0.023 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662552)
submit_bio+0xa8200001 ([kernel.kallsyms])
submit_bh+0xa8200013 ([kernel.kallsyms])
jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
kjournald2+0xa82000ca ([kernel.kallsyms])
kthread+0xa82000d8 ([kernel.kallsyms])
ret_from_fork+0xa820001f ([kernel.kallsyms])
0.027 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662560)
submit_bio+0xa8200001 ([kernel.kallsyms])
submit_bh+0xa8200013 ([kernel.kallsyms])
jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
kjournald2+0xa82000ca ([kernel.kallsyms])
kthread+0xa82000d8 ([kernel.kallsyms])
ret_from_fork+0xa820001f ([kernel.kallsyms])
2.593 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662568)
submit_bio+0xa8200001 ([kernel.kallsyms])
submit_bh+0xa8200013 ([kernel.kallsyms])
journal_submit_commit_record+0xa82001ac ([kernel.kallsyms])
jbd2_journal_commit_transaction+0xa82012e8 ([kernel.kallsyms])
kjournald2+0xa82000ca ([kernel.kallsyms])
kthread+0xa82000d8 ([kernel.kallsyms])
ret_from_fork+0xa820001f ([kernel.kallsyms])
^C#
Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470710408-23515-1-git-send-email-naohiro.aota@hgst.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If module is "module" then dso->short_name is "[module]". Substring
comparing is't enough: "raid10" matches to "[raid1]". This patch also
checks terminating zero in module name.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: http://lkml.kernel.org/r/147039975648.715620.12985971832789032159.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adjust map->reloc offset for the unmapped address when finding
alternative symbol address from map, because KASLR can relocate the
kernel symbol address.
The same adjustment has been done when finding appropriate kernel symbol
address from map which was introduced by commit f90acac757 ("perf
probe: Find given address from offline dwarf")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu@linaro.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160806192948.e366f3fbc4b194de600f8326@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we use libtraceevent to format trace event fields into printable
strings to use in hist entries it is important to trim it from the
default 4 KiB it starts with to what is really used, to reduce the
memory footprint, so use realloc(seq.buffer, seq.len + 1) when returning
the seq.buffer formatted with the fields contents.
Reported-and-Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-t3hl7uxmilrkigzmc90rlhk2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
New features:
- Add --sample-cpu to 'perf record', to explicitely ask for sampling
the CPU (Jiri Olsa)
Fixes:
- Fix processing of multi byte chunks in objdump output, fixing
disassemble processing for annotation on at least ARM64 (Jan Stancek)
- Use SyS_epoll_wait in a BPF 'perf test' entry instead of sys_epoll_wait, that
is not present in the DWARF info in vmlinux files (Arnaldo Carvalho de Melo)
- Add -wno-shadow when processing files using perl headers, fixing
the build on Fedora Rawhide and Arch Linux (Namhyung Kim)
Infrastructure:
- Annotate prep work to better catch and report errors related to
using objdump to disassemble DSOs (Arnaldo Carvalho de Melo)
- Add 'alloc', 'scnprintf' and 'and' methods for bitmap processing (Jiri Olsa)
- Add nested output resorting callback in hists processing (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXopDWAAoJENZQFvNTUqpAowMP/RYzsngLAfx2c1OGdXSYdesD
0bdkTwTA/ob+uQdSvksR6Y9zf+I5nD6dwB6NbBbeXgs8ZwWNNjLObz1hFtYZYQ6j
qBlf1iJ6k0cHm8EaF5fs0J6/RyU+TasqBrgDiqiMTlHJs5gsyD892vVqz0SxiKwP
i1nbyG8VrgBKTAv5j7pMZSn12IsSdGzymGzb/sfGmqz38t97Jp3hUj9MDb8I/wMJ
iEorX0wUNJRu1jfvjiev9gtLvPbmKQ8Nnj05Qy+aT4Lf0iNa4kLz/jqXXeCHs57n
0uoJoRn/vAqYoBFtyLkYBppuygubc7neuk4AaaOu8CQ6Y2sKgX9WTyZ8a0PxOOQZ
aDIU/GraJ5mzOJCVq/4QRQPx7OSw0hJ33kNa03+cGxU5uQQdUOLJCrSOL8WOcH8C
izRwomVLpUvNA1bsWeRl9C01/c/qGKYl7Mytptkt8xbA4LyUAWD7DZhIAIUOV2qY
ekP8Xsc/qeaHCM80XaYJWhgcAd5SaxfL3aIUalDk6G+4KVMoDlyU3fxa977wEj30
R2yOZdG8sp3c2KvrdXQASZbcgsLlDq8Bqt7LbtPbQOoa8NEfAl/6e/LIF2CAwgLc
8pL5j6tPcetEiIpUHoNpuwGEYGCkWwIntPczGK2j3+ppj4r7pYaLO3XxwRu/8tnH
RG7QV1U68YcFM47awIT5
=uPmA
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-20160803' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Add --sample-cpu to 'perf record', to explicitely ask for sampling
the CPU (Jiri Olsa)
Fixes:
- Fix processing of multi byte chunks in objdump output, fixing
disassemble processing for annotation on at least ARM64 (Jan Stancek)
- Use SyS_epoll_wait in a BPF 'perf test' entry instead of sys_epoll_wait, that
is not present in the DWARF info in vmlinux files (Arnaldo Carvalho de Melo)
- Add -wno-shadow when processing files using perl headers, fixing
the build on Fedora Rawhide and Arch Linux (Namhyung Kim)
Infrastructure changes:
- Annotate prep work to better catch and report errors related to
using objdump to disassemble DSOs (Arnaldo Carvalho de Melo)
- Add 'alloc', 'scnprintf' and 'and' methods for bitmap processing (Jiri Olsa)
- Add nested output resorting callback in hists processing (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adding --sample-cpu option to be able to explicitly enable CPU sample
type. Currently it's only enable implicitly in case the target is cpu
related.
It will be useful for following c2c record tool.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1470074555-24889-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When dealing with nested hist entries it's helpful to have a way to
resort those nested objects.
Adding optional callback call into output_resort function and following
new interface function:
typedef int (*hists__resort_cb_t)(struct hist_entry *he);
void hists__output_resort_cb(struct hists *hists,
struct ui_progress *prog,
hists__resort_cb_t cb);
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1470074555-24889-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If dso__build_id_filename(..., NULL, ...) returns !NULL its because it
allocated it, so, when reaching the 'if (dso__is_kcore()) test, we
already checked that and were just "fallbacking" to using
dso->long_name, but without freeing filename, thus leaking it.
Fix it by adding the dso__is_kcore() test to the 'or' group just after
it, the one containing the full fallback code, including freeing the
filename.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: ee205503f2 ("perf tools: Fix annotation with kcore")
Link: http://lkml.kernel.org/n/tip-qi4rpjq8yo6myvg99kkgt0xz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were just using pr_error() which makes it difficult for non stdio UIs
to provide errors using its widgets, as they need to somehow catch what
was passed to pr_error().
Fix it by introducing a __strerror() interface like the ones used
elsewhere, for instance target__strerror().
This is just the initial step, more work will be done, but first some
error handling bugs noticed while working on this need to be dealt with.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dgd22zl2xg7x4vcnoa83jxfb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This function will not annotate anything, it will just disassembly the
given map->dso and symbol.
It currently does this by parsing the output of 'objdump --disassemble',
but this could conceivably be done using a library or an offshot of
the kernel's instruction decoder (arch/x86/lib/inat.c), etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2xpfl4bfnrd6x584b390qok7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf fixes from Thomas Gleixner:
"This update contains:
- a fix for the bpf tools to use the new EM_BPF code
- a fix for the module parser of perf to retrieve the
proper text start address
- add str_error_c to libapi to avoid linking against
tools/lib/str_error_r.o"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools lib api: Add str_error_c to libapi
perf s390: Fix 'start' address of module's map
tools lib bpf: Use official ELF e_machine value
So no need for checking if it uses the strerror_r() GNU variant error
reporting mechanism, i.e. if it returns a pointer to a immutable string
internal to glibc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: c8b5f2c96d ("tools: Introduce str_error_r()")
Link: http://lkml.kernel.org/n/tip-xr83cd4y4r3cn6tq6w4f59jb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We will need to redirect the stderr as well, so open code popen as
a starting point.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-k0zt9svg4bswiglem7ornts4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
1/ Replace pcommit with ADR / directed-flushing:
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm. ADR
(Asynchronous DRAM Refresh) flushes data in posted write buffers to the
memory controller on a power-fail event. Flush addresses are defined in
ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
"Flush Hint Address Structure". A flush hint is an mmio address that
when written and fenced assures that all previous posted writes
targeting a given dimm have been flushed to media.
2/ On-demand ARS (address range scrub):
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the media
to refresh the bad block list, userspace can also request a re-scrub at
any time.
3/ Support for the Microsoft DSM (device specific method) command format.
4/ Support for EDK2/OVMF virtual disk device memory ranges.
5/ Various fixes and cleanups across the subsystem.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
cN3edFVIdxvZeMgM5Ubq
=xCBG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
- Replace pcommit with ADR / directed-flushing.
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement
either ADR, or provide one or more flush addresses per nvdimm.
ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
to the memory controller on a power-fail event.
Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
A flush hint is an mmio address that when written and fenced assures
that all previous posted writes targeting a given dimm have been
flushed to media.
- On-demand ARS (address range scrub).
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the
media to refresh the bad block list, userspace can also request a
re-scrub at any time.
- Support for the Microsoft DSM (device specific method) command
format.
- Support for EDK2/OVMF virtual disk device memory ranges.
- Various fixes and cleanups across the subsystem.
* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
nfit: do an ARS scrub on hitting a latent media error
nfit: move to nfit/ sub-directory
nfit, libnvdimm: allow an ARS scrub to be triggered on demand
libnvdimm: register nvdimm_bus devices with an nd_bus driver
pmem: clarify a debug print in pmem_clear_poison
x86/insn: remove pcommit
Revert "KVM: x86: add pcommit support"
nfit, tools/testing/nvdimm/: unify shutdown paths
libnvdimm: move ->module to struct nvdimm_bus_descriptor
nfit: cleanup acpi_nfit_init calling convention
nfit: fix _FIT evaluation memory leak + use after free
tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
tools/testing/nvdimm: add virtual ramdisk range
acpi, nfit: treat virtual ramdisk SPA as pmem region
pmem: kill __pmem address space
pmem: kill wmb_pmem()
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
fs/dax: remove wmb_pmem()
libnvdimm, pmem: flush posted-write queues on shutdown
...
That is the default used when no events is specified in tools, separate
it so that simpler tools that need no evlist can use it directly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-67mwuthscwroz88x9pswcqyv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because it uses that function, which would lead every tool using it
to need to link against tools/lib/str_error_r.o.
This fixes building tools/vm/, that links with libapi.
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: b31e3e3316 ("tools lib api fs: Use str_error_r()")
Link: http://lkml.kernel.org/n/tip-aedt3qzibhnhaov2j4caqi61@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
At present, when creating module's map, perf gets 'start' address by
parsing '/proc/modules', but it's the module base address, it isn't the
start address of the '.text' section.
In most arches, it's OK. But for s390, it places 'GOT' and 'PLT'
relocations before '.text' section. So there exists an offset between
module base address and '.text' section, which will incur wrong symbol
resolution for modules.
Fix this bug by getting 'start' address of module's map from parsing
'/sys/module/[module name]/sections/.text', not from '/proc/modules'.
Signed-off-by: Song Shan Gong <gongss@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1469070651-6447-2-git-send-email-gongss@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This reverts commit e083a21fca.
Not needed at all, tools/perf/util/perf_regs.h, included via:
#include "perf_regs.h"
Should have a definition for PERF_REGS_MAX, and since this is dependent
on HAVE_PERF_REGS_SUPPORT, fixes the build on powerpc, noticed by trying
to cross compile this from ubuntu16.04 with a locally build libz &
elfutils pair, since those are not available in multilib packages.
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-0bv204s71t4wuw1l53b6fz79@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pcommit instruction is being deprecated in favor of either ADR
(asynchronous DRAM refresh: flush-on-power-fail) at the platform level, or
posted-write-queue flush addresses as defined by the ACPI 6.x NFIT (NVDIMM
Firmware Interface Table).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add support for Intel's AVX-512 instructions to perf tools instruction
decoder used by Intel PT. The kernel's instruction decoder was updated in
a previous patch.
AVX-512 instructions are documented in Intel Architecture Instruction Set
Extensions Programming Reference (February 2016).
AVX-512 instructions are identified by a EVEX prefix which, for the purpose
of instruction decoding, can be treated as though it were a 4-byte VEX
prefix.
Existing instructions which can now accept an EVEX prefix need not be
further annotated in the op code map (x86-opcode-map.txt). In the case of
new instructions, the op code map is updated accordingly.
Also add associated Mask Instructions that are used to manipulate mask
registers used in AVX-512 instructions.
A representative set of instructions is added to the perf tools new
instructions test in a subsequent patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
vcvtph2ps does not have an immediate operand, so remove the erroneous
'Ib' from its opcode map entry. Add vcvtph2ps to the perf tools new
instructions test to verify it.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: X86 ML <x86@kernel.org>
Link: http://lkml.kernel.org/r/1469003437-32706-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's used from 2 objects in perf, so it's better to keep just one copy.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1468685480-18951-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jirka reported that python code returns all arrays as strings. This
makes impossible to get all items for byte array tracepoint field
containing 0x00 value item.
Fixing this by scanning full length of the array and returning it as
PyByteArray object in case non printable byte is found.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-and-Tested-by: Jiri Pirko <jiri@mellanox.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1468685480-18951-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Warn unmatched function filter correctly instead of warning
"symbol-loading error", since that can be a filter issue.
From the technical point of view, this adds a filter chech in map__load
and if there is a filter, it returns -2 (filter-out), instead of -1
(error), and perf-probe checks it and change message.
E.g. without this fix:
# perf probe -F rt_sp*
no symbols found in [kernel.kallsyms], maybe install a debug package?
Failed to load symbols in kernel
With this fix:
# perf probe -F rt_sp*
no symbols passed the given filter.
Failed to find symbols matched to "rt_sp*"
Error: Failed to show functions.
Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/146885835596.16106.2293540792775552481.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In some cases it's necessry to figure out the map-local index of a given
Linux logical CPU ID. Add a new helper, cpu_map__idx, to acquire this.
As the logic is largely the same as the existing cpu_map__has, this is
rewritten in terms of the new helper.
At the same time, add the inverse operation, cpu_map__cpu, which yields
the logical CPU id for a map-local index. While this can be performed
manually, wrapping this in a helper can make code more legible.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1468577293-19667-3-git-send-email-mark.rutland@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Not used anymore, remove one more file referencing kernel sources, i.e.
outside of tools/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ykfjt3t8l0npxfwmekiwwyu6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Not used anymore. This also stops include linux/swab.h directly
from the kernel sources, remove that reference from the MANIFEST.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If write_backward attribute is set, records are written into kernel
ring buffer from end to beginning, but read from beginning to end.
To avoid 'XX out of order events recorded' warning message (timestamps
of records is in reverse order when using write_backward), suppress the
warning message if write_backward is selected by at lease one event.
Result:
Before this patch:
# perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
-e raw_syscalls:sys_enter \
dd if=/dev/zero of=/dev/null count=300
300+0 records in
300+0 records out
153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
[ perf record: Woken up 5 times to write data ]
Warning:
40 out of order events recorded.
[ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]
After this patch:
# perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
-e raw_syscalls:sys_enter \
dd if=/dev/zero of=/dev/null count=300
300+0 records in
300+0 records out
153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-15-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch allows following config terms and option:
Globally setting events to overwrite;
# perf record --overwrite ...
Set specific events to be overwrite or no-overwrite.
# perf record --event cycles/overwrite/ ...
# perf record --event cycles/no-overwrite/ ...
Add missing config terms and update the config term array size because
the longest string length has changed.
For overwritable events, it automatically selects attr.write_backward
since perf requires it to be backward for reading.
Test result:
# perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
# perf evlist -v
syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-14-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no user of these two function outside evlist.c. Remove them from
public namespace.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-13-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce a bkw_mmap_state state machine to evlist:
.________________(forbid)_____________.
| V
NOTREADY --(0)--> RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
^ ^ | ^ |
| |__(forbid)____/ |___(forbid)___/|
| |
\_________________(3)_______________/
NOTREADY : Backward ring buffers are not ready
RUNNING : Backward ring buffers are recording
DATA_PENDING : We are required to collect data from backward ring buffers
EMPTY : We have collected data from backward ring buffers.
(0): Setup backward ring buffer
(1): Pause ring buffers for reading
(2): Read from ring buffers
(3): Resume ring buffers for recording
We can't avoid this complexity. Since we deliberately drop records from
overwritable ring buffer, there's no way for us to check remaining from
ring buffer itself (by checking head and old pointers). Therefore, we
need DATA_PENDING and EMPTY state to help us recording what we have done
to the ring buffer.
In record__mmap_read_evlist(), drive this state machine from DATA_PENDING
to EMPTY.
In perf_evlist__mmap_per_evsel(), drive this state machine from NOTREADY
to RUNNING when creating backward mmap.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-11-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now there's no real user of evlist->backward. Drop it. We are going to
use evlist->backward_mmap as a container for backward ring buffer.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-10-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add backward_mmap to evlist, free it together with normal mmap.
Improve perf_evlist__pick_pc(), search backward_mmap if evlist->mmap is
not available.
This patch doesn't alloc this array. It will be allocated conditionally
in the following commits.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In perf_evlist__mmap_per_cpu() and perf_evlist__mmap_per_thread(), in
case of mmap failure, successfully created maps should be cleared.
Current code uses two loops calling __perf_evlist__munmap() for each
function.
This patch extracts common code to perf_evlist__munmap_nofree() and use
previous introduced decoupled API perf_mmap__munmap(). Now
__perf_evlist__munmap() can be removed because of no user.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Insetad of saving a index into fdarray entries private field, save the
corresponding 'struct perf_mmap' pointer, and release them directly
using perf_mmap__put().
Following commits introduce multiple mmap arrays to evlist. Without this
patch, perf_evlist__munmap_filtered() is unable to retrive correct
'struct perf_mmap' pointer.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, the evlist mmap related helpers and APIs accept evlist and
idx, and dereference 'struct perf_mmap' by evlist->mmap[idx]. This is
unnecessary, and force each evlist contains only one mmap array.
Following commits are going to introduce multiple mmap arrays to a
evlist. This patch refators these APIs and helpers, introduces
functions accept perf_mmap pointer directly. New helpers and APIs are
decoupled with perf_evlist, and become perf_mmap functions (so they have
perf_mmap prefix).
Old functions are reimplemented with new functions. Some of them will be
removed in following commits.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
evsel->overwrite indicator means an event should be put into
overwritable ring buffer. In current implementation, it equals to
evsel->attr.write_backward. To reduce compliexity, remove
evsel->overwrite, use evsel->attr.write_backward instead.
In addition, in __perf_evsel__open(), if kernel doesn't support
write_backward and user explicitly set it in evsel, don't fallback
like other missing feature, since it is meaningless to fall back to
a forward ring buffer in this case: we are unable to stably read
from an forward overwritable ring buffer.
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are cases where further work would be needed to overcome the fact
that neither sysconf(_SC_LEVEL1_DCACHE_LINESIZE) nor
/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size are
available in some systems (Android, for instance), so bail out when such
a situation takes place.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ho8d8g8mh0o2dri7ckcccafi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Bionic libc has this definition, so don't duplicate it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chris Phlipot <cphlipot0@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-rmd19832zkt07e4crdzyen9z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support a special SDT probe format which can omit the '%' prefix only if
the SDT group name starts with "sdt_". So, for example both of
"%sdt_libc:setjump" and "sdt_libc:setjump" are acceptable for perf probe
--add.
E.g. without this:
# perf probe -a sdt_libc:setjmp
Semantic error :There is non-digit char in line number.
...
With this:
# perf probe -a sdt_libc:setjmp
Added new event:
sdt_libc:setjmp (on %setjmp in /usr/lib64/libc-2.20.so)
You can now use it in all perf tools, such as:
perf record -e sdt_libc:setjmp -aR sleep 1
Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/146831794674.17065.13359473252168740430.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>