When we don't have "raw_syscalls:sys_enter" we need to look for a
"*syscalls:sys_enter*" event to initialize the offsets for the
__augmented_syscalls__ evsel. That is the case with etcsnoop, which was
segfaulting. With the fix:
# trace -e /home/acme/git/perf/tools/perf/examples/bpf/etcsnoop.c
0.000 ( ): gnome-shell/2105 openat(dfd: CWD, filename: "/etc/localtime") ...
631.834 ( ): cat/6521 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
632.637 ( ): bash/6521 openat(dfd: CWD, filename: "/etc/passwd") ...
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: b9b6a2ea2b ("perf trace: Do not hardcode the size of the tracepoint common_ fields")
Link: https://lkml.kernel.org/n/tip-0tjwcit8qitsmh4nyvf2b0jo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were not taking into account the "... [continued]" printed
characters, fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qt20y0acmf8k0bzisce8kw95@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we get the sys_enter for a syscall we check if the last one is
still waiting for its matching sys_exit, if so we print this:
468.753 ( ): firefox/32382 poll(ufds: 0x7f3988d3dd00, nfds: 7, timeout_msecs: 4294967295) ...
449.575 ( 0.004 ms): Softwar~cThrea/32434 futex(uaddr: 0x7f39a18a9b70, op: WAKE|PRIVATE_FLAG, val: 1) = 0
At some point we'll get that poll sys_exit event and will print a "[continued]" line.
While making the sizing of the alignment after the syscall arg list and
its result configurable, so that we can mimic strace, which uses a
smaller alignment by default, a bug was introduced where the closing
parens appeared before the syscall name and its arg list, fix it.
Fixes: 4b8a240ed5 ("perf trace: Add alignment spaces after the closing parens")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oi45i54s59h1w1kmgpzrfuum@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that beautifiers can access things like dev_maj.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-wm5o51f206c5pi063dsaeraq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We keep a table for the fds to map them back to pathnames when showing
'fd' based APIs such as write(), store as well the major number for the
device the path is in, to use in things like choosing the right ioctl
'cmd' beautifier.
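A minimal sketch of such a table, indexed by fd and grown on demand.
The names used here (struct file_entry, struct fd_table,
fd_table__get(), fd_table__set()) are hypothetical and only
illustrate the idea; the real 'perf trace' structures may differ:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-fd entry: the pathname plus the device major number. */
struct file_entry {
	char *pathname;
	int   dev_maj;
};

struct fd_table {
	struct file_entry *entries;
	int                max; /* highest valid fd, -1 when the table is empty */
};

/* Return the entry for 'fd', growing the table if needed, NULL on error. */
struct file_entry *fd_table__get(struct fd_table *t, int fd)
{
	if (fd < 0)
		return NULL;

	if (fd > t->max) {
		struct file_entry *n = realloc(t->entries, (size_t)(fd + 1) * sizeof(*n));

		if (n == NULL)
			return NULL;
		/* Zero only the slots that were just added. */
		memset(n + t->max + 1, 0, (size_t)(fd - t->max) * sizeof(*n));
		t->entries = n;
		t->max = fd;
	}
	return &t->entries[fd];
}

/* Record the pathname and the device major for 'fd', 0 on success. */
int fd_table__set(struct fd_table *t, int fd, const char *pathname, int dev_maj)
{
	struct file_entry *e = fd_table__get(t, fd);
	size_t len = strlen(pathname) + 1;
	char *copy;

	if (e == NULL)
		return -1;
	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, pathname, len);
	free(e->pathname);
	e->pathname = copy;
	e->dev_maj = dev_maj;
	return 0;
}
```

A beautifier handed the fd can then look at entries[fd].dev_maj to
decide which ioctl 'cmd' table applies.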
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qjkds7bnk7v7fk2xhqsb0a4v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can have that table expanded when setting other attributes.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hzvpe3qwafe6sqcq3bhtbxds@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can add more per file attributes besides the pathname, such
as which ioctl beautifier to use, for cases such as the sound and
usbdevfs ioctls, that both use the 'U' command, so we have to
differentiate at the major number for the device file.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1895cmhrdz2dkl5prf2cj2yj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We shouldn't hardcode the size of the tracepoint common_ fields, use the
offset of the 'id'/'__syscall_nr' field in the sys_enter event instead.
This caused the augmented syscalls code to fail on a particular build of a
PREEMPT_RT_FULL kernel where these extra 'common_migrate_disable' and
'common_padding' fields were before the syscall id one:
# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format
name: sys_enter
ID: 22
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned short common_migrate_disable; offset:8; size:2; signed:0;
field:unsigned short common_padding; offset:10; size:2; signed:0;
field:long id; offset:16; size:8; signed:1;
field:unsigned long args[6]; offset:24; size:48; signed:0;
print fmt: "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)", REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3], REC->args[4], REC->args[5]
#
All those 'common_' prefixed fields are zeroed when they hit a BPF tracepoint
hook, we better just discard those, i.e. somehow pass an offset to the
BPF program from the start of the ctx and make adjustments in the 'perf trace'
handlers to adjust the offset of the syscall arg offsets obtained from tracefs.
Till then, fix it the quick way and add this to the augmented_raw_syscalls.c to
get it to work in such kernels:
diff --git a/tools/perf/examples/bpf/augmented_raw_syscalls.c b/tools/perf/examples/bpf/augmented_raw_syscalls.c
index 53c233370fae..1f746f931e13 100644
--- a/tools/perf/examples/bpf/augmented_raw_syscalls.c
+++ b/tools/perf/examples/bpf/augmented_raw_syscalls.c
@@ -38,12 +38,14 @@ struct bpf_map SEC("maps") syscalls = {
 struct syscall_enter_args {
 	unsigned long long common_tp_fields;
+	long rt_common_tp_fields;
 	long syscall_nr;
 	unsigned long args[6];
 };
 struct syscall_exit_args {
 	unsigned long long common_tp_fields;
+	long rt_common_tp_fields;
 	long syscall_nr;
 	long ret;
 };
Just to check that this was the case. Fix it properly later, for now remove the
hardcoding of the offset in the 'perf trace' side and document the situation
with this patch.
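As an illustration of reading the offset from tracefs instead of
hardcoding it, here is a small standalone sketch that scans the text of
a 'format' file for a field and returns its offset. The function names
are hypothetical, and 'perf trace' itself gets this information through
its tracepoint format parsing rather than through code like this:

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the offset of field 'name' as described in the text of a tracefs
 * 'format' file, or -1 if there is no such field.
 */
int tp_format__field_offset(const char *format, const char *name)
{
	size_t namelen = strlen(name);
	const char *line = format;

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[512], decl[128];
		int offset;

		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';

			/* Lines of interest: "field:<type> <name>; offset:<n>; ..." */
			if (sscanf(buf, " field:%127[^;]; offset:%d;", decl, &offset) == 2) {
				const char *sp = strrchr(decl, ' ');
				const char *tok = sp ? sp + 1 : decl;

				/* Accept "id" as well as array fields such as "args[6]". */
				if (strncmp(tok, name, namelen) == 0 &&
				    (tok[namelen] == '\0' || tok[namelen] == '['))
					return offset;
			}
		}
		line += len;
		if (*line == '\n')
			line++;
	}
	return -1;
}

/* Try the names the syscall number field is known by, in turn. */
int tp_format__syscall_id_offset(const char *format)
{
	static const char * const names[] = { "id", "__syscall_nr", "nr" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int offset = tp_format__field_offset(format, names[i]);

		if (offset >= 0)
			return offset;
	}
	return -1;
}
```

With the PREEMPT_RT_FULL format shown above this yields 16 for 'id',
where a hardcoded sizeof(u64) would have pointed at the extra common_
fields.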
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2pqavrktqkliu5b9nzouio21@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While updating 'perf trace' on a machine with an old precompiled
augmented_raw_syscalls.o that didn't set up the syscall map, the new 'perf
trace' codebase notices the augmented_raw_syscalls.o eBPF event and decides
to use it instead of the old raw_syscalls:sys_{enter,exit} method. But
then, because we don't have the syscall map, it tries to set the tracepoint
filter on the sys_{enter,exit} evsels, which are NULL, segfaulting.
Make the code more robust by checking if those tracepoints have
their respective evsels in place before trying to set the tp filter.
With this we still get everything to work, just not setting up the
syscall filters, which is better than a segfault. Now to update the
precompiled augmented_raw_syscalls.o and continue development :-)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-3ft5rjdl05wgz2pwpx2z8btu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Also to make it match 'strace' output, for regression testing.
Both now produce this output when 'perf trace' uses a .perfconfig
asking for the strace-like output:
mmap(0x7faf66e6a000, 1363968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7faf66e6a000
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-27qhouo1kaac2iyl85nfnsf5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So far this is, AFAIK, available only on x86, so the code was
put in place with x86 prefixes. On arches where it is not available it
will just not be called, so no further mechanisms are needed at this
time.
Later, when other arches wire this up, we'll just look at the uname
(live sessions) or perf_env data in the perf.data header to auto-wire
the right beautifier.
With this the output is the same as produced by 'strace' when used with
the following ~/.perfconfig:
# cat ~/.perfconfig
[llvm]
dump-obj = true
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
show_zeros = yes
show_duration = no
no_inherit = yes
show_timestamp = no
show_arg_names = no
args_alignment = -40
show_prefix = yes
#
And, on fedora 29, since the string tables are generated from the kernel
sources, we don't know about 0x3001, just like strace:
--- /tmp/strace 2018-12-17 11:22:08.707586721 -0300
+++ /tmp/trace 2018-12-18 11:11:32.037512729 -0300
@@ -1,49 +1,49 @@
-arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc8a92dc80) = -1 EINVAL (Invalid argument)
+arch_prctl(0x3001 /* ARCH_??? */, 0x7ffe4eb93ae0) = -1 EINVAL (Invalid argument)
-arch_prctl(ARCH_SET_FS, 0x7faf6700f540) = 0
+arch_prctl(ARCH_SET_FS, 0x7fb507364540) = 0
And that seems to be related to the CET/Shadow Stack feature, which
userland in Fedora 29 (glibc 2.28) is querying the kernel about;
0x3001 seems to be ARCH_CET_STATUS. I'll check the situation and test
with a fedora 29 kernel to see if the other codes are used.
A diff that ignores the different pointers for different runs needs to
be put in place in the upcoming regression tests comparing 'perf trace's
output to strace's.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-73a9prs8ktkrt97trtdmdjs8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To match 'strace' output, like in:
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffc8a92dc80) = -1 EINVAL (Invalid argument)
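A rough sketch of that fallback: when a value has no entry in the
string table, print it in hex followed by a comment made of the table
prefix and "???". The function name is hypothetical and the clamping
just mimics what a scnprintf()-style helper would return:

```c
#include <stdio.h>
#include <string.h>

/*
 * Format a value that has no entry in the string table the way strace does:
 * the raw hex value followed by a comment with the table prefix and "???".
 * Returns the number of characters written, not counting the final '\0'.
 */
size_t unknown_value__scnprintf(char *bf, size_t size, unsigned long val,
				const char *prefix)
{
	int n = snprintf(bf, size, "%#lx /* %s??? */", val, prefix);

	if (n < 0 || size == 0)
		return 0;
	/* snprintf() reports the untruncated length, so clamp it. */
	return (size_t)n >= size ? size - 1 : (size_t)n;
}
```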
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-kx59j2dk5l1x04ou57mt99ck@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll use it in the upcoming arch_prctl() 'code' arg beautifier.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6e4tj2fjen8qa73gy4u49vav@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Matching strace's output format. The 'format' files for the syscall
tracepoints have an indication if the arg is a pointer, with some
exceptions like 'mmap' that has its first arg as an 'unsigned long', so
use a heuristic using the argument name, i.e. if it contains the 'addr'
substring, format it with the pointer formatter.
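The heuristic can be sketched as below. struct arg_desc and
arg_desc__is_pointer() are hypothetical names standing in for whatever
'perf trace' keeps per argument, so take this as an illustration of the
rule, not as the actual code:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of one syscall argument, as read from 'format'. */
struct arg_desc {
	const char *type; /* e.g. "const char *", "unsigned long" */
	const char *name; /* e.g. "filename", "addr" */
};

bool arg_desc__is_pointer(const struct arg_desc *arg)
{
	/* The type itself already says it is a pointer. */
	if (strchr(arg->type, '*') != NULL)
		return true;

	/* Heuristic for integer typed args named like an address, e.g. mmap's. */
	return strstr(arg->name, "addr") != NULL;
}
```

Being a substring match, this also catches names such as 'uaddr', which
is the intent, at the cost of possible false positives for unrelated
names that happen to contain 'addr'.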
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ddghemr8qrm6i0sb8awznbze@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To match strace, now both emit the same line for calls like:
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-krxl6klsqc9qyktoaxyih942@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To use strace's style, helping in comparing the output of 'perf trace'
with the one from 'strace', to help in upcoming regression tests.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mw6peotz4n84rga0fk78buff@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that the user, in an upcoming patch, can select printing it to get
the full string as used in the source code, not one with a common prefix
chopped off so as to make the output more compact.
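To illustrate, a string table that keeps the common prefix separate
from the chopped entries lets the formatter decide at print time. The
struct layout and function name below are assumptions made for this
sketch, using the lseek 'whence' values as sample data:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical string table: entries[] holds the names minus 'prefix'. */
struct strarray {
	const char         *prefix;
	const char * const *entries;
	int                 nr_entries;
};

/* Print the name for 'val', with or without the common prefix. */
int strarray__scnprintf_prefix(const struct strarray *sa, char *bf, size_t size,
			       bool show_prefix, int val)
{
	if (val < 0 || val >= sa->nr_entries || sa->entries[val] == NULL)
		return snprintf(bf, size, "%d", val);

	return snprintf(bf, size, "%s%s", show_prefix ? sa->prefix : "",
			sa->entries[val]);
}

static const char * const whences[] = { "SET", "CUR", "END", "DATA", "HOLE" };

const struct strarray strarray__whences = {
	.prefix     = "SEEK_",
	.entries    = whences,
	.nr_entries = 5,
};
```

With show_prefix set the value 2 is printed as SEEK_END, as in the
source code, otherwise as the more compact END.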
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zypczc88gzbmeqx7b372s138@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To match 'strace' output, helping with upcoming regression tests
comparing both outputs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-jab52t1dcuh6vlztqle9g7u9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I.e. if children should inherit the parent perf_event configuration,
i.e. if we should trace children as well or just the parent.
The default is to follow children, to disable this and have a behaviour
similar to strace, set this config option or use the --no_inherit 'perf
trace' option.
E.g.:
Default:
# perf config trace.no_inherit
# trace -e clone,*sleep time sleep 1
0.000 time/21107 clone(clone_flags: CHILD_CLEARTID|CHILD_SETTID|0x11, newsp: 0, child_tidptr: 0x7f7b8f9ae810) = 21108 (time)
? time/21108 ... [continued]: clone()
0.691 sleep/21108 nanosleep(rqtp: 0x7ffed01d0540, rmtp: 0 ) = 0
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1988maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
#
Disable it:
# trace -e clone,*sleep time sleep 1
0.000 clone(clone_flags: CHILD_CLEARTID|CHILD_SETTID|0x11, newsp: 0, child_tidptr: 0x7ff41e100810) = 21414 (time)
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 1964maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
#
Notice that since there is just one thread, the "comm/TID" column is
suppressed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-thd8s16pagyza71ufi5vjlan@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default so far, since we show argument names followed by their values,
was to make the output more compact by suppressing most zeroed args.
Make this configurable so that users can choose what best suits their
needs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-q0gxws02ygodh94o0hzim5xd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To add augmented_raw_syscalls to the events specified by the user, or be
the only one if no events were specified by the user, one can add this
to perfconfig:
# cat ~/.perfconfig
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
#
I.e. pre-compile the augmented_raw_syscalls.c BPF program and make it
always load, this way:
# perf trace -e open* cat /etc/passwd > /dev/null
0.000 ( 0.013 ms): cat/31557 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.035 ( 0.007 ms): cat/31557 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.353 ( 0.009 ms): cat/31557 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.424 ( 0.006 ms): cat/31557 openat(dfd: CWD, filename: /etc/passwd) = 3
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0lgj7vh64hg3ce44gsmvj7ud@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll start adding more perf-syscall stuff, so let's do this prep step so
that the next ones are just about adding more fields.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-vac4sn1ns1vj4y07lzj7y4b8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Just another map, this time a BPF_MAP_TYPE_ARRAY, starting with
one bool per syscall, stating if it should be filtered or not.
So, with a pre-built augmented_raw_syscalls.o file, we use:
# perf trace -e open*,augmented_raw_syscalls.o
0.000 ( 0.016 ms): DNS Res~er #37/29652 openat(dfd: CWD, filename: /etc/hosts, flags: CLOEXEC ) = 138
187.039 ( 0.048 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /etc/fstab, flags: CLOEXEC ) = 11
187.348 ( 0.041 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11
188.793 ( 0.036 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11
189.803 ( 0.029 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11
190.774 ( 0.027 ms): gsd-housekeepi/2436 openat(dfd: CWD, filename: /proc/self/mountinfo, flags: CLOEXEC ) = 11
284.620 ( 0.149 ms): DataStorage/3076 openat(dfd: CWD, filename: /home/acme/.mozilla/firefox/ina67tev.default/SiteSecurityServiceState.txt, flags: CREAT|TRUNC|WRONLY, mode: IRUGO|IWUSR|IWGRP) = 167
^C#
What is it that this gsd-housekeeping thingy needs to open
/proc/self/mountinfo four times periodically? :-)
This map will be extended to tell per-syscall parameters, i.e. how many
bytes to copy per arg, using the function signature to get the types and
then the size of those types, via BTF.
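A plain userspace model of that array, just to show the indexing: the
key is the syscall number and the value is one bool. This is not BPF
code, the names are made up for the sketch, MAX_SYSCALLS is an assumed
bound, and the bool is taken here to mean "this syscall was asked
for":

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SYSCALLS 512 /* assumption: above the highest syscall number */

/* Userspace model of the array map: the index is the syscall number. */
struct syscall_filter {
	bool enabled[MAX_SYSCALLS];
};

/* What the tool side does: mark the syscalls selected with -e. */
void syscall_filter__enable(struct syscall_filter *f, const int *ids, size_t nr_ids)
{
	size_t i;

	for (i = 0; i < nr_ids; i++) {
		if (ids[i] >= 0 && ids[i] < MAX_SYSCALLS)
			f->enabled[ids[i]] = true;
	}
}

/* What the sys_enter hook does first: bail out for unselected syscalls. */
bool syscall_filter__wanted(const struct syscall_filter *f, long syscall_nr)
{
	if (syscall_nr < 0 || syscall_nr >= MAX_SYSCALLS)
		return false;
	return f->enabled[syscall_nr];
}
```

For '-e open*' on x86_64 the tool would enable, among others, 2 (open)
and 257 (openat), and everything else is dropped in the kernel before
any copying takes place.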
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-cy222g9ucvnym3raqvxp0hpg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So when we do something like:
# perf trace -e open*,augmented_raw_syscalls.o
We need to set trace->trace_syscalls because there is logic that uses
that when mixing strace-like output with other events, such as scheduler
tracepoints, but with that set we ended up having multiple
raw_syscalls:sys_{enter,exit} setup, which garbled the output, so
check if trace->augmented_raw_syscalls is set and avoid the two extra
tracepoints.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-kjmnbrlgu0c38co1ye8egbsb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename it to trace__set_ev_qualifier_tp_filter(), as this just sets up
tracepoint filters on the raw_syscalls:sys_{enter,exit} tracepoints, and
since we're going to do the same for the augmented_raw_syscalls
codepath, when used, rename it to clarify.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-8bjsul8x7osw7nxjodnyfn14@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Noticed while working on renameat2.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-8omchrcjcvlwoxxv6wrjehfh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I was trigger happy on this one: using ordered_events, as implemented
by Jiri for use with the --block code under discussion on lkml, delays
processing to form batches that then get ordered and then
printed.
With 'perf trace' we want to process the events as they go, without that
delay, and doing it that way works well for the common case which is to
trace a thread or a workload started by 'perf trace'.
So revert back to not using ordered_events but add an option to select
that mode so that users can experiment with their particular use case to
see if it works better, i.e. if the added delay is not a problem and the
ordering helps.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-8ki7sld6rusnjhhtaly26i5o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Just hide a bit more how events get delivered, hiding ordered_events
details from the main loop.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lxwwf3238ta4neq2zh1y1h45@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sort events to provide the precise outcome of ordered events, just like
is done with 'perf report' and 'perf top'.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Levin <ldv@altlinux.org>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Luis Cláudio Gonçalves <lclaudio@uudg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181205160509.1168-9-jolsa@kernel.org
[ split from a larger patch, added trace__ prefixes to new 'struct trace' methods ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move event delivery code to a new trace__deliver_event() function, so
it's easier to add ordered delivery coming in the following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitry Levin <ldv@altlinux.org>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Luis Cláudio Gonçalves <lclaudio@uudg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181205160509.1168-8-jolsa@kernel.org
[ Add trace__ prefix to the deliver_event method ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To cope with older kernels that don't have this patch backported:
026842d148 ("tracing/syscalls: Rename "/format" tracepoint field name "nr" to "__syscall_nr:")
This makes 'perf trace' work again in RHEL7 kernels.
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6h1syw2isegnhb1bjmtr9x9k@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The default timeout of 500ms for parsing /proc/<pid>/maps files is too
short for profiling many of our services.
This can be overridden by passing --proc-map-timeout to the relevant
command but it'd be nice to globally increase our default value.
This patch permits setting a different default with the
core.proc-map-timeout config file parameter.
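A sketch of how such a config callback could look, assuming a handler
that is called with "section.key" and the value as strings. The struct
and function names are hypothetical and the value is taken to be in
milliseconds, like the 500ms default mentioned above:

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_PROC_MAP_TIMEOUT_MS 500 /* the default mentioned above */

/* Hypothetical holder for the effective default. */
struct proc_map_cfg {
	unsigned int timeout_ms;
};

/* Called per "section.key = value" pair, 0 on success, -1 on a bad value. */
int proc_map_cfg__apply(struct proc_map_cfg *cfg, const char *var, const char *value)
{
	unsigned long v;
	char *end;

	if (strcmp(var, "core.proc-map-timeout") != 0)
		return 0; /* not our variable, nothing to do */

	/* Only plain decimal numbers are accepted. */
	if (!isdigit((unsigned char)value[0]))
		return -1;

	v = strtoul(value, &end, 10);
	if (*end != '\0' || v > UINT_MAX)
		return -1;

	cfg->timeout_ms = (unsigned int)v;
	return 0;
}
```

A --proc-map-timeout given on the command line would still be applied
after this, so it keeps overriding the configured default.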
Signed-off-by: Mark Drayton <mbd@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181204203420.1683114-1-mbd@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.
No change in functionality intended.
Committer notes:
This was split from a larger patch as there is code that is,
additionally, maintained outside the kernel tree, so to ease
cherry-picking and/or backporting, split this into multiple patches.
Just typos in comments, no need to backport, reducing the chance of
backporting artifacts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts.
This renames 'struct tep_event_format' to 'struct tep_event', which
describes more closely the purpose of the struct.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181130154647.436403995@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[ Fixup conflict with 6e33c250a88f ("tools lib traceevent: Fix compile warnings in tools/lib/traceevent/event-parse.c") ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This makes the augmented_syscalls support the --filter-pids and
auto-filtered feedback loop pids just like when working without BPF,
i.e. with just raw_syscalls:sys_{enter,exit} and tracepoint filters.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zc5n453sxxm0tz1zfwwelyti@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Look up the first map named "filtered_pids" and, if augmenting
syscalls, i.e. if a BPF event is present and the
"__augmented_syscalls__" is present, then fill in that map with the pids
to filter, be it feedback loop ones (perf trace's pid, its father if it
is "sshd", more auto-filtered in the future) or the ones explicitly
stated in the tool command line via --filter-pids.
The code to actually fill in the map comes next.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-rhzytmw7qpe6lqyjxi1ded9t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As we'll need that name for a new function to set filters for both
tracepoints and BPF maps for filtering pids.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mdkck6hf3fnd21rz2766280q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To better reflect that this is a tracepoint filter, as opposed, for
instance, to map-based BPF filters.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9138svli6ddcphrr3ymy9oy3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For now with BPF raw_augmented we hook into raw_syscalls:sys_enter and
there we get all 6 syscall args plus the tracepoint common fields
(sizeof(long)) and the syscall_nr (another long). So we check if that is
the case and if so don't look after the sc->args_size, but always after
the full raw_syscalls:sys_enter payload, which is fixed.
We'll revisit this later to pass sc->args_size to the BPF augmenter (now
tools/perf/examples/bpf/augmented_raw_syscalls.c), so that it copies only
what we need for each syscall, like what happens when we use
syscalls:sys_enter_NAME, so that we reduce the kernel/userspace traffic
to just what is needed for each syscall.
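In other words, with the raw_syscalls hook the augmented data is to be
found after a fixed size header. A sketch, reusing the struct layout
from the augmented_raw_syscalls.c example and a hypothetical helper
name:

```c
#include <stddef.h>

/* What the BPF hook on raw_syscalls:sys_enter copies before any extras. */
struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long               syscall_nr;
	unsigned long      args[6];
};

/*
 * Return a pointer to whatever was appended after the fixed payload (e.g. a
 * copied pathname) and store its size, or NULL if nothing was appended.
 */
const void *augmented_payload(const void *raw_data, size_t raw_size,
			      size_t *augmented_size)
{
	const size_t fixed = sizeof(struct syscall_enter_args);

	if (raw_size <= fixed) {
		*augmented_size = 0;
		return NULL;
	}
	*augmented_size = raw_size - fixed;
	return (const char *)raw_data + fixed;
}
```

On a 64-bit kernel the fixed part is 8 + 8 + 6 * 8 = 64 bytes no matter
how many args the syscall really takes, which is the extra
kernel/userspace traffic mentioned above.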
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-nlslrg8apxdsobt4pwl3n7ur@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pathname beautifiers so far support just one augmented pathname per
syscall, so do it just for mount's first arg, later this will get fixed.
With:
# perf probe -l
probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
#
Later this will get added to augmented_syscalls.c (eBPF):
In one xterm:
# perf trace -e mount,umount
2687.331 ( 3.544 ms): mount/8892 mount(dev_name: /mnt, dir_name: 0x561f9ac184a0, type: 0x561f9ac1b170, flags: BIND) = 0
3912.126 ( 8.807 ms): umount/8895 umount2(name: /mnt) = 0
^C#
In the other:
$ sudo mount --bind /proc /mnt
$ sudo umount /mnt
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qsvhrm2es635cl4zicqjeth2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By using the SCA_FILENAME beautifier, that works when either the
probe:vfs_getname probe is in place or with the eBPF program
tools/perf/examples/bpf/augmented_syscalls.c:
# perf probe -l
probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
# perf trace -e umount
9630.332 ( 9.521 ms): umount/8082 umount2(name: /mnt) = 0
#
The augmented syscalls one will be done in the next patch.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hegbzlpd2nrn584l5jxn7sy2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When trying to trace the 'umount' syscall on x86_64 I noticed that it
was failing:
# trace -e umount umount /mnt
event syntax error: 'umount'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
This is because on x86-64 we have it just as 'umount2':
$ grep umount arch/x86/entry/syscalls/syscall_64.tbl
166 common umount2 __x64_sys_umount
$
So if the syscall name lookup fails, fall back to the aliases
we have in the syscall_fmts table and look it up again. Now:
# trace -e umount umount -f /mnt
umount: /mnt: not mounted.
1.759 ( 0.004 ms): umount/18365 umount2(name: 0x55fbfcbc4480, flags: 1) = -1 EINVAL Invalid argument
#
Time to beautify the flags arg :-)
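The fallback can be sketched like this, with two tiny illustrative
tables standing in for the real syscall table and syscall_fmts (the
x86-64 numbers are the real ones, the struct and function names are
made up for the example):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative slice of the arch syscall table: name to number. */
struct syscall_id {
	const char *name;
	int         id;
};

static const struct syscall_id syscall_ids[] = {
	{ "umount2", 166 },
	{ "openat",  257 },
};

/* Illustrative slice of a syscall_fmts-like table: name plus alias. */
struct syscall_fmt {
	const char *name;
	const char *alias;
};

static const struct syscall_fmt syscall_fmts[] = {
	{ .name = "openat",  .alias = NULL     },
	{ .name = "umount2", .alias = "umount" },
};

static int syscalltbl_id(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(syscall_ids) / sizeof(syscall_ids[0]); i++) {
		if (strcmp(syscall_ids[i].name, name) == 0)
			return syscall_ids[i].id;
	}
	return -1;
}

/* Look up 'name', retrying with the canonical name if it is an alias. */
int syscall_name_to_id(const char *name)
{
	int id = syscalltbl_id(name);
	size_t i;

	if (id >= 0)
		return id;

	for (i = 0; i < sizeof(syscall_fmts) / sizeof(syscall_fmts[0]); i++) {
		const char *alias = syscall_fmts[i].alias;

		if (alias != NULL && strcmp(alias, name) == 0)
			return syscalltbl_id(syscall_fmts[i].name);
	}
	return -1;
}
```

The direct lookup is tried first, so names that exist in the syscall
table never pay for the alias scan.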
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ukweodgzbmjd25lfkgryeft1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>