Separate thread wait time into 3 parts - sleep, iowait and preempt based
on the prev_state of the last event.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170113104523.31212-1-namhyung@kernel.org
[ Fix the build on centos:5 where 'wait' shadows a global declaration ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 2059fc7a5a ("perf symbols: Allow forcing reading of non-root owned
files by root") 'perf report' was added the option of forcing reading of
non-root owned symbol file.
This add the same behavior for perf script.
Reported-by: Mark Drayton <mbd@fb.com>
Signed-off-by: Yannick Brosseau <scientist@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20170113182527.18625-1-scientist@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "--dump-raw-script" is not a valid option, replace it with the valid
one, "--dump-raw-trace"
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 133dc4c39c ("perf: Rename 'perf trace' to 'perf script'")
LPU-Reference: 728644547.14560155.1484320012612.JavaMail.zimbra@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds missing member names to struct initializations.
Although in C99 for struct S {int x, int y} two init codes struct S s =
{.x = (a), (b)} and struct S s = {.x = (a), .y = (b)} are the same, it
is better to explicitly write .y (.argh in this patch) for readability
and robustness against language/compiler evolutions.
Signed-off-by: Soramichi Akiyama <akiyama@m.soramichi.jp>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170113215623.32fb1ac2d862af0048c30fe6@m.soramichi.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Don't warn for feature-dwarf==0 if user explicitily disabled DWARF by
using NO_DWARF=1.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170112210159.76143-1-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the scale factor parsing code to an own function to reuse it in an
upcoming patch.
v2: Return error in case strdup returns NULL.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170103150833.6694-2-andi@firstfloor.org
[ Keep returning -ENOMEM when strdup() fails in perf_pmu__parse_scale()/convert_scale() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
1b07304c58 ("KVM: nVMX: support descriptor table exits")
That adds entries to VMX_EXIT_REASONS, that is used by
tools/perf/arch/x86/util/kvm-stat.c.
This also picks the changes in:
1dc35dacc1 ("KVM: nVMX: check host CR3 on vmentry and vmexit")
But these are not used in 'perf kvm stat', do it just to silence the
kernel/tools file cache coherency detector:
$ make -C tools/perf
make: Entering directory '/home/acme/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: arch/x86/include/uapi/asm/vmx.h differs from kernel
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ladi Prosek <lprosek@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-56uowkk8t5zje49a42asffcy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's now possible to specify the threshold time for perf.data like:
$ perf record --switch-output=30s ...
Once it's reached, the current data are dumped in to the
perf.data.<timestamp> file and session does on.
$ perf record --switch-output=30s ...
[ perf record: dump data: Woken up 44 times ]
[ perf record: Dump perf.data.2017010213043746 ]
...
The time is expected to be a number with appended unit
character - s/m/h/d.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding switch-output size warning if the requested
size of lower than the wakeup ring buffer size.
$ perf record --switch-output=1K ls
WARNING: switch-output data size lower than wakeup kernel buffer size (258K) expect bigger perf.data sizes
...
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1483955520-29063-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's now possible to specify the threshold size for perf.data like:
$ perf record --switch-output=2G ...
Once it's reached, the current data are dumped in to the
perf.data.<timestamp> file and session does on.
$ perf record --switch-output=2G ...
[ perf record: dump data: Woken up 7244 times ]
[ perf record: Dump perf.data.2017010214093746 ]
...
The size is expected to be a number with appended unit character -
B/K/M/G.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Next patches will add --switch-output option arguments, changing the
option to allow that and adding its default value to 'signal'.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Next patches will add more --switch-output option arguments,
so preparing the data holder.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483955520-29063-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add unit_number__scnprintf function to display size units and use it in
-m option info message.
Before:
$ perf record -m 10M ls
rounding mmap pages size to 16777216 bytes (4096 pages)
...
After:
$ perf record -m 10M ls
rounding mmap pages size to 16M (4096 pages)
...
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1483955520-29063-2-git-send-email-jolsa@kernel.org
[ Rename it to unit_number__scnprintf for consistency ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fixes a typo: s/enable to/unable to/
Signed-off-by: Soramichi AKIYAMA <akiyama@m.soramichi.jp>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: bcf3145fbe ("perf evlist: Enhance perf_evlist__start_workload()")
Link: http://lkml.kernel.org/r/20170110200006.e1f7a766b4faf1f107ae2e1b@m.soramichi.jp
[ Wasn't applying, fixed it up by hand, added Fixes: tag ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Makes it easier to specify both events and syscalls (to be formatter
strace-like), i.e. previously one would have to do:
# perf trace -e nanosleep --event sched:sched_switch usleep 1
Now it is possible to do:
# perf trace -e nanosleep,sched:sched_switch usleep 1
0.000 ( 0.021 ms): usleep/17962 nanosleep(rqtp: 0x7ffdedd61ec0) ...
0.021 ( ): sched:sched_switch:usleep:17962 [120] S ==> swapper/1:0 [120])
0.000 ( 0.066 ms): usleep/17962 ... [continued]: nanosleep()) = 0
#
The old style --expr and using both -e and --event continues to work.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ieg6bakub4657l9e6afn85r4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its similar to doing grep on a /proc/kallsyms, but it also shows extra
information like the path to the kernel module and the unrelocated
addresses in it, to help in diagnosing problems.
It is also helps demonstrate the use of the symbols routines so that
tool writers can use them more effectively.
Using it:
$ perf kallsyms e1000_xmit_frame netif_rx usb_stor_set_xfer_buf
e1000_xmit_frame: [e1000e] /lib/modules/4.9.0+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko 0xffffffffc046fc10-0xffffffffc0470bb0 (0x19c80-0x1ac20)
netif_rx: [kernel] [kernel.kallsyms] 0xffffffff916f03a0-0xffffffff916f0410 (0xffffffff916f03a0-0xffffffff916f0410)
usb_stor_set_xfer_buf: [usb_storage] /lib/modules/4.9.0+/kernel/drivers/usb/storage/usb-storage.ko 0xffffffffc057aea0-0xffffffffc057af19 (0xf10-0xf89)
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-79bk9pakujn4l4vq0f90klv3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To reduce the boilerplate for searching for functions in the running
kernel and modules.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-93iqzayafpaxaguoiwjqezgz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As it was getting the BUILD_BUG_ON_ZERO() definition by luck.
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-dh71o31ar72ajck8o2x4aoae@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The install command for libperf-jvmti.so does not check if libdir exists
before installing. This means that when the install command is run:
install libperf-jvmti.so '/tmp/test_root/usr/lib64';
libperf-jvmti.so will get installed to /usr/lib64 as a file and break
further installation. Fix this by ensuring the directory gets created
first.
See https://bugzilla.redhat.com/show_bug.cgi?id=1410296
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: d4dfdf00d4 ("perf jvmti: Plug compilation into perf build")
Link: http://lkml.kernel.org/r/1483741088-13543-1-git-send-email-labbott@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes:
- Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
- Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
- Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
- Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
- Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
- 'perf probe' fixes for cross arch probing (Masami Hiramatsu)
Improvement:
- Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYbS6KAAoJENZQFvNTUqpAWqkP/1xdvbDqeTA2YjcWG+ffQJye
LULSNQpRShvBGpB3zsOzOebOtw0R5oWb7YAp4JFZNyMIk5qcnS9gVwdw9Qx1lc/W
Gq8OuhI28Pr9AHE37eTJ1+12HI2aAOylwKFqUpEWMzoaUqBDRh5JV2MeVGVQXBRe
YBJ4Qwk1a7Ng74uzN/pdiC6V7pjddPEoLZEDG5p35vrgtNAY1JGlX5rvNR7CO6nN
72QHRnB4r1d9fgNGS5oO5zEU1q5+JN5phe+gDANdk/8pCqDegRjZpHnj5gcm71jr
mEvbstmlV1XPZSA9aFVu2L5LZuv6asS02HjrbGydqO2DUTpDgrynHsjAuqqaTnwk
kLGwA2I+Kyh8g155H+HrQANONe2TTOXRkBVWIoFBWkhd1JUS/p3GE2qvqx5N36sr
U26d9i/Bk/pbOYOdRUFMc4lckZY2HrrRWj8dVr06qLr2rlbYoLv8hpqvdyHr9MJS
Id2Yqqd1JVzBDU3/+GfdFDBjltVtpSG1mdc6LWy9IpXFMJoSOQNsrbjf+mgMadeV
EbtY8MdDZ525EB8niv9E9Vfd56LiWTKCjg9Mc758+U+Ns50YlQFsDq2hBO2+jQ/k
W3QNX+WJiLgxqQK8adveXoYgCSWPNVttIvg8Lj1KNk8Y//vUC1KSrvDfaKiROzLB
iF0u5mLRAQ+Pfec6X5PP
=o9KL
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-4.10-20170104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:
Fixes:
- Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
- Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
- Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
- Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
- Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
- Fix 'perf probe' for cross arch probing (Masami Hiramatsu)
Improvement:
- Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix perf-probe to show probe definition on gcc generated symbols for
offline kernel (including cross-arch kernel image).
gcc sometimes optimizes functions and generate new symbols with suffixes
such as ".constprop.N" or ".isra.N" etc. Since those symbol names are
not recorded in DWARF, we have to find correct generated symbols from
offline ELF binary to probe on it (kallsyms doesn't correct it). For
online kernel or uprobes we don't need it because those are rebased on
_text, or a section relative address.
E.g. Without this:
$ perf probe -k build-arm/vmlinux -F __slab_alloc*
__slab_alloc.constprop.9
$ perf probe -k build-arm/vmlinux -D __slab_alloc
p:probe/__slab_alloc __slab_alloc+0
If you put above definition on target machine, it should fail
because there is no __slab_alloc in kallsyms.
With this fix, perf probe shows correct probe definition on
__slab_alloc.constprop.9:
$ perf probe -k build-arm/vmlinux -D __slab_alloc
p:probe/__slab_alloc __slab_alloc.constprop.9+0
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148350060434.19001.11864836288580083501.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix --funcs (-F) option to show correct symbols for offline module.
Since previous perf-probe uses machine__findnew_module_map() for offline
module, even if user passes a module file (with full path) which is for
other architecture, perf-probe always tries to load symbol map for
current kernel module.
This fix uses dso__new_map() to load the map from given binary as same
as a map for user applications.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Markus reported that perf segfaults when reading /sys/kernel/notes from
a kernel linked with GNU gold, due to what looks like a gold bug, so do
some bounds checking to avoid crashing in that case.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Report-Link: http://lkml.kernel.org/r/20161219161821.GA294@x4
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ryhgs6a6jxvz207j2636w31c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those are binaries as well, so should be installed by:
make -C tools/perf install-bin'
too.
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-3841b37u05evxrs1igkyu6ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, the sched:sched_switch tracepoint reports deadline tasks with
priority -1. But when reading the trace via perf script I've got the
following output:
# ./d & # (d is a deadline task, see [1])
# perf record -e sched:sched_switch -a sleep 1
# perf script
...
swapper 0 [000] 2146.962441: sched:sched_switch: swapper/0:0 [120] R ==> d:2593 [4294967295]
d 2593 [000] 2146.972472: sched:sched_switch: d:2593 [4294967295] R ==> g:2590 [4294967295]
The task d reports the wrong priority [4294967295]. This happens because
the "int prio" is stored in an unsigned long long val. Although it is
set as a %lld, as int is shorter than unsigned long long,
trace_seq_printf prints it as a positive number.
The fix is just to cast the val as an int, and print it as a %d,
as in the sched:sched_switch tracepoint's "format".
The output with the fix is:
# ./d &
# perf record -e sched:sched_switch -a sleep 1
# perf script
...
swapper 0 [000] 4306.374037: sched:sched_switch: swapper/0:0 [120] R ==> d:10941 [-1]
d 10941 [000] 4306.383823: sched:sched_switch: d:10941 [-1] R ==> swapper/0:0 [120]
[1] d.c
---
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>
#include <linux/sched.h>
struct sched_attr {
__u32 size, sched_policy;
__u64 sched_flags;
__s32 sched_nice;
__u32 sched_priority;
__u64 sched_runtime, sched_deadline, sched_period;
};
int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags)
{
return syscall(__NR_sched_setattr, pid, attr, flags);
}
int main(void)
{
struct sched_attr attr = {
.size = sizeof(attr),
.sched_policy = SCHED_DEADLINE, /* This creates a 10ms/30ms reservation */
.sched_runtime = 10 * 1000 * 1000,
.sched_period = attr.sched_deadline = 30 * 1000 * 1000,
};
if (sched_setattr(0, &attr, 0) < 0) {
perror("sched_setattr");
return -1;
}
for(;;);
}
---
Committer notes:
Got the program from the provided URL, http://bristot.me/lkml/d.c,
trimmed it and included in the cset log above, so that we have
everything needed to test it in one place.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/866ef75bcebf670ae91c6a96daa63597ba981f0d.1483443552.git.bristot@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no --signal-trigger option, also adding the code comment into
record man page.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no need for this one to be global.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To allow string options with a default argument and variable set when
the option is used.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 'perf probe' supports cross-arch probes, it is possible to analyze
different arch kernel image which has different bits-per-long.
In that case, it fails to get the module name because it uses the
MOD_NAME_OFFSET macro based on the host machine bits-per-long, instead
of the target arch bits-per-long.
This fixes above issue by changing modname-offset based on the target
archs bit width. This is ok because linux kernel uses LP64 model on
64bit arch.
E.g. without this (on x86_64, and target module is arm32):
$ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
p:probe/configfs_lookup :configfs_lookup+0
^-Here is an empty module name.
With this fix, you can see correct module name:
$ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
p:probe/configfs_lookup configfs:configfs_lookup+0
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148337043836.6752.383495516397005695.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show length of analyzed sample time and rate of idle task running.
This also takes care of time range given by --time option.
$ perf sched timehist -sI | tail
Samples do not have callchains.
Idle stats:
CPU 0 idle for 930.316 msec ( 92.93%)
CPU 1 idle for 963.614 msec ( 96.25%)
CPU 2 idle for 885.482 msec ( 88.45%)
CPU 3 idle for 938.635 msec ( 93.76%)
Total number of unique tasks: 118
Total number of context switches: 2337
Total run time (msec): 3718.048
Total scheduling time (msec): 1001.131 (x 4)
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull turbostat updates from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: remove obsolete -M, -m, -C, -c options
tools/power turbostat: Make extensible via the --add parameter
tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz
tools/power turbostat: line up headers when -M is used
tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding
tools/power turbostat: Support Knights Mill (KNM)
tools/power turbostat: Display HWP OOB status
tools/power turbostat: fix Denverton BCLK
tools/power turbostat: use intel-family.h model strings
tools/power/turbostat: Add Denverton RAPL support
tools/power/turbostat: Add Denverton support
tools/power/turbostat: split core MSR support into status + limit
tools/power turbostat: fix error case overflow read of slm_freq_table[]
tools/power turbostat: Allocate correct amount of fd and irq entries
tools/power turbostat: switch to tab delimited output
tools/power turbostat: Gracefully handle ACPI S3
tools/power turbostat: tidy up output on Joule counter overflow
The new --add option has replaced the -M, -m, -C, -c options
Eg.
-M 0x10 is now --add msr0x10,raw
-m 0x10 is now --add msr0x10,raw,u32
-C 0x10 is now --add msr0x10,delta
-c 0x10 is now --add msr0x10,delta,u32
The --add option can be repeated to add any number of counters,
while the previous options were limited to adding one of each type.
In addition, the --add option can accept a column label,
and can also display a counter as a percentage of elapsed cycles.
Eg. --add msr0x3fe,core,percent,MY_CC3
Signed-off-by: Len Brown <len.brown@intel.com>
Create the "--add" parameter. This can be used to teach an existing
turbostat binary about any number of any type of counter.
turbostat(8) details the syntax for --add.
Signed-off-by: Len Brown <len.brown@intel.com>
Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
plus on the tooling side there's a number of fixes and some late
updates"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf sched timehist: Fix invalid period calculation
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
perf sched timehist: Enlarge default 'comm_width'
perf sched timehist: Honour 'comm_width' when aligning the headers
perf/x86: Fix overlap counter scheduling bug
perf/x86/pebs: Fix handling of PEBS buffer overflows
samples/bpf: Move open_raw_sock to separate header
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
tools lib bpf: Add bpf_prog_{attach,detach}
samples/bpf: Switch over to libbpf
perf diff: Do not overwrite valid build id
perf annotate: Don't throw error for zero length symbols
perf bench futex: Fix lock-pi help string
perf trace: Check if MAP_32BIT is defined (again)
samples/bpf: Make perf_event_read() static
uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
samples/bpf: Make samples more libbpf-centric
tools lib bpf: Add flags to bpf_create_map()
tools lib bpf: use __u32 from linux/types.h
...
When --time option is given with a value outside recorded time, the last
sample time (tprev) was set to that value and run time calculation might
be incorrect. This is a problem of the first samples for each cpus
since it would skip the runtime update when tprev is 0. But with --time
option it had non-zero (which is invalid) value so the calculation is
also incorrect.
For example, let's see the followging:
$ perf sched timehist
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
3195.968367 [0003] <idle> 0.000 0.000 0.000
3195.968386 [0002] Timer[4306/4277] 0.000 0.000 0.018
3195.968397 [0002] Web Content[4277] 0.000 0.000 0.000
3195.968595 [0001] JS Helper[4302/4277] 0.000 0.000 0.000
3195.969217 [0000] <idle> 0.000 0.000 0.621
3195.969251 [0001] kworker/1:1H[291] 0.000 0.000 0.033
The sample starts at 3195.968367 but when I gave a time interval from
3194 to 3196 (in sec) it will calculate the whole 2 second as runtime.
In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as
idle time.
Before:
$ perf sched timehist --time 3194,3196 -s | tail
Idle stats:
CPU 0 idle for 1995.991 msec
CPU 1 idle for 20.793 msec
CPU 2 idle for 30.191 msec
CPU 3 idle for 1999.852 msec
Total number of unique tasks: 23
Total number of context switches: 128
Total run time (msec): 3724.940
After:
$ perf sched timehist --time 3194,3196 -s | tail
Idle stats:
CPU 0 idle for 10.811 msec
CPU 1 idle for 20.793 msec
CPU 2 idle for 30.191 msec
CPU 3 idle for 18.337 msec
Total number of unique tasks: 23
Total number of context switches: 128
Total run time (msec): 18.139
Committer notes:
Further testing:
Before:
Idle stats:
CPU 0 idle for 229.785 msec
CPU 1 idle for 937.944 msec
CPU 2 idle for 188.931 msec
CPU 3 idle for 986.185 msec
After:
# perf sched timehist --time 40602,40603 -s | tail
Idle stats:
CPU 0 idle for 229.785 msec
CPU 1 idle for 175.407 msec
CPU 2 idle for 188.931 msec
CPU 3 idle for 223.657 msec
Total number of unique tasks: 68
Total number of context switches: 814
Total run time (msec): 97.688
# for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
CPU 0 idle for 229.721 msec (entries: 123)
CPU 1 idle for 175.381 msec (entries: 65)
CPU 2 idle for 188.903 msec (entries: 56)
CPU 3 idle for 223.61 msec (entries: 102)
Difference due to the idle stats being accounted at nanoseconds precision while
the <idle> entries in 'perf sched timehist' are trucated at msec.usec.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that the default 'comm_width' value is 30, no need to check that at
print_summary,
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Current default value is 20 but it's easily changed to a bigger value as
task has a long name and different tid and pid. And it makes the output
not aligned. So change it to have a large value as summary shows.
Committer notes:
Before:
# perf sched record
^C
# perf sched timehist
<SNIP>
40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020
40602.771512 [0003] <idle> 0.003 0.000 0.986
40602.771586 [0001] <idle> 0.020 0.000 1.049
40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020
40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116
40602.771776 [0000] <idle> 0.001 0.000 1.892
<SNIP>
After:
# perf sched timehist
<SNIP>
40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020
40602.771512 [0003] <idle> 0.003 0.000 0.986
40602.771586 [0001] <idle> 0.020 0.000 1.049
40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020
40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116
<SNIP>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Current default value is 20, but that may change in the future, so make
places where we have 20 hardcoded use 'comm_width'.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit d8c5b17f2b ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.
Committer notes:
Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
Link: 353e6f298c.patch
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes a perf diff regression issue which was introduced by commit
5baecbcd9c ("perf symbols: we can now read separate debug-info files
based on a build ID")
The binary name could be same when perf diff different binaries. Build
id is used to distinguish between them.
However, the previous patch assumes the same binary name has same build
id. So it overwrites the build id according to the binary name,
regardless of whether the build id is set or not.
Check the has_build_id in dso__load. If the build id is already set, use
it.
Before the fix:
$ perf diff 1.perf.data 2.perf.data
# Event 'cycles'
#
# Baseline Delta Shared Object Symbol
# ........ ....... ................ .............................
#
99.83% -99.80% tchain_edit [.] f2
0.12% +99.81% tchain_edit [.] f3
0.02% -0.01% [ixgbe] [k] ixgbe_read_reg
After the fix:
$ perf diff 1.perf.data 2.perf.data
# Event 'cycles'
#
# Baseline Delta Shared Object Symbol
# ........ ....... ................ .............................
#
99.83% +0.10% tchain_edit [.] f3
0.12% -0.08% tchain_edit [.] f2
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
CC: Dima Kogan <dima@secretsauce.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 5baecbcd9c ("perf symbols: we can now read separate debug-info files based on a build ID")
Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf report --tui' exits with error when it finds a sample of zero
length symbol (i.e. addr == sym->start == sym->end). Actually these are
valid samples. Don't exit TUI and show report with such symbols.
Reported-and-Tested-by: Anton Blanchard <anton@samba.org>
Link: https://lkml.org/lkml/2016/10/8/189
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org # v4.9+
Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There might be systems where MAP_32BIT is not defined, like some some
RHEL7 powerpc versions.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kyle McMartin <kyle@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 256763b017 ("perf trace beauty mmap: Add more conditional defines")
Link: http://lkml.kernel.org/r/1481831814-23683-1-git-send-email-jolsa@kernel.org
[ Changed the Fixme cset to the one removing the conditional switch case for MAP_32BIT ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Dynamic label support: To date namespace label support has been
limited to disambiguating cases where PMEM (direct load/store) and BLK
(mmio aperture) accessed-capacity alias on the same DIMM. Since 4.9 added
support for multiple namespaces per PMEM-region there is value to
support namespace labels even in the non-aliasing case. The presence of
a valid namespace index block force-enables label support when the
kernel would otherwise rely on region boundaries, and permits the region
to be sub-divided.
* Handle media errors in namespace metadata: Complement the error
handling for media errors in namespace data areas with support for
clearing errors on writes, and downgrading potential machine-check
exceptions to simple i/o errors on read.
* Device-DAX region attributes: Add 'align', 'id', and 'size' as
attributes for device-dax regions. In particular this enables userspace
tooling to generically size memory mapping and i/o operations. Prevent
userspace from growing assumptions / dependencies about the parent
device topology for a dax region. A libnvdimm namespace may not always
be the parent device of a dax region.
* Various cleanups and small fixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYVxRcAAoJEB7SkWpmfYgCSnoQAKNs7IJRtGqKFCkMB5VDAW79
Lifi35Hqm+mfFaLMIyRoKJMnhXKQBsthfcHZIy5t63kDWEV/tJ+riZ+m2Ibz4GPE
AUn07jxge2q9v0RwMylqpZ/EHyaK/xD1z0+SdIwVF2VaEDoVdnmu2WEqjuir/lfM
t70B/YoWLg6CcHeTzaV5vdxlRKyG4pzOV3UzPlYLROr3wFJBHD88j/4zcGy3F1ue
DpzGThhtEQXZUZCJLp54E0ch7V2cz/DNUDYwsjbCZrdYYmUJFqjZQxNyftn28aEJ
HI/BERXLhm2HF2qRqC+0C1lJmUylYk4wXUGco+b/u1GYN59eFe/JT3dJgxEHFKL2
nVOUmR8XsRE6gkFWQGcFesRjjI1Rqz2Wgql7oqkSL9grGOwY4rvKX0LhJxgvfuZ0
Itb5h0dMFUriAP0DaaJFMYfANIOx+mqr4RxDMwP5ZADku6+Ga0LXxvcBt06K/mYO
/zLo27MhsuuFy/odXm1A21OjFiH2pM3pSCaytKvDjy7DwzgE7ZzXmixXQ7j8YVp6
OtLYzMjIt8nt4xh0hmav5tb0r9l6mgNqlifacMC9lEM/7SDAiBXXBLuQngJN/j0s
iXllBk0pYVWibf3VcD6oY+qKdmBWvXxPjgj1lHE6j9A7Gw2jwIt/W2NWzV94nKmC
f6d+faHiRokqyVhdgfx3
=YWfP
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The libnvdimm pull request is relatively small this time around due to
some development topics being deferred to 4.11.
As for this pull request the bulk of it has been in -next for several
releases leading to one late fix being added (commit 868f036fee
("libnvdimm: fix mishandled nvdimm_clear_poison() return value")). It
has received a build success notification from the 0day-kbuild robot
and passes the latest libnvdimm unit tests.
Summary:
- Dynamic label support: To date namespace label support has been
limited to disambiguating cases where PMEM (direct load/store) and
BLK (mmio aperture) accessed-capacity alias on the same DIMM. Since
4.9 added support for multiple namespaces per PMEM-region there is
value to support namespace labels even in the non-aliasing case.
The presence of a valid namespace index block force-enables label
support when the kernel would otherwise rely on region boundaries,
and permits the region to be sub-divided.
- Handle media errors in namespace metadata: Complement the error
handling for media errors in namespace data areas with support for
clearing errors on writes, and downgrading potential machine-check
exceptions to simple i/o errors on read.
- Device-DAX region attributes: Add 'align', 'id', and 'size' as
attributes for device-dax regions. In particular this enables
userspace tooling to generically size memory mapping and i/o
operations. Prevent userspace from growing assumptions /
dependencies about the parent device topology for a dax region. A
libnvdimm namespace may not always be the parent device of a dax
region.
- Various cleanups and small fixes"
* tag 'libnvdimm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: add region 'id', 'size', and 'align' attributes
libnvdimm: fix mishandled nvdimm_clear_poison() return value
libnvdimm: replace mutex_is_locked() warnings with lockdep_assert_held
libnvdimm, pfn: fix align attribute
libnvdimm, e820: use module_platform_driver
libnvdimm, namespace: use octal for permissions
libnvdimm, namespace: avoid multiple sector calculations
libnvdimm: remove else after return in nsio_rw_bytes()
libnvdimm, namespace: fix the type of name variable
libnvdimm: use consistent naming for request_mem_region()
nvdimm: use the right length of "pmem"
libnvdimm: check and clear poison before writing to pmem
tools/testing/nvdimm: dynamic label support
libnvdimm: allow a platform to force enable label support
libnvdimm: use generic iostat interfaces
Pull networking fixes and cleanups from David Miller:
1) Revert bogus nla_ok() change, from Alexey Dobriyan.
2) Various bpf validator fixes from Daniel Borkmann.
3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04
drivers, from Dongpo Li.
4) Several ethtool ksettings conversions from Philippe Reynes.
5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.
6) XDP support for virtio_net, from John Fastabend.
7) Fix NAT handling within a vrf, from David Ahern.
8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
net: mv643xx_eth: fix build failure
isdn: Constify some function parameters
mlxsw: spectrum: Mark split ports as such
cgroup: Fix CGROUP_BPF config
qed: fix old-style function definition
net: ipv6: check route protocol when deleting routes
r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
irda: w83977af_ir: cleanup an indent issue
net: sfc: use new api ethtool_{get|set}_link_ksettings
net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
bpf: fix mark_reg_unknown_value for spilled regs on map value marking
bpf: fix overflow in prog accounting
bpf: dynamically allocate digest scratch buffer
gtp: Fix initialization of Flags octet in GTPv1 header
gtp: gtp_check_src_ms_ipv4() always return success
net/x25: use designated initializers
isdn: use designated initializers
...
Pull kbuild updates from Michal Marek:
- prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
about missing CRCs (Nick Piggin)
- asm-exports fix for LTO (Nicolas Pitre)
- thin archives improvements (Nick Piggin)
- linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
Piggin)
- genksyms support for __builtin_va_list keyword
- misc minor fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
x86/kbuild: enable modversions for symbols exported from asm
kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
scripts/kallsyms: remove last remnants of --page-offset option
make use of make variable CURDIR instead of calling pwd
kbuild: cmd_export_list: tighten the sed script
kbuild: minor improvement for thin archives build
kbuild: modpost warn if export version crc is missing
kbuild: keep data tables through dead code elimination
kbuild: improve linker compatibility with lib-ksyms.o build
genksyms: Regenerate parser
kbuild/genksyms: handle va_list type
kbuild: thin archives for multi-y targets
kbuild: kallsyms allow 3-pass generation if symbols size has changed
Running ./test_verifier as unprivileged lets 1 out of 98 tests fail:
[...]
#71 unpriv: check that printk is disallowed FAIL
Unexpected error message!
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r1 = r10
2: (07) r1 += -8
3: (b7) r2 = 8
4: (bf) r3 = r1
5: (85) call bpf_trace_printk#6
unknown func bpf_trace_printk#6
[...]
The test case is correct, just that the error outcome changed with
ebb676daa1 ("bpf: Print function name in addition to function id").
Same as with e00c7b216f ("bpf: fix multiple issues in selftest suite
and samples") issue 2), so just fix up the function name.
Fixes: ebb676daa1 ("bpf: Print function name in addition to function id")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 57a09bf0a4 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
registers") introduced a regression where existing programs stopped
loading due to reaching the verifier's maximum complexity limit,
whereas prior to this commit they were loading just fine; the affected
program has roughly 2k instructions.
What was found is that state pruning couldn't be performed effectively
anymore due to mismatches of the verifier's register state, in particular
in the id tracking. It doesn't mean that 57a09bf0a4 is incorrect per
se, but rather that verifier needs to perform a lot more work for the
same program with regards to involved map lookups.
Since commit 57a09bf0a4 is only about tracking registers with type
PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
until they are promoted through pattern matching with a NULL check to
either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
id becomes irrelevant for the transitioned types.
For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
transferred further into other types that don't make use of it. Among
others, one example is where UNKNOWN_VALUE is set on function call
return with RET_INTEGER return type.
states_equal() will then fall through the memcmp() on register state;
note that the second memcmp() uses offsetofend(), so the id is part of
that since d2a4dd37f6 ("bpf: fix state equivalence"). But the bisect
pointed already to 57a09bf0a4, where we really reach beyond complexity
limit. What I found was that states_equal() often failed in this
case due to id mismatches in spilled regs with registers in type
PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
a memcmp() on their reg state and don't have any other optimizations
in place, therefore also id was relevant in this case for making a
pruning decision.
We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
For the affected program, it resulted in a ~17 fold reduction of
complexity and let the program load fine again. Selftest suite also
runs fine. The only other place where env->id_gen is used currently is
through direct packet access, but for these cases id is long living, thus
a different scenario.
Also, the current logic in mark_map_regs() is not fully correct when
marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
it's id is reset and any subsequent registers that hold the original id
and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
anymore, since mark_map_reg() reuses the uncached regs[regno].id that
was just overridden. Note, we don't need to cache it outside of
mark_map_regs(), since it's called once on this_branch and the other
time on other_branch, which are both two independent verifier states.
A test case for this is added here, too.
Fixes: 57a09bf0a4 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
- Support for the kexec_file_load() syscall, which is a prereq for secure and
trusted boot.
- Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN).
- Sort the exception tables at build time, to save time at boot, and store
them as relative offsets to save space in the kernel image & memory.
- Allow building the kernel with thin archives, which should allow us to build
an allyesconfig once some other fixes land.
- Build fixes to allow us to correctly rebuild when changing the kernel endian
from big to little or vice versa.
- Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix.
- Initial stack protector support (-fstack-protector).
- Support for dumping the radix (aka. Linux) and hash page tables via debugfs.
- Fix an oops in cxl coredump generation when cxl_get_fd() is used.
- Freescale updates from Scott: "Highlights include 8xx hugepage support,
qbman fixes/cleanup, device tree updates, and some misc cleanup."
- Many and varied fixes and minor enhancements as always.
Thanks to:
Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual,
Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet,
Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat,
Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold,
Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan
Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin,
Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj
Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYU4YSAAoJEFHr6jzI4aWAC4gQALtIAqqPon0Cd5b/FVVcMbW7
mMqB2b/0FGEl5GoRTzGUDaQqElilm6AEVfHO86C7DFji/a6olneFfw87iz+mtWuZ
JvrNq68ZiSnoeszdUy4MgtXFLb5sTzNMev4skaHfjI9E5CepWBoR0zH4G+kNVnd5
WSgudv8Cq4Px+MEuTOigt3QYjHzZ3cw/XNOOm9c+oGj+PDW4O9UItVI+S1WLoey4
rAB2nRcLMDPuwfRQC9XsF3zEbkv4h1dEXo/EBRuRpcF+0lLTzFw1lv1WE8OxlUmS
kAXbty3dIytBfSbtJT0c0Ps6sfQ4HFhu6ZV2fjnxNTz2KDkBIN7LBYHmBYiqY9oZ
9zvbUWtfiTu5ocfRtTq7rC/Hcj4Kbr9S9F/FvXR0WyDsKgu4xxAovqC3gcn6YjYK
Rr1tcCI4nUzyhVJVmd+OEhUvc5JbFy9aGage+YeOyejfvvSbXIunaxWlPjoDkvim
Vjl+UKU8gw51XFssqY5ZBi/HNlMFKYedLpMFp/fItnLglhj50V0eFWkpDgdSCYom
vo9ifPLZx8n8m8De3H7TV4E0F4gCHcTeqZdu7tW9AAUVM6iLJcDLm3asGmtNh21t
snOHNOJ5QSIno6ezUUg29T6VBjbPh46fdJJSlIZrEe8OzLZ1haGyttf0tD00PQvY
Z2W/m3gxafnOeGgBqvyv
=xOzf
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for the kexec_file_load() syscall, which is a prereq for
secure and trusted boot.
- Prevent kernel execution of userspace on P9 Radix (similar to
SMEP/PXN).
- Sort the exception tables at build time, to save time at boot, and
store them as relative offsets to save space in the kernel image &
memory.
- Allow building the kernel with thin archives, which should allow us
to build an allyesconfig once some other fixes land.
- Build fixes to allow us to correctly rebuild when changing the
kernel endian from big to little or vice versa.
- Plumbing so that we can avoid doing a full mm TLB flush on P9
Radix.
- Initial stack protector support (-fstack-protector).
- Support for dumping the radix (aka. Linux) and hash page tables via
debugfs.
- Fix an oops in cxl coredump generation when cxl_get_fd() is used.
- Freescale updates from Scott: "Highlights include 8xx hugepage
support, qbman fixes/cleanup, device tree updates, and some misc
cleanup."
- Many and varied fixes and minor enhancements as always.
Thanks to:
Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman
Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz,
Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar
Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff
Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin,
Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N.
Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica
Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj
Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain"
[ And thanks to Michael, who took time off from a new baby to get this
pull request done. - Linus ]
* tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits)
powerpc/fsl/dts: add FMan node for t1042d4rdb
powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb
powerpc/fsl/dts: add QMan and BMan nodes on t1024
powerpc/fsl/dts: add QMan and BMan nodes on t1023
soc/fsl/qman: test: use DEFINE_SPINLOCK()
powerpc/fsl-lbc: use DEFINE_SPINLOCK()
powerpc/8xx: Implement support of hugepages
powerpc: get hugetlbpage handling more generic
powerpc: port 64 bits pgtable_cache to 32 bits
powerpc/boot: Request no dynamic linker for boot wrapper
soc/fsl/bman: Use resource_size instead of computation
soc/fsl/qe: use builtin_platform_driver
powerpc/fsl_pmc: use builtin_platform_driver
powerpc/83xx/suspend: use builtin_platform_driver
powerpc/ftrace: Fix the comments for ftrace_modify_code
powerpc/perf: macros for power9 format encoding
powerpc/perf: power9 raw event format encoding
powerpc/perf: update attribute_group data structure
powerpc/perf: factor out the event format field
powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown
...