Commit Graph

4192 Commits

Author SHA1 Message Date
YueHaibing ff585c5b9a tracing: Make two symbols static
Fix sparse warnings:

kernel/trace/trace.c:6927:24: warning:
 symbol 'get_tracing_log_err' was not declared. Should it be static?
kernel/trace/trace.c:8196:15: warning:
 symbol 'trace_instance_dir' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20190614153210.24424-1-yuehaibing@huawei.com

Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:49:26 -04:00
Vasily Gorbik cbdaeaf050 tracing: avoid build warning with HAVE_NOP_MCOUNT
Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
testing CC_USING_* defines) to avoid conditional compilation and fix
the following gcc 9 warning on s390:

kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
but not used [-Wunused-function]

Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours

Fixes: 2f4df0017b ("tracing: Add -mcount-nop option support")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:34:57 -04:00
Eiichi Tsukata becf33f694 tracing: Fix out-of-range read in trace_stack_print()
Puts range check before dereferencing the pointer.

Reproducer:

  # echo stacktrace > trace_options
  # echo 1 > events/enable
  # cat trace > /dev/null

KASAN report:

  ==================================================================
  BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0
  Read of size 8 at addr ffff888069d20000 by task cat/1953

  CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
  Call Trace:
   dump_stack+0x8a/0xce
   print_address_description+0x60/0x224
   ? trace_stack_print+0x26b/0x2c0
   ? trace_stack_print+0x26b/0x2c0
   __kasan_report.cold+0x1a/0x3e
   ? trace_stack_print+0x26b/0x2c0
   kasan_report+0xe/0x20
   trace_stack_print+0x26b/0x2c0
   print_trace_line+0x6ea/0x14d0
   ? tracing_buffers_read+0x700/0x700
   ? trace_find_next_entry_inc+0x158/0x1d0
   s_show+0xea/0x310
   seq_read+0xaa7/0x10e0
   ? seq_escape+0x230/0x230
   __vfs_read+0x7c/0x100
   vfs_read+0x16c/0x3a0
   ksys_read+0x121/0x240
   ? kernel_write+0x110/0x110
   ? perf_trace_sys_enter+0x8a0/0x8a0
   ? syscall_slow_exit_work+0xa9/0x410
   do_syscall_64+0xb7/0x390
   ? prepare_exit_to_usermode+0x165/0x200
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f867681f910
  Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04
  RSP: 002b:00007ffdabf23488 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
  RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f867681f910
  RDX: 0000000000020000 RSI: 00007f8676cde000 RDI: 0000000000000003
  RBP: 00007f8676cde000 R08: ffffffffffffffff R09: 0000000000000000
  R10: 0000000000000871 R11: 0000000000000246 R12: 00007f8676cde000
  R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000ec0

  Allocated by task 1214:
   save_stack+0x1b/0x80
   __kasan_kmalloc.constprop.0+0xc2/0xd0
   kmem_cache_alloc+0xaf/0x1a0
   getname_flags+0xd2/0x5b0
   do_sys_open+0x277/0x5a0
   do_syscall_64+0xb7/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 1214:
   save_stack+0x1b/0x80
   __kasan_slab_free+0x12c/0x170
   kmem_cache_free+0x8a/0x1c0
   putname+0xe1/0x120
   do_sys_open+0x2c5/0x5a0
   do_syscall_64+0xb7/0x390
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff888069d20000
   which belongs to the cache names_cache of size 4096
  The buggy address is located 0 bytes inside of
   4096-byte region [ffff888069d20000, ffff888069d21000)
  The buggy address belongs to the page:
  page:ffffea0001a74800 refcount:1 mapcount:0 mapping:ffff88806ccd1380 index:0x0 compound_mapcount: 0
  flags: 0x100000000010200(slab|head)
  raw: 0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380
  raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                     ^
   ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================

Link: http://lkml.kernel.org/r/20190610040016.5598-1-devel@etsukata.com

Fixes: 4285f2fcef ("tracing: Remove the ULONG_MAX stack trace hackery")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14 16:28:42 -04:00
Greg Kroah-Hartman 3e6f176f30 blktrace: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-block@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-03 15:39:39 +02:00
Greg Kroah-Hartman 6a54cd872f trace: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-03 15:39:39 +02:00
Song Liu 9fd2e48b9a perf/core: Allow non-privileged uprobe for user processes
Currently, non-privileged user could only use uprobe with

    kernel.perf_event_paranoid = -1

However, setting perf_event_paranoid to -1 leaks other users' processes to
non-privileged uprobes.

To introduce proper permission control of uprobes, we are building the
following system:

  A daemon with CAP_SYS_ADMIN is in charge to create uprobes via tracefs;
  Users asks the daemon to create uprobes;
  Then user can attach uprobe only to processes owned by the user.

This patch allows non-privileged user to attach uprobe to processes owned
by the user.

The following example shows how to use uprobe with non-privileged user.
This is based on Brendan's blog post [1]

1. Create uprobe with root:

  sudo perf probe -x 'readline%return +0($retval):string'

2. Then non-root user can use the uprobe as:

  perf record -vvv -e probe_bash:readline__return -p <pid> sleep 20
  perf script

[1] http://www.brendangregg.com/blog/2015-06-28/linux-ftrace-uprobe.html

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <kernel-team@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190507161545.788381-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-03 11:58:18 +02:00
Sebastian Andrzej Siewior 3bd3706251 sched/core: Provide a pointer to the valid CPU mask
In commit:

  4b53a3412d ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")

the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to single CPU while the "normal" CPU mask remains untouched.

As an alternative implementation Ingo suggested to use:

	struct task_struct {
		const cpumask_t		*cpus_ptr;
		cpumask_t		cpus_mask;
        };
with
	t->cpus_ptr = &t->cpus_mask;

In -RT we then can switch the cpus_ptr to:

	t->cpus_ptr = &cpumask_of(task_cpu(p));

in a migration disabled region. The rules are simple:

 - Code that 'uses' ->cpus_allowed would use the pointer.
 - Code that 'modifies' ->cpus_allowed would use the direct mask.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-03 11:49:37 +02:00
David S. Miller 0462eaacee Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-05-31

The following pull-request contains BPF updates for your *net-next* tree.

Lots of exciting new features in the first PR of this developement cycle!
The main changes are:

1) misc verifier improvements, from Alexei.

2) bpftool can now convert btf to valid C, from Andrii.

3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs.
   This feature greatly improves BPF speed on 32-bit architectures. From Jiong.

4) cgroups will now auto-detach bpf programs. This fixes issue of thousands
   bpf programs got stuck in dying cgroups. From Roman.

5) new bpf_send_signal() helper, from Yonghong.

6) cgroup inet skb programs can signal CN to the stack, from Lawrence.

7) miscellaneous cleanups, from many developers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31 21:21:18 -07:00
Linus Torvalds 9e82b4a91d This fixes a memory leak from the error path in the event filter logic.
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXO303xQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlb7AP4179SWPy9RFxvxyjTmpPFPL9oR0Q26
 sOTyIBN99MUTsgEA0FNWz7/FWtFDa1wbh0tEVreaTQlKEeoIYF96dkN0iwE=
 =BJgg
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "This fixes a memory leak from the error path in the event filter
  logic"

* tag 'trace-v5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Avoid memory leak in predicate_parse()
2019-05-29 11:26:40 -07:00
Stanislav Fomichev e672db03ab bpf: tracing: properly use bpf_prog_array api
Now that we don't have __rcu markers on the bpf_prog_array helpers,
let's use proper rcu_dereference_protected to obtain array pointer
under mutex.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-29 15:17:35 +02:00
Tomas Bortoli dfb4a6f219 tracing: Avoid memory leak in predicate_parse()
In case of errors, predicate_parse() goes to the out_free label
to free memory and to return an error code.

However, predicate_parse() does not free the predicates of the
temporary prog_stack array, thence leaking them.

Link: http://lkml.kernel.org/r/20190528154338.29976-1-tomasbortoli@gmail.com

Cc: stable@vger.kernel.org
Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: syzbot+6b8e0fb820e570c59e19@syzkaller.appspotmail.com
Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
[ Added protection around freeing prog_stack[i].pred ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-28 16:27:58 -04:00
Steven Rostedt (VMware) 86b3de60a0 ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS
Commit c19fa94a8f ("Add HAVE_64BIT_ALIGNED_ACCESS") added the config for
architectures that required 64bit aligned access for all 64bit words. As
the ftrace ring buffer stores data on 4 byte alignment, this config option
was used to force it to store data on 8 byte alignment to make sure the data
being stored and written directly into the ring buffer was 8 byte aligned as
it would cause issues trying to write an 8 byte word on a 4 not 8 byte
aligned memory location.

But with the removal of the metag architecture, which was the only
architecture to use this, there is no architecture supported by Linux that
requires 8 byte aligne access for all 8 byte words (4 byte alignment is good
enough). Removing this config can simplify the code a bit.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-28 09:36:19 -04:00
Yonghong Song e1afb70252 bpf: check signal validity in nmi for bpf_send_signal() helper
Commit 8b401f9ed2 ("bpf: implement bpf_send_signal() helper")
introduced bpf_send_signal() helper. If the context is nmi,
the sending signal work needs to be deferred to irq_work.
If the signal is invalid, the error will appear in irq_work
and it won't be propagated to user.

This patch did an early check in the helper itself to notify
user invalid signal, as suggested by Daniel.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-28 10:51:33 +02:00
Linus Torvalds c5b440951a Make the GCC 9 warning for sub struct memset go away.
GCC 9 now warns about calling memset() on partial structures when it
 goes across multiple fields. This adds a helper for the place in
 tracing that does this type of clearing of a structure.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXOrlfhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoDhAP4mogBm0JjJ1LWr8RX2/X7qFm0x1zLz
 5Mk0QKfeRP3MYgEAl2mV/HeFp7aMxEY2CKy0LslmaXPhamPx1r0LlfMgIws=
 =drP3
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing warning fix from Steven Rostedt:
 "Make the GCC 9 warning for sub struct memset go away.

  GCC 9 now warns about calling memset() on partial structures when it
  goes across multiple fields. This adds a helper for the place in
  tracing that does this type of clearing of a structure"

* tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Silence GCC 9 array bounds warning
2019-05-26 13:49:40 -07:00
Cheng Jian a124692b69 ftrace: Enable trampoline when rec count returns back to one
Custom trampolines can only be enabled if there is only a single ops
attached to it. If there's only a single callback registered to a function,
and the ops has a trampoline registered for it, then we can call the
trampoline directly. This is very useful for improving the performance of
ftrace and livepatch.

If more than one callback is registered to a function, the general
trampoline is used, and the custom trampoline is not restored back to the
direct call even if all the other callbacks were unregistered and we are
back to one callback for the function.

To fix this, set FTRACE_FL_TRAMP flag if rec count is decremented
to one, and the ops that left has a trampoline.

Testing After this patch :

insmod livepatch_unshare_files.ko
cat /sys/kernel/debug/tracing/enabled_functions

	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0

echo unshare_files > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions

	unshare_files (2) R I ->ftrace_ops_list_func+0x0/0x150

echo nop > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions

	unshare_files (1) R I	tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0

Link: http://lkml.kernel.org/r/1556969979-111047-1-git-send-email-cj.chengjian@huawei.com

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:43 -04:00
Steven Rostedt (VMware) b6399cc789 tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline
When having kprobe trace event start up tests enabled and adding a
kprobe_event on the kernel command line, it produced the following:

 trace_kprobe: Testing kprobe tracing:
 WARNING: CPU: 5 PID: 1 at kernel/trace/trace_kprobe.c:1724 kprobe_trace_self_tests_init+0x32d/0x36b
 Modules linked in:
 CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc1-test+ #249
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 RIP: 0010:kprobe_trace_self_tests_init+0x32d/0x36b
 Code: b7 e8 4f 8d a2 fe 85 c0 74 10 0f 0b 48 c7 c7 c8 1b 0d b7 ff c3 e8 19 af 99 fe 48 c7 c7 40 93 27 b7 e8 7f 1a a5 fe 85 c0 74 10 <0f> 0b 48 c7 c7 f8 1b 0d b7 ff c3 e8 f9 ae
9 a0 fe 85
 RSP: 0018:ffffb36e40653e08 EFLAGS: 00010286
 RAX: 00000000fffffff0 RBX: 0000000000000000 RCX: ffffb36e40653d5c
 RDX: 0000000000000000 RSI: ffffffffb72776e0 RDI: 0000000000000246
 RBP: ffff98414fe58ff8 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff98415d8aa940 R12: 0000000000000000
 R13: ffffffffb737c1b0 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff98415ea80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f959ce741b8 CR3: 000000011a210002 CR4: 00000000001606e0
 Call Trace:
  ? init_kprobe_trace+0x19e/0x19e
  ? do_early_param+0x8e/0x8e
  do_one_initcall+0x6f/0x2b4
  ? do_early_param+0x8e/0x8e
  kernel_init_freeable+0x21d/0x2c6
  ? rest_init+0x146/0x146
  kernel_init+0xa/0x10a
  ret_from_fork+0x3a/0x50
 ---[ end trace 488430c083a4c956 ]---

As with the trace events, if a trace event is set on the kernel command
line, the trace events start up tests are suspended. The kprobe start up
tests should do the same when a kprobe is enabled on the kernel command
line.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:43 -04:00
Steven Rostedt (VMware) b3015fe41d tracing: Make a separate config for trace event self tests
The trace event self tests enable loop through *all* events, enables each
one, one at a time, runs some code to trigger various events (not
necessarily the same events), and checks if anything went wrong. The issue
is that trace events are usually the least likely start up test to cause a
problem, but they take the longest to run (because there are so many
events). When one of the other tests trigger a bug, the trace event start up
tests causes the bisect to take much longer, because it takes 10s of seconds
to get through the trace event tests.

By making them a separate config (even though they are enabled by default if
start up tests are set), it is possible to turn them off and still run the
other tracing start up tests much quicker.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:43 -04:00
Masami Hiramatsu 970988e19e tracing/kprobe: Add kprobe_event= boot parameter
Add kprobe_event= boot parameter to define kprobe events
at boot time.
The definition syntax is similar to tracefs/kprobe_events
interface, but use ',' and ';' instead of ' ' and '\n'
respectively. e.g.

  kprobe_event=p,vfs_read,$arg1,$arg2

This puts a probe on vfs_read with argument1 and 2, and
enable the new event.

Link: http://lkml.kernel.org/r/155851395498.15728.830529496248543583.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:43 -04:00
Masami Hiramatsu 539b75b2b9 tracing/kprobe: Cast user-space address correctly
Cast user-space address correctly to pass to probe_user_read().

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:43 -04:00
Matthias Kaehlcke f08367b364 tracing: Use correct function name in trace_filter_add_remove_task() comment
The comment of trace_filter_add_remove_task() refers to the function as
'trace_pid_filter_add_remove_task', use the correct name.

Link: http://lkml.kernel.org/r/20190523192628.134406-1-mka@chromium.org

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:43 -04:00
Masami Hiramatsu e65f7ae7f4 tracing/probe: Support user-space dereference
Support user-space dereference syntax for probe event arguments
to dereference the data-structure or array in user-space.

The syntax is just adding 'u' before an offset value.

 +|-u<OFFSET>(<FETCHARG>)

e.g. +u8(%ax), +u0(+0(%si))

For example, if you probe do_sched_setscheduler(pid, policy,
param) and record param->sched_priority, you can add new
probe as below;

 p do_sched_setscheduler priority=+u0($arg3)

Note that kprobe event provides this and it doesn't change the
dereference method automatically because we do not know whether
the given address is in userspace or kernel on some archs.

So as same as "ustring", this is an option for user, who has to
carefully choose the dereference method.

Link: http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:42 -04:00
Masami Hiramatsu 88903c4643 tracing/probe: Add ustring type for user-space string
Add "ustring" type for fetching user-space string from kprobe event.
User can specify ustring type at uprobe event, and it is same as
"string" for uprobe.

Note that probe-event provides this option but it doesn't choose the
correct type automatically since we have not way to decide the address
is in user-space or not on some arch (and on some other arch, you can
fetch the string by "string" type). So user must carefully check the
target code (e.g. if you see __user on the target variable) and
use this new type.

Link: http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:42 -04:00
Steven Rostedt (VMware) 7375dca164 ftrace: Make enable and update parameters bool when applicable
The code modification functions have "enable" and "update" variables that
are sometimes "int" but used as "bool". Remove the ambiguity and make them
"bool" when they are only used for true or false values.

Link: http://lkml.kernel.org/r/e1429923d9eda92a3cf5ee9e33c7eacce539781d.1558115654.git.naveen.n.rao@linux.vnet.ibm.com

Reported-by: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:42 -04:00
Miguel Ojeda 0c97bf863e tracing: Silence GCC 9 array bounds warning
Starting with GCC 9, -Warray-bounds detects cases when memset is called
starting on a member of a struct but the size to be cleared ends up
writing over further members.

Such a call happens in the trace code to clear, at once, all members
after and including `seq` on struct trace_iterator:

    In function 'memset',
        inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
    ./include/linux/string.h:344:9: warning: '__builtin_memset' offset
    [8505, 8560] from the object at 'iter' is out of the bounds of
    referenced subobject 'seq' with type 'struct trace_seq' at offset
    4368 [-Warray-bounds]
      344 |  return __builtin_memset(p, c, size);
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to avoid GCC complaining about it, we compute the address
ourselves by adding the offsetof distance instead of referring
directly to the member.

Since there are two places doing this clear (trace.c and trace_kdb.c),
take the chance to move the workaround into a single place in
the internal header.

Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com

Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
[ Removed unnecessary parenthesis around "iter" ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25 23:04:30 -04:00
Linus Torvalds a2c48d98fc Tom Zanussi sent me some small fixes and cleanups to the histogram
code and I forgot to incorporate them.
 
 I also added a small clean up patch that was sent to me a while ago
 and I just noticed it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXOixqRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlPsAQCmNzno+SJMXLLIojZ80pqs9PqLqIrW
 iBopFNihRBAkxgEAle8Pvr53S4R/Hjn6j++kxBm/uWaEfICOzyqcSyqxlQA=
 =Sin+
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Tom Zanussi sent me some small fixes and cleanups to the histogram
  code and I forgot to incorporate them.

  I also added a small clean up patch that was sent to me a while ago
  and I just noticed it"

* tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  kernel/trace/trace.h: Remove duplicate header of trace_seq.h
  tracing: Add a check_val() check before updating cond_snapshot() track_val
  tracing: Check keys for variable references in expressions too
  tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
2019-05-25 10:08:14 -07:00
Yonghong Song 8b401f9ed2 bpf: implement bpf_send_signal() helper
This patch tries to solve the following specific use case.

Currently, bpf program can already collect stack traces
through kernel function get_perf_callchain()
when certain events happens (e.g., cache miss counter or
cpu clock counter overflows). But such stack traces are
not enough for jitted programs, e.g., hhvm (jited php).
To get real stack trace, jit engine internal data structures
need to be traversed in order to get the real user functions.

bpf program itself may not be the best place to traverse
the jit engine as the traversing logic could be complex and
it is not a stable interface either.

Instead, hhvm implements a signal handler,
e.g. for SIGALARM, and a set of program locations which
it can dump stack traces. When it receives a signal, it will
dump the stack in next such program location.

Such a mechanism can be implemented in the following way:
  . a perf ring buffer is created between bpf program
    and tracing app.
  . once a particular event happens, bpf program writes
    to the ring buffer and the tracing app gets notified.
  . the tracing app sends a signal SIGALARM to the hhvm.

But this method could have large delays and causing profiling
results skewed.

This patch implements bpf_send_signal() helper to send
a signal to hhvm in real time, resulting in intended stack traces.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-24 23:26:47 +02:00
Jagadeesh Pagadala 4eebe38a37 kernel/trace/trace.h: Remove duplicate header of trace_seq.h
Remove duplicate header which is included twice.

Link: http://lkml.kernel.org/r/1553725186-41442-1-git-send-email-jagdsh.linux@gmail.com

Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-22 15:37:41 -04:00
Linus Torvalds 2c1212de6f SPDX update for 5.2-rc2, round 1
Here are series of patches that add SPDX tags to different kernel files,
 based on two different things:
   - SPDX entries are added to a bunch of files that we missed a year ago
     that do not have any license information at all.
 
     These were either missed because the tool saw the MODULE_LICENSE()
     tag, or some EXPORT_SYMBOL tags, and got confused and thought the
     file had a real license, or the files have been added since the last
     big sweep, or they were Makefile/Kconfig files, which we didn't
     touch last time.
 
   - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
     tools can determine the license text in the file itself.  Where this
     happens, the license text is removed, in order to cut down on the
     700+ different ways we have in the kernel today, in a quest to get
     rid of all of these.
 
 These patches have been out for review on the linux-spdx@vger mailing
 list, and while they were created by automatic tools, they were
 hand-verified by a bunch of different people, all whom names are on the
 patches are reviewers.
 
 The reason for these "large" patches is if we were to continue to
 progress at the current rate of change in the kernel, adding license
 tags to individual files in different subsystems, we would be finished
 in about 10 years at the earliest.
 
 There will be more series of these types of patches coming over the next
 few weeks as the tools and reviewers crunch through the more "odd"
 variants of how to say "GPLv2" that developers have come up with over
 the years, combined with other fun oddities (GPL + a BSD disclaimer?)
 that are being unearthed, with the goal for the whole kernel to be
 cleaned up.
 
 These diffstats are not small, 3840 files are touched, over 10k lines
 removed in just 24 patches.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXOP8uw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynmGQCgy3evqzleuOITDpuWaxewFdHqiJYAnA7KRw4H
 1KwtfRnMtG6dk/XaS7H7
 =O9lH
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull SPDX update from Greg KH:
 "Here is a series of patches that add SPDX tags to different kernel
  files, based on two different things:

   - SPDX entries are added to a bunch of files that we missed a year
     ago that do not have any license information at all.

     These were either missed because the tool saw the MODULE_LICENSE()
     tag, or some EXPORT_SYMBOL tags, and got confused and thought the
     file had a real license, or the files have been added since the
     last big sweep, or they were Makefile/Kconfig files, which we
     didn't touch last time.

   - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan
     tools can determine the license text in the file itself. Where this
     happens, the license text is removed, in order to cut down on the
     700+ different ways we have in the kernel today, in a quest to get
     rid of all of these.

  These patches have been out for review on the linux-spdx@vger mailing
  list, and while they were created by automatic tools, they were
  hand-verified by a bunch of different people, all whom names are on
  the patches are reviewers.

  The reason for these "large" patches is if we were to continue to
  progress at the current rate of change in the kernel, adding license
  tags to individual files in different subsystems, we would be finished
  in about 10 years at the earliest.

  There will be more series of these types of patches coming over the
  next few weeks as the tools and reviewers crunch through the more
  "odd" variants of how to say "GPLv2" that developers have come up with
  over the years, combined with other fun oddities (GPL + a BSD
  disclaimer?) that are being unearthed, with the goal for the whole
  kernel to be cleaned up.

  These diffstats are not small, 3840 files are touched, over 10k lines
  removed in just 24 patches"

* tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3
  ...
2019-05-21 12:33:38 -07:00
Tom Zanussi 9b2ca371b1 tracing: Add a check_val() check before updating cond_snapshot() track_val
Without this check a snapshot is taken whenever a bucket's max is hit,
rather than only when the global max is hit, as it should be.

Before:

  In this example, we do a first run of the workload (cyclictest),
  examine the output, note the max ('triggering value') (347), then do
  a second run and note the max again.

  In this case, the max in the second run (39) is below the max in the
  first run, but since we haven't cleared the histogram, the first max
  is still in the histogram and is higher than any other max, so it
  should still be the max for the snapshot.  It isn't however - the
  value should still be 347 after the second run.

  # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
  # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio,prev_comm):onmax($wakeup_lat).snapshot() if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

  # cyclictest -p 80 -n -s -t 2 -D 2

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

  { next_pid:       2143 } hitcount:        199
    max:         44  next_prio:        120  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/4

  { next_pid:       2145 } hitcount:       1325
    max:         38  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/2

  { next_pid:       2144 } hitcount:       1982
    max:        347  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/6

  Snapshot taken (see tracing/snapshot).  Details:
      triggering value { onmax($wakeup_lat) }:        347
      triggered by event with key: { next_pid:       2144 }

  # cyclictest -p 80 -n -s -t 2 -D 2

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

  { next_pid:       2143 } hitcount:        199
    max:         44  next_prio:        120  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/4

  { next_pid:       2148 } hitcount:        199
    max:         16  next_prio:        120  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/1

  { next_pid:       2145 } hitcount:       1325
    max:         38  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/2

  { next_pid:       2150 } hitcount:       1326
    max:         39  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/4

  { next_pid:       2144 } hitcount:       1982
    max:        347  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/6

  { next_pid:       2149 } hitcount:       1983
    max:        130  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/0

  Snapshot taken (see tracing/snapshot).  Details:
    triggering value { onmax($wakeup_lat) }:    39
    triggered by event with key: { next_pid:       2150 }

After:

  In this example, we do a first run of the workload (cyclictest),
  examine the output, note the max ('triggering value') (375), then do
  a second run and note the max again.

  In this case, the max in the second run is still 375, the highest in
  any bucket, as it should be.

  # cyclictest -p 80 -n -s -t 2 -D 2

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

  { next_pid:       2072 } hitcount:        200
    max:         28  next_prio:        120  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/5

  { next_pid:       2074 } hitcount:       1323
    max:        375  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/2

  { next_pid:       2073 } hitcount:       1980
    max:        153  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/6

  Snapshot taken (see tracing/snapshot).  Details:
    triggering value { onmax($wakeup_lat) }:        375
    triggered by event with key: { next_pid:       2074 }

  # cyclictest -p 80 -n -s -t 2 -D 2

  # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

  { next_pid:       2101 } hitcount:        199
    max:         49  next_prio:        120  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/6

  { next_pid:       2072 } hitcount:        200
    max:         28  next_prio:        120  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/5

  { next_pid:       2074 } hitcount:       1323
    max:        375  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/2

  { next_pid:       2103 } hitcount:       1325
    max:         74  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/4

  { next_pid:       2073 } hitcount:       1980
    max:        153  next_prio:         19  next_comm: cyclictest
    prev_pid:          0  prev_prio:        120  prev_comm: swapper/6

  { next_pid:       2102 } hitcount:       1981
    max:         84  next_prio:         19  next_comm: cyclictest
    prev_pid:         12  prev_prio:        120  prev_comm: kworker/0:1

  Snapshot taken (see tracing/snapshot).  Details:
    triggering value { onmax($wakeup_lat) }:        375
    triggered by event with key: { next_pid:       2074 }

Link: http://lkml.kernel.org/r/95958351329f129c07504b4d1769c47a97b70d65.1555597045.git.tom.zanussi@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: a3785b7eca ("tracing: Add hist trigger snapshot() action")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-21 12:48:07 -04:00
Tom Zanussi c8d94a1878 tracing: Check keys for variable references in expressions too
There's an existing check for variable references in keys, but it
doesn't go far enough.  It checks whether a key field is a variable
reference but doesn't check whether it's an expression containing
variable references, which can cause the same problems for callers.

Use the existing field_has_hist_vars() function rather than a direct
top-level flag check to catch all possible variable references.

Link: http://lkml.kernel.org/r/e8c3d3d53db5ca90ceea5a46e5413103a6902fc7.1555597045.git.tom.zanussi@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-21 12:46:32 -04:00
Tom Zanussi 55267c88c0 tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
hist_field_var_ref() is an implementation of hist_field_fn_t(), which
can be called with a null tracing_map_elt elt param when assembling a
key in event_hist_trigger().

In the case of hist_field_var_ref() this doesn't make sense, because a
variable can only be resolved by looking it up using an already
assembled key i.e. a variable can't be used to assemble a key since
the key is required in order to access the variable.

Upper layers should prevent the user from constructing a key using a
variable in the first place, but in case one slips through, it
shouldn't cause a NULL pointer dereference.  Also if one does slip
through, we want to know about it, so emit a one-time warning in that
case.

Link: http://lkml.kernel.org/r/64ec8dc15c14d305295b64cdfcc6b2b9dd14753f.1555597045.git.tom.zanussi@linux.intel.com

Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-21 12:43:49 -04:00
Thomas Gleixner ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Linus Torvalds 78e0365184 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:1) Use after free in __dev_map_entry_free(), from Eric Dumazet.

 1) Use after free in __dev_map_entry_free(), from Eric Dumazet.

 2) Fix TCP retransmission timestamps on passive Fast Open, from Yuchung
    Cheng.

 3) Orphan NFC, we'll take the patches directly into my tree. From
    Johannes Berg.

 4) We can't recycle cloned TCP skbs, from Eric Dumazet.

 5) Some flow dissector bpf test fixes, from Stanislav Fomichev.

 6) Fix RCU marking and warnings in rhashtable, from Herbert Xu.

 7) Fix some potential fib6 leaks, from Eric Dumazet.

 8) Fix a _decode_session4 uninitialized memory read bug fix that got
    lost in a merge. From Florian Westphal.

 9) Fix ipv6 source address routing wrt. exception route entries, from
    Wei Wang.

10) The netdev_xmit_more() conversion was not done %100 properly in mlx5
    driver, fix from Tariq Toukan.

11) Clean up botched merge on netfilter kselftest, from Florian
    Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (74 commits)
  of_net: fix of_get_mac_address retval if compiled without CONFIG_OF
  net: fix kernel-doc warnings for socket.c
  net: Treat sock->sk_drops as an unsigned int when printing
  kselftests: netfilter: fix leftover net/net-next merge conflict
  mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM
  mlxsw: core: Prevent QSFP module initialization for old hardware
  vsock/virtio: Initialize core virtio vsock before registering the driver
  net/mlx5e: Fix possible modify header actions memory leak
  net/mlx5e: Fix no rewrite fields with the same match
  net/mlx5e: Additional check for flow destination comparison
  net/mlx5e: Add missing ethtool driver info for representors
  net/mlx5e: Fix number of vports for ingress ACL configuration
  net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
  net/mlx5e: Fix wrong xmit_more application
  net/mlx5: Fix peer pf disable hca command
  net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index
  net/mlx5: Add meaningful return codes to status_to_err function
  net/mlx5: Imply MLXFW in mlx5_core
  Revert "tipc: fix modprobe tipc failed after switch order of device registration"
  vsock/virtio: free packets during the socket release
  ...
2019-05-20 08:21:07 -07:00
David S. Miller c7d5ec26ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-05-16

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a use after free in __dev_map_entry_free(), from Eric.

2) Several sockmap related bug fixes: a splat in strparser if
   it was never initialized, remove duplicate ingress msg list
   purging which can race, fix msg->sg.size accounting upon
   skb to msg conversion, and last but not least fix a timeout
   bug in tcp_bpf_wait_data(), from John.

3) Fix LRU map to avoid messing with eviction heuristics upon
   syscall lookup, e.g. map walks from user space side will
   then lead to eviction of just recently created entries on
   updates as it would mark all map entries, from Daniel.

4) Don't bail out when libbpf feature probing fails. Also
   various smaller fixes to flow_dissector test, from Stanislav.

5) Fix missing brackets for BTF_INT_OFFSET() in UAPI, from Gary.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15 18:28:44 -07:00
Linus Torvalds d2d8b14604 The major changes in this tracing update includes:
- Removing of non-DYNAMIC_FTRACE from 32bit x86
 
  - Removing of mcount support from x86
 
  - Emulating a call from int3 on x86_64, fixes live kernel patching
 
  - Consolidated Tracing Error logs file
 
 Minor updates:
 
  - Removal of klp_check_compiler_support()
 
  - kdb ftrace dumping output changes
 
  - Accessing and creating ftrace instances from inside the kernel
 
  - Clean up of #define if macro
 
  - Introduction of TRACE_EVENT_NOP() to disable trace events based on config
    options
 
 And other minor fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r
 ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw=
 =RVQo
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major changes in this tracing update includes:

   - Removal of non-DYNAMIC_FTRACE from 32bit x86

   - Removal of mcount support from x86

   - Emulating a call from int3 on x86_64, fixes live kernel patching

   - Consolidated Tracing Error logs file

  Minor updates:

   - Removal of klp_check_compiler_support()

   - kdb ftrace dumping output changes

   - Accessing and creating ftrace instances from inside the kernel

   - Clean up of #define if macro

   - Introduction of TRACE_EVENT_NOP() to disable trace events based on
     config options

  And other minor fixes and clean ups"

* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
  x86: Hide the int3_emulate_call/jmp functions from UML
  livepatch: Remove klp_check_compiler_support()
  ftrace/x86: Remove mcount support
  ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
  tracing: Simplify "if" macro code
  tracing: Fix documentation about disabling options using trace_options
  tracing: Replace kzalloc with kcalloc
  tracing: Fix partial reading of trace event's id file
  tracing: Allow RCU to run between postponed startup tests
  tracing: Fix white space issues in parse_pred() function
  tracing: Eliminate const char[] auto variables
  ring-buffer: Fix mispelling of Calculate
  tracing: probeevent: Fix to make the type of $comm string
  tracing: probeevent: Do not accumulate on ret variable
  tracing: uprobes: Re-enable $comm support for uprobe events
  ftrace/x86_64: Emulate call function while updating in breakpoint handler
  x86_64: Allow breakpoints to emulate call instructions
  x86_64: Add gap to int3 to allow for call emulation
  tracing: kdb: Allow ftdump to skip all but the last few entries
  tracing: Add trace_total_entries() / trace_total_entries_cpu()
  ...
2019-05-15 16:05:47 -07:00
Stanislav Fomichev 390e99cfdd bpf: mark bpf_event_notify and bpf_event_init as static
Both of them are not declared in the headers and not used outside
of bpf_trace.c file.

Fixes: a38d1107f9 ("bpf: support raw tracepoints in modules")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-14 01:27:18 +02:00
Linus Torvalds 8148c17b17 This is the bulk of the GPIO changes for the v5.2 kernel cycle:
Core changes:
 - The gpiolib MMIO driver has been enhanced to handle two direction
   registers, i.e. one register to set lines as input and one register
   to set lines as output. It turns out some silicon engineer thinks
   the ability to configure a line as input and output at the same
   time makes sense, this can be debated but includes a lot of analog
   electronics reasoning, and the registers are there and need to
   be handled consistently. Unsurprisingly, we enforce the lines to
   be either inputs or outputs in such schemes.
 - Send in the proper argument value to .set_config() dispatched to
   the pin control subsystem. Nobody used it before, now someone
   does, so fix it to work as expected.
 - The ACPI gpiolib portions can now handle pin bias setting (pull up
   or pull down). This has been in the ACPI spec for years and we
   finally have it properly integrated with Linux GPIOs. It was based
   on an observation from Andy Schevchenko that Thomas Petazzoni's
   changes to the core for biasing the PCA950x GPIO expander actually
   happen to fit hand-in-glove with what the ACPI core needed.
   Such nice synergies happen sometimes.
 
 New drivers:
 - A new driver for the Mellanox BlueField GPIO controller. This is
   using 64bit MMIO registers and can configure lines as inputs
   and outputs at the same time and after improving the MMIO library
   we handle it just fine. Interesting.
 - A new IXP4xx proper gpiochip driver with hierarchical interrupts
   should be coming in from the ARM SoC tree as well.
 
 Driver enhancements:
 - The PCA053x driver handles the CAT9554 GPIO expander.
 - The PCA053x driver handles the NXP PCAL6416 GPIO expander.
 - Wake-up support on PCA053x GPIO lines.
 - OMAP now does a nice asynchronous IRQ handling on wake-ups by
   letting everything wake up on edges, and this makes runtime PM
   work as expected too.
 
 Misc:
 - Several cleanups such as devres fixes.
 - Get rid of some languager comstructs that cause problems when
   compiling with LLVMs clang.
 - Documentation review and update.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJc1olZAAoJEEEQszewGV1zEU4P/RmTf3hG8xmNPS3MDTmR6gAy
 /YJOXjXBf3CD/dmEAyyaNLnUQismrtRNvHSoEGbno7gkU+htzp9UfUJkj6+HIXs2
 RpF+Hi78HzZNDxGWuBLu6OZolpmBtx+sRKOhHk/XfNS45qd1FgXWDuulzsYa9Xsr
 hYMXdtdv9wY/vcc68q1rtKAbzlu5ZNCa3Zj1iNOr/XQt3Nl2BW66hGLgjK4mOvgx
 fJy4rFXuDIMfDvo69U1Opz2b39sfE7XMhfZS/MOgg4yEV9zGRgDoI1tyMcTqGb8Q
 8LQbp5dXkP+3dJQB8tgbu3Vk4WC1Rd/pmIli5sMgsk0HYQ6XegfT6HJKozSmwN9r
 0s8jKlrocWZvdPo1aJwQgtRS56t2rFWcrcRye8bLqxkkW5cYIq9CwkE8USwB31Kv
 PFpoOwRuCtj0gkCxf7WIEcC5NAkYPow3K1KPdk3E0Si6I3pj0NqqlaAD0JAlkC2V
 aPq3xbTuFCAdmcADEt2Z+dUJ7WIs5Y9oQgosMAx+A2AD4K3QDBMu3pZsT6SCu4XZ
 mK0eWJi9/CvOj/s7bA0BEJVxQA+p8KYsNRBOULg/8aAOqGcLnSydQjqrxDTE8YrL
 xmmRG7i7ht0B9CchZuIB5hqdvjbCgvcVa5OnCUDfLxE0GdCx8iJ9y9OrsMXbabYq
 8FcPDo1N38cTYLnLqvKI
 =rhto
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull gpio updates from Linus Walleij:
 "This is the bulk of the GPIO changes for the v5.2 kernel cycle. A bit
  later than usual because I was ironing out my own mistakes. I'm
  holding some stuff back for the next kernel as a result, and this
  should be a healthy and well tested batch.

  Core changes:

   - The gpiolib MMIO driver has been enhanced to handle two direction
     registers, i.e. one register to set lines as input and one register
     to set lines as output. It turns out some silicon engineer thinks
     the ability to configure a line as input and output at the same
     time makes sense, this can be debated but includes a lot of analog
     electronics reasoning, and the registers are there and need to be
     handled consistently. Unsurprisingly, we enforce the lines to be
     either inputs or outputs in such schemes.

   - Send in the proper argument value to .set_config() dispatched to
     the pin control subsystem. Nobody used it before, now someone does,
     so fix it to work as expected.

   - The ACPI gpiolib portions can now handle pin bias setting (pull up
     or pull down). This has been in the ACPI spec for years and we
     finally have it properly integrated with Linux GPIOs. It was based
     on an observation from Andy Schevchenko that Thomas Petazzoni's
     changes to the core for biasing the PCA950x GPIO expander actually
     happen to fit hand-in-glove with what the ACPI core needed. Such
     nice synergies happen sometimes.

  New drivers:

   - A new driver for the Mellanox BlueField GPIO controller. This is
     using 64bit MMIO registers and can configure lines as inputs and
     outputs at the same time and after improving the MMIO library we
     handle it just fine. Interesting.

   - A new IXP4xx proper gpiochip driver with hierarchical interrupts
     should be coming in from the ARM SoC tree as well.

  Driver enhancements:

   - The PCA053x driver handles the CAT9554 GPIO expander.

   - The PCA053x driver handles the NXP PCAL6416 GPIO expander.

   - Wake-up support on PCA053x GPIO lines.

   - OMAP now does a nice asynchronous IRQ handling on wake-ups by
     letting everything wake up on edges, and this makes runtime PM work
     as expected too.

  Misc:

   - Several cleanups such as devres fixes.

   - Get rid of some languager comstructs that cause problems when
     compiling with LLVMs clang.

   - Documentation review and update"

* tag 'gpio-v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (85 commits)
  gpio: Update documentation
  docs: gpio: convert docs to ReST and rename to *.rst
  gpio: sch: Remove write-only core_base
  gpio: pxa: Make two symbols static
  gpiolib: acpi: Respect pin bias setting
  gpiolib: acpi: Add acpi_gpio_update_gpiod_lookup_flags() helper
  gpiolib: acpi: Set pin value, based on bias, more accurately
  gpiolib: acpi: Change type of dflags
  gpiolib: Introduce GPIO_LOOKUP_FLAGS_DEFAULT
  gpiolib: Make use of enum gpio_lookup_flags consistent
  gpiolib: Indent entry values of enum gpio_lookup_flags
  gpio: pca953x: add support for pca6416
  dt-bindings: gpio: pca953x: document the nxp,pca6416
  gpio: pca953x: add pcal6416 to the of_device_id table
  gpio: gpio-omap: Remove conditional pm_runtime handling for GPIO interrupts
  gpio: gpio-omap: configure edge detection for level IRQs for idle wakeup
  tracing: stop making gpio tracing configurable
  gpio: pca953x: Configure wake-up path when wake-up is enabled
  gpio: of: Optimize quirk checks
  gpio: mmio: Drop bgpio_dir_inverted
  ...
2019-05-11 10:54:43 -04:00
Srivatsa S. Bhat (VMware) b941699760 tracing: Fix documentation about disabling options using trace_options
To disable a tracing option using the trace_options file, the option
name needs to be prefixed with 'no', and not suffixed, as the README
states. Fix it.

Link: http://lkml.kernel.org/r/154872690031.47356.5739053380942044586.stgit@srivatsa-ubuntu

Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Gustavo A. R. Silva 8623b00676 tracing: Replace kzalloc with kcalloc
Replace kzalloc() function with its 2-factor argument form, kcalloc().

This patch replaces cases of:

	kzalloc(a * b, gfp)

with:
	kcalloc(a, b, gfp)

This code was detected with the help of Coccinelle.

Link: http://lkml.kernel.org/r/20190115043408.GA23456@embeddedor

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Elazar Leibovich cbe08bcbbe tracing: Fix partial reading of trace event's id file
When reading only part of the id file, the ppos isn't tracked correctly.
This is taken care by simple_read_from_buffer.

Reading a single byte, and then the next byte would result EOF.

While this seems like not a big deal, this breaks abstractions that
reads information from files unbuffered. See for example
https://github.com/golang/go/issues/29399

This code was mentioned as problematic in
commit cd458ba9d5
("tracing: Do not (ab)use trace_seq in event_id_read()")

An example C code that show this bug is:

  #include <stdio.h>
  #include <stdint.h>

  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>

  int main(int argc, char **argv) {
    if (argc < 2)
      return 1;
    int fd = open(argv[1], O_RDONLY);
    char c;
    read(fd, &c, 1);
    printf("First  %c\n", c);
    read(fd, &c, 1);
    printf("Second %c\n", c);
  }

Then run with, e.g.

  sudo ./a.out /sys/kernel/debug/tracing/events/tcp/tcp_set_state/id

You'll notice you're getting the first character twice, instead of the
first two characters in the id file.

Link: http://lkml.kernel.org/r/20181231115837.4932-1-elazar@lightbitslabs.com

Cc: Orit Wasserman <orit.was@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 23725aeeab ("ftrace: provide an id file for each event")
Signed-off-by: Elazar Leibovich <elazar@lightbitslabs.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Anders Roxell 6fc2171c5c tracing: Allow RCU to run between postponed startup tests
When building a allmodconfig kernel for arm64 and boot that in qemu,
CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the
watchdog expires and prints out a message like this:
'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
Depending on what the what test gets called from init_trace_selftests()
it stays minutes in the loop.
Rework so that function cond_resched() gets called in the
init_trace_selftests loop.

Link: http://lkml.kernel.org/r/20181130145622.26334-1-anders.roxell@linaro.org

Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Colin Ian King bfcd631eb6 tracing: Fix white space issues in parse_pred() function
Trivial fix to clean up an indentation issue, a whole chunk of code
has an extra space in the indentation.

Link: http://lkml.kernel.org/r/20181109132312.20994-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Rasmus Villemoes 0f5e5a3ab7 tracing: Eliminate const char[] auto variables
Automatic const char[] variables cause unnecessary code
generation. For example, the this_mod variable leads to

    3f04:       48 b8 5f 5f 74 68 69 73 5f 6d   movabs $0x6d5f736968745f5f,%rax # __this_m
    3f0e:       4c 8d 44 24 02                  lea    0x2(%rsp),%r8
    3f13:       48 8d 7c 24 10                  lea    0x10(%rsp),%rdi
    3f18:       48 89 44 24 02                  mov    %rax,0x2(%rsp)
    3f1d:       4c 89 e9                        mov    %r13,%rcx
    3f20:       b8 65 00 00 00                  mov    $0x65,%eax # e
    3f25:       48 c7 c2 00 00 00 00            mov    $0x0,%rdx
                        3f28: R_X86_64_32S      .rodata.str1.1+0x18d
    3f2c:       be 48 00 00 00                  mov    $0x48,%esi
    3f31:       c7 44 24 0a 6f 64 75 6c         movl   $0x6c75646f,0xa(%rsp) # odul
    3f39:       66 89 44 24 0e                  mov    %ax,0xe(%rsp)

i.e., the string gets built on the stack at runtime. Similar code can be
found for the other instances I'm replacing here. Putting the string
in .rodata reduces the combined .text+.rodata size and saves time and
stack space at runtime.

The simplest fix, and what I've done for the this_mod case, is to just
make the variable static.

However, for the "<faulted>" case where the same string is used twice,
that prevents the linker from merging those two literals, so instead use
a macro - that also keeps the two instances automatically in
sync (instead of only the compile-time strlen expression).

Finally, for the two runs of spaces, it turns out that the "build
these strings on the stack" is not the worst part of what gcc does -
it turns print_func_help_header_irq() into "if (tgid) { /*
print_event_info + five seq_printf calls */ } else { /* print
event_info + another five seq_printf */}". Taking inspiration from a
suggestion from Al Viro, use %.*s to make snprintf either stop after
the first two spaces or print the whole string. As a bonus, the
seq_printfs now fit on single lines (at least, they are not longer
than the existing ones in the function just above), making it easier
to see that the ascii art lines up.

x86-64 defconfig + CONFIG_FUNCTION_TRACER:

$ scripts/stackdelta /tmp/stackusage.{0,1}
./kernel/trace/ftrace.c ftrace_mod_callback     152     136     -16
./kernel/trace/trace.c  trace_default_header    56      32      -24
./kernel/trace/trace.c  tracing_mark_raw_write  96      72      -24
./kernel/trace/trace.c  tracing_mark_write      104     80      -24

bloat-o-meter

add/remove: 1/0 grow/shrink: 0/4 up/down: 14/-375 (-361)
Function                                     old     new   delta
this_mod                                       -      14     +14
ftrace_mod_callback                          577     542     -35
tracing_mark_raw_write                       444     374     -70
tracing_mark_write                           616     540     -76
trace_default_header                         600     406    -194

Link: http://lkml.kernel.org/r/20190320081757.6037-1-linux@rasmusvillemoes.dk

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Yangtao Li 5c173bedb2 ring-buffer: Fix mispelling of Calculate
It's not "Caculate".

Link: http://lkml.kernel.org/r/20181101154640.23162-1-tiny.windzz@gmail.com

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:12 -04:00
Masami Hiramatsu 3dd1f7f24f tracing: probeevent: Fix to make the type of $comm string
Fix to make the type of $comm "string".  If we set the other type to $comm
argument, it shows meaningless value or wrong data. Currently probe events
allow us to set string array type (e.g. ":string[2]"), or other digit types
like x8 on $comm. But since clearly $comm is just a string data, it should
not be fetched by other types including array.

Link: http://lkml.kernel.org/r/155723736241.9149.14582064184468574539.stgit@devnote2

Cc: Andreas Ziegler <andreas.ziegler@fau.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 533059281e ("tracing: probeevent: Introduce new argument fetching code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:11 -04:00
Masami Hiramatsu 489fe0096b tracing: probeevent: Do not accumulate on ret variable
Do not accumulate strlen result on "ret" local variable, because
it is accumulated on "total" local variable for array case.

Link: http://lkml.kernel.org/r/155723735237.9149.3192150444705457531.stgit@devnote2

Fixes: 40b53b7718 ("tracing: probeevent: Add array type support")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:11 -04:00
Masami Hiramatsu 4dd537aca2 tracing: uprobes: Re-enable $comm support for uprobe events
Since commit 533059281e ("tracing: probeevent: Introduce new
argument fetching code") dropped the $comm support from uprobe
events, this re-enables it.

For $comm support, uses strlcpy() instead of strncpy_from_user()
to copy current task's comm. Because it is in the kernel space,
strncpy_from_user() always fails to copy the comm.
This also uses strlen() instead of strnlen_user() to measure the
length of the comm.

Note that this uses -ECOMM as a token value to fetch the comm
string. If the user-space pointer points -ECOMM, it will be
translated to task->comm.

Link: http://lkml.kernel.org/r/155723734162.9149.4042756162201097965.stgit@devnote2

Fixes: 533059281e ("tracing: probeevent: Introduce new argument fetching code")
Reported-by: Andreas Ziegler <andreas.ziegler@fau.de>
Acked-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08 12:15:11 -04:00
Linus Torvalds 80f232121b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

   2) Add fib_sync_mem to control the amount of dirty memory we allow to
      queue up between synchronize RCU calls, from David Ahern.

   3) Make flow classifier more lockless, from Vlad Buslov.

   4) Add PHY downshift support to aquantia driver, from Heiner
      Kallweit.

   5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
      contention on SLAB spinlocks in heavy RPC workloads.

   6) Partial GSO offload support in XFRM, from Boris Pismenny.

   7) Add fast link down support to ethtool, from Heiner Kallweit.

   8) Use siphash for IP ID generator, from Eric Dumazet.

   9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
      entries, from David Ahern.

  10) Move skb->xmit_more into a per-cpu variable, from Florian
      Westphal.

  11) Improve eBPF verifier speed and increase maximum program size,
      from Alexei Starovoitov.

  12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
      spinlocks. From Neil Brown.

  13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

  14) Improve link partner cap detection in generic PHY code, from
      Heiner Kallweit.

  15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
      Maguire.

  16) Remove SKB list implementation assumptions in SCTP, your's truly.

  17) Various cleanups, optimizations, and simplifications in r8169
      driver. From Heiner Kallweit.

  18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

  19) Switch PHY drivers over to use dynamic featue detection, from
      Heiner Kallweit.

  20) Support flow steering without masking in dpaa2-eth, from Ioana
      Ciocoi.

  21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
      Pirko.

  22) Increase the strict parsing of current and future netlink
      attributes, also export such policies to userspace. From Johannes
      Berg.

  23) Allow DSA tag drivers to be modular, from Andrew Lunn.

  24) Remove legacy DSA probing support, also from Andrew Lunn.

  25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
      Haabendal.

  26) Add a generic tracepoint for TX queue timeouts to ease debugging,
      from Cong Wang.

  27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
2019-05-07 22:03:58 -07:00
Linus Torvalds 0bc40e549a Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The changes in here are:

   - text_poke() fixes and an extensive set of executability lockdowns,
     to (hopefully) eliminate the last residual circumstances under
     which we are using W|X mappings even temporarily on x86 kernels.
     This required a broad range of surgery in text patching facilities,
     module loading, trampoline handling and other bits.

   - tweak page fault messages to be more informative and more
     structured.

   - remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
     default.

   - reduce KASLR granularity on 5-level paging kernels from 512 GB to
     1 GB.

   - misc other changes and updates"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/mm: Initialize PGD cache during mm initialization
  x86/alternatives: Add comment about module removal races
  x86/kprobes: Use vmalloc special flag
  x86/ftrace: Use vmalloc special flag
  bpf: Use vmalloc special flag
  modules: Use vmalloc special flag
  mm/vmalloc: Add flag for freeing of special permsissions
  mm/hibernation: Make hibernation handle unmapped pages
  x86/mm/cpa: Add set_direct_map_*() functions
  x86/alternatives: Remove the return value of text_poke_*()
  x86/jump-label: Remove support for custom text poker
  x86/modules: Avoid breaking W^X while loading modules
  x86/kprobes: Set instruction page as executable
  x86/ftrace: Set trampoline pages as executable
  x86/kgdb: Avoid redundant comparison of patched code
  x86/alternatives: Use temporary mm for text poking
  x86/alternatives: Initialize temporary mm for patching
  fork: Provide a function for copying init_mm
  uprobes: Initialize uprobes earlier
  x86/mm: Save debug registers when loading a temporary mm
  ...
2019-05-06 16:13:31 -07:00
Linus Torvalds 2c6a392cdd Merge branch 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull stack trace updates from Ingo Molnar:
 "So Thomas looked at the stacktrace code recently and noticed a few
  weirdnesses, and we all know how such stories of crummy kernel code
  meeting German engineering perfection end: a 45-patch series to clean
  it all up! :-)

  Here's the changes in Thomas's words:

   'Struct stack_trace is a sinkhole for input and output parameters
    which is largely pointless for most usage sites. In fact if embedded
    into other data structures it creates indirections and extra storage
    overhead for no benefit.

    Looking at all usage sites makes it clear that they just require an
    interface which is based on a storage array. That array is either on
    stack, global or embedded into some other data structure.

    Some of the stack depot usage sites are outright wrong, but
    fortunately the wrongness just causes more stack being used for
    nothing and does not have functional impact.

    Another oddity is the inconsistent termination of the stack trace
    with ULONG_MAX. It's pointless as the number of entries is what
    determines the length of the stored trace. In fact quite some call
    sites remove the ULONG_MAX marker afterwards with or without nasty
    comments about it. Not all architectures do that and those which do,
    do it inconsistenly either conditional on nr_entries == 0 or
    unconditionally.

    The following series cleans that up by:

      1) Removing the ULONG_MAX termination in the architecture code

      2) Removing the ULONG_MAX fixups at the call sites

      3) Providing plain storage array based interfaces for stacktrace
         and stackdepot.

      4) Cleaning up the mess at the callsites including some related
         cleanups.

      5) Removing the struct stack_trace based interfaces

    This is not changing the struct stack_trace interfaces at the
    architecture level, but it removes the exposure to the generic
    code'"

* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  x86/stacktrace: Use common infrastructure
  stacktrace: Provide common infrastructure
  lib/stackdepot: Remove obsolete functions
  stacktrace: Remove obsolete functions
  livepatch: Simplify stack trace retrieval
  tracing: Remove the last struct stack_trace usage
  tracing: Simplify stack trace retrieval
  tracing: Make ftrace_trace_userstack() static and conditional
  tracing: Use percpu stack trace buffer more intelligently
  tracing: Simplify stacktrace retrieval in histograms
  lockdep: Simplify stack trace handling
  lockdep: Remove save argument from check_prev_add()
  lockdep: Remove unused trace argument from print_circular_bug()
  drm: Simplify stacktrace handling
  dm persistent data: Simplify stack trace handling
  dm bufio: Simplify stack trace retrieval
  btrfs: ref-verify: Simplify stack trace retrieval
  dma/debug: Simplify stracktrace retrieval
  fault-inject: Simplify stacktrace retrieval
  mm/page_owner: Simplify stack trace handling
  ...
2019-05-06 13:11:48 -07:00
Linus Torvalds 6ec62961e6 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
 "This is a series from Peter Zijlstra that adds x86 build-time uaccess
  validation of SMAP to objtool, which will detect and warn about the
  following uaccess API usage bugs and weirdnesses:

   - call to %s() with UACCESS enabled
   - return with UACCESS enabled
   - return with UACCESS disabled from a UACCESS-safe function
   - recursive UACCESS enable
   - redundant UACCESS disable
   - UACCESS-safe disables UACCESS

  As it turns out not leaking uaccess permissions outside the intended
  uaccess functionality is hard when the interfaces are complex and when
  such bugs are mostly dormant.

  As a bonus we now also check the DF flag. We had at least one
  high-profile bug in that area in the early days of Linux, and the
  checking is fairly simple. The checks performed and warnings emitted
  are:

   - call to %s() with DF set
   - return with DF set
   - return with modified stack frame
   - recursive STD
   - redundant CLD

  It's all x86-only for now, but later on this can also be used for PAN
  on ARM and objtool is fairly cross-platform in principle.

  While all warnings emitted by this new checking facility that got
  reported to us were fixed, there might be GCC version dependent
  warnings that were not reported yet - which we'll address, should they
  trigger.

  The warnings are non-fatal build warnings"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
  x86/uaccess: Dont leak the AC flag into __put_user() argument evaluation
  sched/x86_64: Don't save flags on context switch
  objtool: Add Direction Flag validation
  objtool: Add UACCESS validation
  objtool: Fix sibling call detection
  objtool: Rewrite alt->skip_orig
  objtool: Add --backtrace support
  objtool: Rewrite add_ignores()
  objtool: Handle function aliases
  objtool: Set insn->func for alternatives
  x86/uaccess, kcov: Disable stack protector
  x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
  x86/uaccess, ubsan: Fix UBSAN vs. SMAP
  x86/uaccess, kasan: Fix KASAN vs SMAP
  x86/smap: Ditch __stringify()
  x86/uaccess: Introduce user_access_{save,restore}()
  x86/uaccess, signal: Fix AC=1 bloat
  x86/uaccess: Always inline user_access_begin()
  x86/uaccess, xen: Suppress SMAP warnings
  ...
2019-05-06 11:39:17 -07:00
David S. Miller ff24e4980a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three trivial overlapping conflicts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02 22:14:21 -04:00
Douglas Anderson 03197fc02b tracing: kdb: Allow ftdump to skip all but the last few entries
The 'ftdump' command in kdb is currently a bit of a last resort, at
least if you have lots of traces turned on.  It's going to print a
whole boatload of data out your serial port which is probably running
at 115200.  This could easily take many, many minutes.

Usually you're most interested in what's at the _end_ of the ftrace
buffer, AKA what happened most recently.  That means you've got to
wait the full time for the dump.  The 'ftdump' command does attempt to
help you a little bit by allowing you to skip a fixed number of
entries.  Unfortunately it provides no way for you to know how many
entries you should skip.

Let's do similar to python and allow you to use a negative number to
indicate that you want to skip all entries except the last few.  This
allows you to quickly see what you want.

Note that we also change the printout in ftdump to print the
(positive) number of entries actually skipped since that could be
helpful to know when you've specified a negative skip count.

Link: http://lkml.kernel.org/r/20190319171206.97107-3-dianders@chromium.org

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-02 21:32:55 -04:00
Douglas Anderson ecffc8a8c7 tracing: Add trace_total_entries() / trace_total_entries_cpu()
These two new exported functions will be used in a future patch by
kdb_ftdump() to quickly skip all but the last few trace entries.

Link: http://lkml.kernel.org/r/20190319171206.97107-2-dianders@chromium.org

Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-02 21:32:31 -04:00
Douglas Anderson dbfe67334a tracing: kdb: The skip_lines parameter should have been skip_entries
The things skipped by kdb's "ftdump" command when you pass it a
parameter has always been entries, not lines.  The difference usually
doesn't matter but when the trace buffer has multi-line entries (like
a stack dump) it can matter.

Let's fix this both in the help text for ftdump and also in the local
variable names.

Link: http://lkml.kernel.org/r/20190319171206.97107-1-dianders@chromium.org

Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-02 21:31:19 -04:00
Nadav Amit c7b6f29b62 bpf: Fail bpf_probe_write_user() while mm is switched
When using a temporary mm, bpf_probe_write_user() should not be able to
write to user memory, since user memory addresses may be used to map
kernel memory.  Detect these cases and fail bpf_probe_write_user() in
such cases.

Suggested-by: Jann Horn <jannh@google.com>
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: <will.deacon@arm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-24-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30 12:37:48 +02:00
Thomas Gleixner 9f50c91b11 tracing: Remove the last struct stack_trace usage
Simplify the stack retrieval code by using the storage array based
interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094803.340000461@linutronix.de
2019-04-29 12:37:56 +02:00
Thomas Gleixner ee6dd0db4d tracing: Simplify stack trace retrieval
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094803.248604594@linutronix.de
2019-04-29 12:37:55 +02:00
Thomas Gleixner c438f140cc tracing: Make ftrace_trace_userstack() static and conditional
It's only used in trace.c and there is absolutely no point in compiling it
in when user space stack traces are not supported.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094803.162400595@linutronix.de
2019-04-29 12:37:55 +02:00
Thomas Gleixner 2a820bf749 tracing: Use percpu stack trace buffer more intelligently
The per cpu stack trace buffer usage pattern is odd at best. The buffer has
place for 512 stack trace entries on 64-bit and 1024 on 32-bit. When
interrupts or exceptions nest after the per cpu buffer was acquired the
stacktrace length is hardcoded to 8 entries. 512/1024 stack trace entries
in kernel stacks are unrealistic so the buffer is a complete waste.

Split the buffer into 4 nest levels, which are 128/256 entries per
level. This allows nesting contexts (interrupts, exceptions) to utilize the
cpu buffer for stack retrieval and avoids the fixed length allocation along
with the conditional execution pathes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094803.066064076@linutronix.de
2019-04-29 12:37:55 +02:00
Thomas Gleixner e7d916632b tracing: Simplify stacktrace retrieval in histograms
The indirection through struct stack_trace is not necessary at all. Use the
storage array based interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094802.979089273@linutronix.de
2019-04-29 12:37:54 +02:00
Thomas Gleixner 3d9a807291 tracing: Cleanup stack trace code
- Remove the extra array member of stack_dump_trace[] along with the
  ARRAY_SIZE - 1 initialization for struct stack_trace :: max_entries.

  Both are historical leftovers of no value. The stack tracer never exceeds
  the array and there is no extra storage requirement either.

- Make variables which are only used in trace_stack.c static.

- Simplify the enable/disable logic.

- Rename stack_trace_print() as it's using the stack_trace_ namespace. Free
  the name up for stack trace related functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094801.230654524@linutronix.de
2019-04-29 12:37:46 +02:00
David S. Miller 5f0d736e7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-28

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Introduce BPF socket local storage map so that BPF programs can store
   private data they associate with a socket (instead of e.g. separate hash
   table), from Martin.

2) Add support for bpftool to dump BTF types. This is done through a new
   `bpftool btf dump` sub-command, from Andrii.

3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which
   was currently not supported since skb was used to lookup netns, from Stanislav.

4) Add an opt-in interface for tracepoints to expose a writable context
   for attached BPF programs, used here for NBD sockets, from Matt.

5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.

6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to
   support tunnels such as sit. Add selftests as well, from Willem.

7) Various smaller misc fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28 08:42:41 -04:00
Matt Mullins 9df1c28bb7 bpf: add writable context for raw tracepoints
This is an opt-in interface that allows a tracepoint to provide a safe
buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
The size of the buffer must be a compile-time constant, and is checked
before allowing a BPF program to attach to a tracepoint that uses this
feature.

The pointer to this buffer will be the first argument of tracepoints
that opt in; the pointer is valid and can be bpf_probe_read() by both
BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
programs that attach to such a tracepoint, but the buffer to which it
points may only be written by the latter.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26 19:04:19 -07:00
Linus Torvalds e9e1a2e7b4 There tracing fixes:
- Use "nosteal" for ring buffer splice pages
  - Memory leak fix in error path of trace_pid_write()
  - Fix preempt_enable_no_resched() (use preempt_enable()) in ring buffer code
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXMMoghQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qmB1AQDfpVxYxcmxibBBAM6fZyILYpKqDWmy
 ut6gHZ+GHhQT4AEAwSRsC6V4yO3d5dJFpkcQXUj1v+Ip9XU+dv//s8O6tAI=
 =LsG/
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Three tracing fixes:

   - Use "nosteal" for ring buffer splice pages

   - Memory leak fix in error path of trace_pid_write()

   - Fix preempt_enable_no_resched() (use preempt_enable()) in ring
     buffer code"

* tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  trace: Fix preempt_enable_no_resched() abuse
  tracing: Fix a memory leak by early error exit in trace_pid_write()
  tracing: Fix buffer_ref pipe ops
2019-04-26 11:09:55 -07:00
Peter Zijlstra d6097c9e44 trace: Fix preempt_enable_no_resched() abuse
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.

Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net

Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: stable@vger.kernel.org
Fixes: 2c2d7329d8 ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-26 11:45:03 -04:00
Wenwen Wang 91862cc786 tracing: Fix a memory leak by early error exit in trace_pid_write()
In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.

To fix this issue, free the allocated buffer when an error is encountered.

Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu

Fixes: f4d34a87e9 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-26 11:45:03 -04:00
Jann Horn b987222654 tracing: Fix buffer_ref pipe ops
This fixes multiple issues in buffer_pipe_buf_ops:

 - The ->steal() handler must not return zero unless the pipe buffer has
   the only reference to the page. But generic_pipe_buf_steal() assumes
   that every reference to the pipe is tracked by the page's refcount,
   which isn't true for these buffers - buffer_pipe_buf_get(), which
   duplicates a buffer, doesn't touch the page's refcount.
   Fix it by using generic_pipe_buf_nosteal(), which refuses every
   attempted theft. It should be easy to actually support ->steal, but the
   only current users of pipe_buf_steal() are the virtio console and FUSE,
   and they also only use it as an optimization. So it's probably not worth
   the effort.
 - The ->get() and ->release() handlers can be invoked concurrently on pipe
   buffers backed by the same struct buffer_ref. Make them safe against
   concurrency by using refcount_t.
 - The pointers stored in ->private were only zeroed out when the last
   reference to the buffer_ref was dropped. As far as I know, this
   shouldn't be necessary anyway, but if we do it, let's always do it.

Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: 73a757e631 ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-26 11:44:39 -04:00
David S. Miller 8b44836583 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two easy cases of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25 23:52:29 -04:00
David S. Miller 2843ba2ec7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-04-22

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) allow stack/queue helpers from more bpf program types, from Alban.

2) allow parallel verification of root bpf programs, from Alexei.

3) introduce bpf sysctl hook for trusted root cases, from Andrey.

4) recognize var/datasec in btf deduplication, from Andrii.

5) cpumap performance optimizations, from Jesper.

6) verifier prep for alu32 optimization, from Jiong.

7) libbpf xsk cleanup, from Magnus.

8) other various fixes and cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-22 21:35:55 -07:00
Steven Rostedt (VMware) 52fde6e70c function_graph: Have selftest also emulate tr->reset() as it did with tr->init()
The function_graph boot up self test emulates the tr->init() function in
order to add a wrapper around the function graph tracer entry code to test
for lock ups and such. But it does not emulate the tr->reset(), and just
calls the function_graph tracer tr->reset() function which will use its own
fgraph_ops to unregister function tracing with. As the fgraph_ops is
becoming more meaningful with the register_ftrace_graph() and
unregister_ftrace_graph() functions, the two need to be the same. The
emulated tr->init() uses its own fgraph_ops descriptor, which means the
unregister_ftrace_graph() must use the same ftrace_ops, which the selftest
currently does not do. By emulating the tr->reset() as the selftest does
with the tr->init() it will be able to pass the same fgraph_ops descriptor
to the unregister_ftrace_graph() as it did with the register_ftrace_graph().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-21 19:46:56 -04:00
Masami Hiramatsu fabe38ab6b kprobes: Mark ftrace mcount handler functions nokprobe
Mark ftrace mcount handler functions nokprobe since
probing on these functions with kretprobe pushes
return address incorrectly on kretprobe shadow stack.

Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/155094062044.6137.6419622920568680640.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19 14:26:06 +02:00
Alban Crequy 02a8c817a3 bpf: add map helper functions push, pop, peek in more BPF programs
commit f1a2e44a3a ("bpf: add queue and stack maps") introduced new BPF
helper functions:
- BPF_FUNC_map_push_elem
- BPF_FUNC_map_pop_elem
- BPF_FUNC_map_peek_elem

but they were made available only for network BPF programs. This patch
makes them available for tracepoint, cgroup and lirc programs.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-16 10:24:02 +02:00
Linus Torvalds 6b3a707736 Merge branch 'page-refs' (page ref overflow)
Merge page ref overflow branch.

Jann Horn reported that he can overflow the page ref count with
sufficient memory (and a filesystem that is intentionally extremely
slow).

Admittedly it's not exactly easy.  To have more than four billion
references to a page requires a minimum of 32GB of kernel memory just
for the pointers to the pages, much less any metadata to keep track of
those pointers.  Jann needed a total of 140GB of memory and a specially
crafted filesystem that leaves all reads pending (in order to not ever
free the page references and just keep adding more).

Still, we have a fairly straightforward way to limit the two obvious
user-controllable sources of page references: direct-IO like page
references gotten through get_user_pages(), and the splice pipe page
duplication.  So let's just do that.

* branch page-refs:
  fs: prevent page refcount overflow in pipe_buf_get
  mm: prevent get_user_pages() from overflowing page refcount
  mm: add 'try_get_page()' helper function
  mm: make page ref count overflow check tighter and more explicit
2019-04-14 15:09:40 -07:00
Thomas Gleixner 4285f2fcef tracing: Remove the ULONG_MAX stack trace hackery
No architecture terminates the stack trace with ULONG_MAX anymore. As the
code checks the number of entries stored anyway there is no point in
keeping all that ULONG_MAX magic around.

The histogram code zeroes the storage before saving the stack, so if the
trace is shorter than the maximum number of entries it can terminate the
print loop if a zero entry is detected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Link: https://lkml.kernel.org/r/20190410103645.048761764@linutronix.de
2019-04-14 19:58:32 +02:00
Matthew Wilcox 15fab63e1e fs: prevent page refcount overflow in pipe_buf_get
Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount.  All
callers converted to handle a failure.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14 10:00:04 -07:00
Steven Rostedt (VMware) 2fa717a033 ftrace: Do not process STUB functions in ftrace_ops_list_func()
The function_graph tracer has a stub function and its ops flag has the
FTRACE_OPS_FL_STUB set. As the function graph does not use the
ftrace_ops->func pointer but instead is called by a separate part of the
ftrace trampoline. The function_graph tracer still requires to pass in a
ftrace_ops that may also hold the hash of the functions to call. But there's
no reason to test that hash in the function tracing portion. Instead of
testing to see if we should call the stub function, just test if the ops has
FTRACE_OPS_FL_STUB set, and just skip it.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-11 11:46:13 -04:00
Steven Rostedt (VMware) ee6a6500fe ftrace: Remove ASSIGN_OPS_HASH() macro from ftrace.c
The ASSIGN_OPS_HASH() macro was moved to fgraph.c where it was used, but for
some reason it wasn't removed from ftrace.c, as it is no longer referenced
there.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-10 10:45:38 -04:00
Tom Zanussi a8d655792a tracing: Add error_log to README
Add brief blurb about error_log to the 'Important files' section.

Link: http://lkml.kernel.org/r/c81e60f9aded495081231a32d2d1023c4d043a7a.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:50 -04:00
Steven Rostedt (VMware) 2f754e771b tracing: Have the error logs show up in the proper instances
As each instance has their own error_log file, it makes more sense that the
instances show the errors of their own instead of all error_logs having the
same data. Make it that the errors show up in the instance error_log file
that the error happens in. If no instance trace_array is available, then
NULL can be passed in which will create the error in the top level instance
(the one at the top of the tracefs directory).

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:44 -04:00
Steven Rostedt (VMware) d0cd871ba0 tracing: Have histogram code pass around trace_array for error handling
Have the trace_array that associates the trace instance of the histogram
passed around to functions so that error handling can display the error
message in the proper instance.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:38 -04:00
Steven Rostedt (VMware) 1e144d73f7 tracing: Add trace_array parameter to create_event_filter()
Pass in the trace_array that represents the instance the filter being
changed is in to create_event_filter(). This will allow for error messages
that happen when writing to the filter can be displayed in the proper
instance "error_log" file.

Note, for calls to create_filter() (that was also modified to support
create_event_filter()), that changes filters that do not exist in a instance
(for perf for example), NULL may be passed in, which means that there will
not be any message to log for that filter.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-08 09:22:28 -04:00
Uwe Kleine-König 12f2639038 tracing: stop making gpio tracing configurable
gpio tracing was made configurable in 4.4-rc1 (commit ddd70280bf
("tracing: gpio: Add Kconfig option for enabling/disabling trace
events")). Since then it is the only event type that can be compiled
conditionally. Given that there is only little overhead I don't
understand the reasoning and I was annoyed more than once that gpio
events were not available without recompiling.

So drop the Kconfig symbol and make gpio events available
unconditionally.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-04-08 15:11:48 +02:00
Steven Rostedt (Red Hat) b35f549df1 syscalls: Remove start and number from syscall_get_arguments() args
At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
function call syscall_get_arguments() implemented in x86 was horribly
written and not optimized for the standard case of passing in 0 and 6 for
the starting index and the number of system calls to get. When looking at
all the users of this function, I discovered that all instances pass in only
0 and 6 for these arguments. Instead of having this function handle
different cases that are never used, simply rewrite it to return the first 6
arguments of a system call.

This should help out the performance of tracing system calls by ptrace,
ftrace and perf.

Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dave Martin <dave.martin@arm.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05 09:26:43 -04:00
Steven Rostedt (Red Hat) d08e411397 tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
The only users that calls syscall_get_arguments() with a variable and not a
hard coded '6' is ftrace_syscall_enter(). syscall_get_arguments() can be
optimized by removing a variable input, and always grabbing 6 arguments
regardless of what the system call actually uses.

Change ftrace_syscall_enter() to pass the 6 args into a local stack array
and copy the necessary arguments into the trace event as needed.

This is needed to remove two parameters from syscall_get_arguments().

Link: http://lkml.kernel.org/r/20161107213233.627583542@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-04 09:17:52 -04:00
Peter Zijlstra 4a6c91fbde x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
For CONFIG_TRACE_BRANCH_PROFILING=y the likely/unlikely things get
overloaded and generate callouts to this code, and thus also when
AC=1.

Make it safe.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03 11:02:24 +02:00
Masami Hiramatsu ab105a4fb8 tracing: Use tracing error_log with probe events
Use tracing error_log with probe events for logging error more
precisely. This also makes all parse error returns -EINVAL
(except for -ENOMEM), because user can see better error message
in error_log file now.

Link: http://lkml.kernel.org/r/6a4d90e141d138040ea61f4776b991597077451e.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:07 -04:00
Tom Zanussi 34f76afaac tracing: Use tracing error_log with trace event filters
Use tracing_log_err() from the new tracing error_log mechanism to send
filter parse errors to tracing/error_log.

With this change, users will be able to see filter errors by looking
at tracing/error_log.

The same errors will also be available in the filter file, as
expected.

Link: http://lkml.kernel.org/r/1d942c419941539a11d78a6810fc5740a99b2974.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:07 -04:00
Tom Zanussi d566c5e9d1 tracing: Use tracing error_log with hist triggers
Replace hist_err() and hist_err_event() with tracing_log_err() from
the new tracing error_log mechanism.

Also add a couple related helper functions and remove most of the old
hist_err()-related code.

With this change, users no longer read the hist files for hist trigger
error information, but instead look at tracing/error_log for the same
information.

Link: http://lkml.kernel.org/r/c98f77a97c9715d18b623eeb5741057b330d5ac0.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:07 -04:00
Tom Zanussi a1a05bb40e tracing: Save the last hist command's associated event name
In preparation for making use of the new trace error log, save the
subsystem and event name associated with the last hist command - it
will be passed as the location param in the event_log_err() calls.

Link: http://lkml.kernel.org/r/eb0fd1362be8f39facb86c83eecf441b7a5876f8.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:07 -04:00
Tom Zanussi 8a062902be tracing: Add tracing error log
Introduce a new ftrace file, tracing/error_log, for ftrace commands to
log errors.  This is useful for allowing more complex commands such as
hist trigger and kprobe_event commands to point out specifically where
something may have gone wrong without forcing them to resort to more
ad hoc methods such as tacking error messages onto existing output
files.

To log a tracing error, call the event_log_err() function, passing it
a location string describing where it came from e.g. kprobe_events or
system:event, the command that caused the error, an array of static
error strings describing errors and an index within that array which
describes the specific error, along with the position to place the
error caret.

Reading the log displays the last (currently) 8 errors logged in the
following format:

  [timestamp] <loc>: error: <static error text>
    Command: <command that caused the error>
                      ^

Memory for the error log isn't allocated unless there has been a trace
event error, and the error log can be cleared and have its memory
freed by writing the empty string in truncation mode to it:

  # echo > tracing/error_log.

Link: http://lkml.kernel.org/r/0c2c82571fd38c5f3a88ca823627edff250e9416.1554072478.git.tom.zanussi@linux.intel.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Improvements-suggested-by: Steve Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:06 -04:00
Divya Indi f45d1225ad tracing: Kernel access to Ftrace instances
Ftrace provides the feature “instances” that provides the capability to
create multiple Ftrace ring buffers. However, currently these buffers
are created/accessed via userspace only. The kernel APIs providing these
features are not exported, hence cannot be used by other kernel
components.

This patch aims to extend this infrastructure to provide the
flexibility to create/log/remove/ enable-disable existing trace events
to these buffers from within the kernel.

Link: http://lkml.kernel.org/r/1553106531-3281-2-git-send-email-divya.indi@oracle.com

Signed-off-by: Divya Indi <divya.indi@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:06 -04:00
YueHaibing 40ed29b373 ring-buffer: Fix ring buffer size in rb_write_something()
'cnt' should be used to calculate ring buffer size rather than data->cnt

Link: http://lkml.kernel.org/r/1537704693-184237-1-git-send-email-yuehaibing@huawei.com

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:06 -04:00
Hariprasad Kelam 9efb85c5cf ftrace: Fix warning using plain integer as NULL & spelling corrections
Changed  0 --> NULL to avoid sparse warning
Corrected spelling mistakes reported by checkpatch.pl
Sparse warning below:

sudo make C=2 CF=-D__CHECK_ENDIAN__ M=kernel/trace

CHECK   kernel/trace/ftrace.c
kernel/trace/ftrace.c:3007:24: warning: Using plain integer as NULL pointer
kernel/trace/ftrace.c:4758:37: warning: Using plain integer as NULL pointer

Link: http://lkml.kernel.org/r/20190323183523.GA2244@hari-Inspiron-1545

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26 08:35:36 -04:00
Frank Rowand 3dee10da2e tracing: initialize variable in create_dyn_event()
Fix compile warning in create_dyn_event(): 'ret' may be used uninitialized
in this function [-Wuninitialized].

Link: http://lkml.kernel.org/r/1553237900-8555-1-git-send-email-frowand.list@gmail.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 5448d44c38 ("tracing: Add unified dynamic event framework")
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26 08:35:36 -04:00
Tom Zanussi ff9d31d0d4 tracing: Remove unnecessary var_ref destroy in track_data_destroy()
Commit 656fe2ba85 (tracing: Use hist trigger's var_ref array to
destroy var_refs) centralized the destruction of all the var_refs
in one place so that other code didn't have to do it.

The track_data_destroy() added later ignored that and also destroyed
the track_data var_ref, causing a double-free error flagged by KASAN.

==================================================================
BUG: KASAN: use-after-free in destroy_hist_field+0x30/0x70
Read of size 8 at addr ffff888086df2210 by task bash/1694

CPU: 6 PID: 1694 Comm: bash Not tainted 5.1.0-rc1-test+ #15
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03
07/14/2016
Call Trace:
 dump_stack+0x71/0xa0
 ? destroy_hist_field+0x30/0x70
 print_address_description.cold.3+0x9/0x1fb
 ? destroy_hist_field+0x30/0x70
 ? destroy_hist_field+0x30/0x70
 kasan_report.cold.4+0x1a/0x33
 ? __kasan_slab_free+0x100/0x150
 ? destroy_hist_field+0x30/0x70
 destroy_hist_field+0x30/0x70
 track_data_destroy+0x55/0xe0
 destroy_hist_data+0x1f0/0x350
 hist_unreg_all+0x203/0x220
 event_trigger_open+0xbb/0x130
 do_dentry_open+0x296/0x700
 ? stacktrace_count_trigger+0x30/0x30
 ? generic_permission+0x56/0x200
 ? __x64_sys_fchdir+0xd0/0xd0
 ? inode_permission+0x55/0x200
 ? security_inode_permission+0x18/0x60
 path_openat+0x633/0x22b0
 ? path_lookupat.isra.50+0x420/0x420
 ? __kasan_kmalloc.constprop.12+0xc1/0xd0
 ? kmem_cache_alloc+0xe5/0x260
 ? getname_flags+0x6c/0x2a0
 ? do_sys_open+0x149/0x2b0
 ? do_syscall_64+0x73/0x1b0
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
 ? _raw_write_lock_bh+0xe0/0xe0
 ? __kernel_text_address+0xe/0x30
 ? unwind_get_return_address+0x2f/0x50
 ? __list_add_valid+0x2d/0x70
 ? deactivate_slab.isra.62+0x1f4/0x5a0
 ? getname_flags+0x6c/0x2a0
 ? set_track+0x76/0x120
 do_filp_open+0x11a/0x1a0
 ? may_open_dev+0x50/0x50
 ? _raw_spin_lock+0x7a/0xd0
 ? _raw_write_lock_bh+0xe0/0xe0
 ? __alloc_fd+0x10f/0x200
 do_sys_open+0x1db/0x2b0
 ? filp_open+0x50/0x50
 do_syscall_64+0x73/0x1b0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa7b24a4ca2
Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 85 7a 0d 00 8b 00 85 c0
75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff
0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007fffbafb3af0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 000055d3648ade30 RCX: 00007fa7b24a4ca2
RDX: 0000000000000241 RSI: 000055d364a55240 RDI: 00000000ffffff9c
RBP: 00007fffbafb3bf0 R08: 0000000000000020 R09: 0000000000000002
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 0000000000000001 R15: 000055d364a55240
==================================================================

So remove the track_data_destroy() destroy_hist_field() call for that
var_ref.

Link: http://lkml.kernel.org/r/1deffec420f6a16d11dd8647318d34a66d1989a9.camel@linux.intel.com

Fixes: 466f4528fb ("tracing: Generalize hist trigger onmax and save action")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-26 08:34:00 -04:00
Linus Torvalds 11efae3506 for-5.1/block-post-20190315
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyL124QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgptsxD/42slmoE5TC3vwXcgMBEilrjIHCns6O4Leo
 0r8Awdwil8QkVDphfAWsgkTBjRPUNKv4cCg2kG4VEzAy62YSutUWPeqJZwLOpGDI
 kji9XI6WLqwQ/VhDFwEln9G+xWDUQxds5PZDomlzLpjiNqkFArwwsPFnJbshH4fB
 U6kZrhVSLfvJHIJmC9H4RIWuTEwUH1yFSvzzMqDOOyvRon2g/A2YlHb2KhSCaJPq
 1b0jbhyR0GVP0EH1FdeKvNYFZfvXXSPAbxDN1CEtW/Lq8WxXeoaCj390tC+gL7yQ
 WWHntvUoVU/weWudbT3tVsYgpI91KfPM5OuWTDGod6lFwHrI5X91Pao3KYUGPb9d
 cwvNBOlkNqR1ENZOGTgxLeKwiwV7G1DIjvsaijRQJhGy4Uw4RkM/YEct9JHxWBIF
 x4ZuSVUVZ5Y3zNPC945iJ6Z5feOz/UO9bQL00oimu0c0JhAp++3pHWAFJEMQ8q1a
 0IRifkeUyhf0p9CIVPDnUzmNgSBglFkAVTPVAWySBVDU+v0/GoNcYwTzPq4cgPrF
 UJEIlx+RdDpKKmCqBvKjtx4w7BC1lCebL/1ZJrbARNO42djt8xeuyvKw0t+MYVTZ
 UsvLX72tXwUIbj0IZZGuz+8uSGD4ddDs8+x486FN4oaCPf36FUnnkOZZkhjV/KQA
 vsZNrNNZpw==
 =qBae
 -----END PGP SIGNATURE-----

Merge tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block

Pull more block layer changes from Jens Axboe:
 "This is a collection of both stragglers, and fixes that came in after
  I finalized the initial pull. This contains:

   - An MD pull request from Song, with a few minor fixes

   - Set of NVMe patches via Christoph

   - Pull request from Konrad, with a few fixes for xen/blkback

   - pblk fix IO calculation fix (Javier)

   - Segment calculation fix for pass-through (Ming)

   - Fallthrough annotation for blkcg (Mathieu)"

* tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits)
  blkcg: annotate implicit fall through
  nvme-tcp: support C2HData with SUCCESS flag
  nvmet: ignore EOPNOTSUPP for discard
  nvme: add proper write zeroes setup for the multipath device
  nvme: add proper discard setup for the multipath device
  nvme: remove nvme_ns_config_oncs
  nvme: disable Write Zeroes for qemu controllers
  nvmet-fc: bring Disconnect into compliance with FC-NVME spec
  nvmet-fc: fix issues with targetport assoc_list list walking
  nvme-fc: reject reconnect if io queue count is reduced to zero
  nvme-fc: fix numa_node when dev is null
  nvme-fc: use nr_phys_segments to determine existence of sgl
  nvme-loop: init nvmet_ctrl fatal_err_work when allocate
  nvme: update comment to make the code easier to read
  nvme: put ns_head ref if namespace fails allocation
  nvme-trace: fix cdw10 buffer overrun
  nvme: don't warn on block content change effects
  nvme: add get-feature to admin cmds tracer
  md: Fix failed allocation of md_register_thread
  It's wrong to add len to sector_nr in raid10 reshape twice
  ...
2019-03-16 12:36:39 -07:00
Linus Torvalds aa2e3ac64a This contains a series of last minute clean ups, small fixes and
error checks.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXIukmxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qi+PAQCKf6Yz7LZ3oBtjKy7jJRhkAn3Ie5Ls
 n7KOXnOGntO1cgD/RdynJcFtpwgDxuj7L/c3iInel0B/rdU5VLbglXy+2AA=
 =y/ft
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes and cleanups from Steven Rostedt:
 "This contains a series of last minute clean ups, small fixes and error
  checks"

* tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/probe: Verify alloc_trace_*probe() result
  tracing/probe: Check event/group naming rule at parsing
  tracing/probe: Check the size of argument name and body
  tracing/probe: Check event name length correctly
  tracing/probe: Check maxactive error cases
  tracing: kdb: Fix ftdump to not sleep
  trace/probes: Remove kernel doc style from non kernel doc comment
  tracing/probes: Make reserved_field_names static
2019-03-15 14:47:02 -07:00
Masami Hiramatsu a039480e9e tracing/probe: Verify alloc_trace_*probe() result
Since alloc_trace_*probe() returns -EINVAL only if !event && !group,
it should not happen in trace_*probe_create(). If we catch that case
there is a bug. So use WARN_ON_ONCE() instead of pr_info().

Link: http://lkml.kernel.org/r/155253785078.14922.16902223633734601469.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-14 19:54:21 -04:00
Masami Hiramatsu 5b7a962209 tracing/probe: Check event/group naming rule at parsing
Check event and group naming rule at parsing it instead
of allocating probes.

Link: http://lkml.kernel.org/r/155253784064.14922.2336893061156236237.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-14 19:54:11 -04:00
Masami Hiramatsu b4443c17a3 tracing/probe: Check the size of argument name and body
Check the size of argument name and expression is not 0
and smaller than maximum length.

Link: http://lkml.kernel.org/r/155253783029.14922.12650939303827581096.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-14 19:53:57 -04:00
Masami Hiramatsu dec65d79fd tracing/probe: Check event name length correctly
Ensure given name of event is not too long when parsing it,
and fix to update event name offset correctly when the group
name is given. For example, this makes probe event to check
the "p:foo/" error case correctly.

Link: http://lkml.kernel.org/r/155253782046.14922.14724124823730168629.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-14 19:53:47 -04:00
Masami Hiramatsu 287c038c0b tracing/probe: Check maxactive error cases
Check maxactive on kprobe error case, because maxactive
is only for kretprobe, not for kprobe. Also, maxactive
should not be 0, it should be at least 1.

Link: http://lkml.kernel.org/r/155253780952.14922.15784129810238750331.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-14 19:53:29 -04:00
Mathieu Malaterre f6d85f04e2 blkcg: annotate implicit fall through
There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit remove the following warning:

  kernel/trace/blktrace.c:725:9: warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-13 14:31:12 -06:00
Douglas Anderson 31b265b3ba tracing: kdb: Fix ftdump to not sleep
As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
BUG for "sleeping function called from invalid context".

kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
atomic context.  A very simple solution for this is to add allocation
flags to ring_buffer_read_prepare() so kdb can call it without
triggering the allocation error.  This patch does that.

Note that in the original email thread about this, it was suggested
that perhaps the solution for kdb was to either preallocate the buffer
ahead of time or create our own iterator.  I'm hoping that this
alternative of adding allocation flags to ring_buffer_read_prepare()
can be considered since it means I don't need to duplicate more of the
core trace code into "trace_kdb.c" (for either creating my own
iterator or re-preparing a ring allocator whose memory was already
allocated).

NOTE: another option for kdb is to actually figure out how to make it
reuse the existing ftrace_dump() function and totally eliminate the
duplication.  This sounds very appealing and actually works (the "sr
z" command can be seen to properly dump the ftrace buffer).  The
downside here is that ftrace_dump() fully consumes the trace buffer.
Unless that is changed I'd rather not use it because it means "ftdump
| grep xyz" won't be very useful to search the ftrace buffer since it
will throw away the whole trace on the first grep.  A future patch to
dump only the last few lines of the buffer will also be hard to
implement.

[1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com

Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org

Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-13 09:46:10 -04:00
Linus Torvalds 5f739e4a49 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted fixes (really no common topic here)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Make __vfs_write() static
  vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
  pipe: stop using ->can_merge
  splice: don't merge into linked buffers
  fs: move generic stat response attr handling to vfs_getattr_nosec
  orangefs: don't reinitialize result_mask in ->getattr
  fs/devpts: always delete dcache dentry-s in dput()
2019-03-12 13:27:20 -07:00
Valdis Klētnieks cede666e2e trace/probes: Remove kernel doc style from non kernel doc comment
CC      kernel/trace/trace_kprobe.o
kernel/trace/trace_kprobe.c:41: warning: cannot understand function prototype: 'struct trace_kprobe '

The real problem is that a comment looked like kerneldoc when it shouldn't be...

Link: http://lkml.kernel.org/r/2812.1552381112@turing-police

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-12 11:23:52 -04:00
Valdis Klētnieks 0841625201 tracing/probes: Make reserved_field_names static
sparse complains:
  CHECK   kernel/trace/trace_probe.c
kernel/trace/trace_probe.c:16:12: warning: symbol 'reserved_field_names' was not declared. Should it be static?

Yes, it should be static.

Link: http://lkml.kernel.org/r/2478.1552380778@turing-police

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-12 10:59:51 -04:00
Linus Torvalds 6cdfa54cd2 The biggest change for this release is in the histogram code.
- Add "onchange(var)" histogram handler that executes a action when $var
    changes.
 
  - Add new "snapshot()" action for histogram handlers, that causes a
    snapshot of the ring buffer when triggered.
    ie. onchange(var).snapshot() will trigger a snapshot if var changes.
 
  - Add alternative for "trace()" action.
    Currently, to trigger a synthetic event, the name of that event is used
    as the handler name, which is inconsistent with the other actions.
    onchange(var).synthetic(param) where it can now be
    onchange(var).trace(synthetic, param). The older method will still be
    allowed, as long as the synthetic events do not overlap with other
    handler names.
 
  - The histogram documentation at testcases were updated for the new
    changes.
 
 Added a quicker way to enable set_ftrace_filter files, that will make
 it much quicker to bisect tracing a function that shouldn't be traced and
 crashes the kernel. (You can echo in numbers to set_ftrace_filter, and it
 will select the corresponding function that is in
 available_filter_functions).
 
 Some better displaying of the tracing data (and more information was added).
 
 The rest are small fixes and more clean ups to the code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXIXXjRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrSJAQCbGXAvZE+shfKRhbU1cu1C1nwRMHhH
 eeRecJs1RChGFgD/TwatD4FzARQPjfk7snQD5KWPpoRc9grUACC2cZcaWwQ=
 =LVBI
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The biggest change for this release is in the histogram code:

   - Add "onchange(var)" histogram handler that executes a action when
     $var changes.

   - Add new "snapshot()" action for histogram handlers, that causes a
     snapshot of the ring buffer when triggered. ie.
     onchange(var).snapshot() will trigger a snapshot if var changes.

   - Add alternative for "trace()" action. Currently, to trigger a
     synthetic event, the name of that event is used as the handler
     name, which is inconsistent with the other actions.
     onchange(var).synthetic(param) where it can now be
     onchange(var).trace(synthetic, param). The older method will still
     be allowed, as long as the synthetic events do not overlap with
     other handler names.

   - The histogram documentation at testcases were updated for the new
     changes.

  Outside of the histogram code, we have:

   - Added a quicker way to enable set_ftrace_filter files, that will
     make it much quicker to bisect tracing a function that shouldn't be
     traced and crashes the kernel. (You can echo in numbers to
     set_ftrace_filter, and it will select the corresponding function
     that is in available_filter_functions).

   - Some better displaying of the tracing data (and more information
     was added).

  The rest are small fixes and more clean ups to the code"

* tag 'trace-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (37 commits)
  tracing: Use strncpy instead of memcpy when copying comm in trace.c
  tracing: Use strncpy instead of memcpy when copying comm for hist triggers
  tracing: Use strncpy instead of memcpy for string keys in hist triggers
  tracing: Use str_has_prefix() in synth_event_create()
  x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
  tracing/perf: Use strndup_user() instead of buggy open-coded version
  doc: trace: Fix documentation for uprobe_profile
  tracing: Fix spelling mistake: "analagous" -> "analogous"
  tracing: Comment why cond_snapshot is checked outside of max_lock protection
  tracing: Add hist trigger action 'expected fail' test case
  tracing: Add alternative synthetic event trace action test case
  tracing: Add hist trigger onchange() handler test case
  tracing: Add hist trigger snapshot() action test case
  tracing: Add SPDX license GPL-2.0 license identifier to inter-event testcases
  tracing: Add alternative synthetic event trace action syntax
  tracing: Add hist trigger onchange() handler Documentation
  tracing: Add hist trigger onchange() handler
  tracing: Add hist trigger snapshot() action Documentation
  tracing: Add hist trigger snapshot() action
  tracing: Add conditional snapshot
  ...
2019-03-11 17:01:32 -07:00
Linus Torvalds ffd602eb46 Kbuild updates for v5.1
- do not generate unneeded top-level built-in.a
 
  - let git ignore O= directory entirely
 
  - optimize scripts/kallsyms slightly
 
  - exclude DWARF info from *.s regardless of config options
 
  - fix GCC toolchain search path for Clang to prepare ld.lld support
 
  - do not generate modules.order when CONFIG_MODULES is disabled
 
  - simplify single target rules and remove VPATH for external module build
 
  - allow to add optional flags to dpkg-buildpackage when building deb-pkg
 
  - move some compiler option tests from Makefile to Kconfig
 
  - various Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcgxYUAAoJED2LAQed4NsGr7YQAJq4LmN/aZDI9Mt0YAQjEyyA
 PCpm8J2HI9HO1sMoY7J/ksWmV0BU25G+uspKD7dXAQo3l9fmahQM5e4dsyZ4Xqs8
 DyyYSGtJJnMJaWmupIZNA4UKDCVtwPoVW8YeuK9rwADVokCux9avogof9O1OoA/E
 Pylo+I4UCM82kbpZSd+UxnCx6B0v8XGtW+d31Q4yZXCkw5nw14chrlaprcqB3UgB
 +7C3xOnDWCi7gyxaTqmD7dLay2DM8KCDlznEvBL733Y/cK3to1fywzEPzp0JQCLX
 BLgmmpW13NF++q5BCoTW6sFjZAhBVbiYZwesMrCi75Y32T8zt4G5l4pkvGkSuGF/
 UQh5aoCxaMIp70VPj/loZ0lh78nwVGTok9zRb0rfztM0X4DbmiPi5MNiHRzRpIeE
 1jjEa/GK1t0TDnXc/MuDFK8cWwdhttIqUL5yWfAxjXbtP27eLtsopQUdW7EPHs7d
 sMnfuSUuhOC28yByVxIkBcwawLyYrcWRphJ3ixCO70CoJWt2DT6aOKxcFJefoJix
 Pto6Oo3oQ4iypMM5M9/0Uo+AK2TKRejWIqtZdbo+ir70tNxVH3WDZq++fG0drXOB
 r2I/GY6nRjuzLOe2jzEqywFTFd2xpk4Qo84LGb1R3U6aU5qS2gA0W/q00JS5c2qU
 R8uReJ7bvmLmrVNZ/NI4
 =y9YG
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - do not generate unneeded top-level built-in.a

 - let git ignore O= directory entirely

 - optimize scripts/kallsyms slightly

 - exclude DWARF info from *.s regardless of config options

 - fix GCC toolchain search path for Clang to prepare ld.lld support

 - do not generate modules.order when CONFIG_MODULES is disabled

 - simplify single target rules and remove VPATH for external module
   build

 - allow to add optional flags to dpkg-buildpackage when building
   deb-pkg

 - move some compiler option tests from Makefile to Kconfig

 - various Makefile cleanups

* tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
  kbuild: remove scripts/basic/% build target
  kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
  kbuild: clean up scripts/gcc-version.sh
  kbuild: remove cc-version macro
  kbuild: update comment block of scripts/clang-version.sh
  kbuild: remove commented-out INITRD_COMPRESS
  kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
  kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
  kbuild: move ".config not found!" message from Kconfig to Makefile
  kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
  kbuild: simplify single target rules
  kbuild: remove empty rules for makefiles
  kbuild: make -r/-R effective in top Makefile for old Make versions
  kbuild: move tools_silent to a more relevant place
  kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
  kbuild: refactor cc-cross-prefix implementation
  kbuild: hardcode genksyms path and remove GENKSYMS variable
  scripts/gdb: refactor rules for symlink creation
  kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
  scripts/gdb: do not descend into scripts/gdb from scripts
  ...
2019-03-10 17:48:21 -07:00
Linus Torvalds bdfa15f1a3 One small fix and one clean up
A small fix Pavel sent me back in august was accidentally lost due to it
 being placed with some other patches that failed some tests, and was rebased
 out of my local tree. Which was a regression that caused event filters
 not to handle negative numbers.
 
 The clean up is from Masami that realized that the code in kprobes that
 calls probe_mem_read() wrapper, which is to be used in code used by both
 kprobes and uprobes, was only in code for kprobes. It should not use the
 wrapper there, but instead call probe_kernel_read() directly.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXH2gZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrTyAQC+dfxfyS32BWpGOwZbUIHGYYHtnpEB
 7bIQzXy2q8r4YQD/UGJ0qoHur1gSMEIVjhIM0Qow4VepqqhthIAV/1iVbgU=
 =F9yc
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.0-pre' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix/cleanup from Steven Rostedt:
 "This is a "pre-pull". It's only one small fix and one small clean up.
  I'm testing a few small patches for my real pull request which will
  come at a later time. The second patch depends on your tree anyway so
  I included it along with the urgent fix.

  A small fix Pavel sent me back in august was accidentally lost due to
  it being placed with some other patches that failed some tests, and
  was rebased out of my local tree. Which was a regression that caused
  event filters not to handle negative numbers.

  The clean up is from Masami that realized that the code in kprobes
  that calls probe_mem_read() wrapper, which is to be used in code used
  by both kprobes and uprobes, was only in code for kprobes. It should
  not use the wrapper there, but instead call probe_kernel_read()
  directly"

* tag 'trace-v5.0-pre' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Use probe_kernel_read instead of probe_mem_read
  tracing: Fix event filters and triggers to handle negative numbers
2019-03-07 09:55:56 -08:00
Tom Zanussi 85f726a35e tracing: Use strncpy instead of memcpy when copying comm in trace.c
Because there may be random garbage beyond a string's null terminator,
code that might use the entire comm array e.g. histogram keys, can
give unexpected results if that garbage is copied in too, so avoid
that possibility by using strncpy instead of memcpy.

Link: http://lkml.kernel.org/r/1d6ebac26570c2a29ce9fb575379f17ef5c8b81b.1551802084.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-05 12:14:42 -05:00
Tom Zanussi 27242c62b1 tracing: Use strncpy instead of memcpy when copying comm for hist triggers
Because there may be random garbage beyond a string's null terminator,
code that might use the entire comm array e.g. histogram keys, can
give unexpected results if that garbage is copied in too, so avoid
that possibility by using strncpy instead of memcpy.

Link: http://lkml.kernel.org/r/1eb9f096a8086c3c82c7fc087c900005143cec54.1551802084.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-05 12:14:28 -05:00
Tom Zanussi 9f0bbf3115 tracing: Use strncpy instead of memcpy for string keys in hist triggers
Because there may be random garbage beyond a string's null terminator,
it's not correct to copy the the complete character array for use as a
hist trigger key.  This results in multiple histogram entries for the
'same' string key.

So, in the case of a string key, use strncpy instead of memcpy to
avoid copying in the extra bytes.

Before, using the gdbus entries in the following hist trigger as an
example:

  # echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
  # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist

  ...

  { comm: ImgDecoder #4                      } hitcount:        203
  { comm: gmain                              } hitcount:        213
  { comm: gmain                              } hitcount:        216
  { comm: StreamTrans #73                    } hitcount:        221
  { comm: mozStorage #3                      } hitcount:        230
  { comm: gdbus                              } hitcount:        233
  { comm: StyleThread#5                      } hitcount:        253
  { comm: gdbus                              } hitcount:        256
  { comm: gdbus                              } hitcount:        260
  { comm: StyleThread#4                      } hitcount:        271

  ...

  # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
  51

After:

  # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
  1

Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 79e577cbce ("tracing: Support string type key properly")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-05 08:47:46 -05:00
Tom Zanussi ed581aaf99 tracing: Use str_has_prefix() in synth_event_create()
Since we now have a str_has_prefix() that returns the length, we can
use that instead of explicitly calculating it.

Link: http://lkml.kernel.org/r/03418373fd1e80030e7394b8e3e081c5de28a710.1549309756.git.tom.zanussi@linux.intel.com

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-05 08:47:46 -05:00
Masami Hiramatsu 49ef5f4570 tracing/kprobes: Use probe_kernel_read instead of probe_mem_read
Use probe_kernel_read() instead of probe_mem_read() because
probe_mem_read() is a kind of wrapper for switching memory
read function between uprobes and kprobes.

Link: http://lkml.kernel.org/r/20190222011643.3e19ade84a3db3e83518648f@kernel.org

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-01 16:18:15 -05:00
Pavel Tikhomirov 6a072128d2 tracing: Fix event filters and triggers to handle negative numbers
Then tracing syscall exit event it is extremely useful to filter exit
codes equal to some negative value, to react only to required errors.
But negative numbers does not work:

[root@snorch sys_exit_read]# echo "ret == -1" > filter
bash: echo: write error: Invalid argument
[root@snorch sys_exit_read]# cat filter
ret == -1
        ^
parse_error: Invalid value (did you forget quotes)?

Similar thing happens when setting triggers.

These is a regression in v4.17 introduced by the commit mentioned below,
testing without these commit shows no problem with negative numbers.

Link: http://lkml.kernel.org/r/20180823102534.7642-1-ptikhomirov@virtuozzo.com

Cc: stable@vger.kernel.org
Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-01 16:11:09 -05:00
Ingo Molnar 9ed8f1a6e7 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-28 08:27:17 +01:00
Masahiro Yamada b303c6df80 kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:

 - commit e74fc973b6 ("Turn off -Wmaybe-uninitialized when building
   with -Os") turned off this option for -Os.

 - commit 815eb71e71 ("Kbuild: disable 'maybe-uninitialized' warning
   for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
   CONFIG_PROFILE_ALL_BRANCHES

 - commit a76bcf557e ("Kbuild: enable -Wmaybe-uninitialized warning
   for "make W=1"") turned off this option for GCC < 4.9
   Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

I think this looks better by shifting the logic from Makefile to Kconfig.

Link: https://github.com/ClangBuiltLinux/linux/issues/350
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
2019-02-27 21:43:20 +09:00
Jann Horn 83540fbc88 tracing/perf: Use strndup_user() instead of buggy open-coded version
The first version of this method was missing the check for
`ret == PATH_MAX`; then such a check was added, but it didn't call kfree()
on error, so there was still a small memory leak in the error case.
Fix it by using strndup_user() instead of open-coding it.

Link: http://lkml.kernel.org/r/20190220165443.152385-1-jannh@google.com

Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 0eadcc7a7b ("perf/core: Fix perf_uprobe_init()")
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-21 10:35:10 -05:00
Colin Ian King 9e5a36a337 tracing: Fix spelling mistake: "analagous" -> "analogous"
There is a spelling mistake in the mini-howto help text. Fix it.

Link: http://lkml.kernel.org/r/20190217223222.16479-1-colin.king@canonical.com

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:08 -05:00
Steven Rostedt (VMware) 1c347a94ca tracing: Comment why cond_snapshot is checked outside of max_lock protection
Before setting tr->cond_snapshot, it must be NULL before it can be updated.
It can go to NULL when a trace event hist trigger is created or removed, and
can only be modified under the max_lock spin lock. But because it can only
be set to something other than NULL under both the max_lock spin lock as
well as the trace_types_lock, we can perform the check if it is not NULL
only under the trace_types_lock and fail out without having to grab the
max_lock spin lock.

This is very subtle, and deserves a comment.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:08 -05:00
Tom Zanussi e91eefd731 tracing: Add alternative synthetic event trace action syntax
Add a 'trace(synthetic_event_name, params)' alternative to
synthetic_event_name(params).

Currently, the syntax used for generating synthetic events is to
invoke synthetic_event_name(params) i.e. use the synthetic event name
as a function call.

Users requested a new form that more explicitly shows that the
synthetic event is in effect being traced.  In this version, a new
'trace()' keyword is used, and the synthetic event name is passed in
as the first argument.

In addition, for the sake of consistency with other actions, change
the documention to emphasize the trace() form over the function-call
form, which remains documented as equivalent.

Link: http://lkml.kernel.org/r/d082773e50232a001480cf837679a1e01c1a2eb7.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:07 -05:00
Tom Zanussi dff81f5592 tracing: Add hist trigger onchange() handler
Add support for a hist:onchange($var) handler, similar to the onmax()
handler but triggering whenever there's any change in $var, not just a
max.

Link: http://lkml.kernel.org/r/dfbc7e4ada242603e9ec3f049b5ad076a07dfd03.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:07 -05:00
Tom Zanussi a3785b7eca tracing: Add hist trigger snapshot() action
Add support for hist:handlerXXX($var).snapshot(), which will take a
snapshot of the current trace buffer whenever handlerXXX is hit.

As a first user, this also adds snapshot() action support for the
onmax() handler i.e. hist:onmax($var).snapshot().

Also, the hist trigger key printing is moved into a separate function
so the snapshot() action can print a histogram key outside the
histogram display - add and use hist_trigger_print_key() for that
purpose.

Link: http://lkml.kernel.org/r/2f1a952c0dcd8aca8702ce81269581a692396d45.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:07 -05:00
Tom Zanussi a35873a099 tracing: Add conditional snapshot
Currently, tracing snapshots are context-free - they capture the ring
buffer contents at the time the tracing_snapshot() function was
invoked, and nothing else.  Additionally, they're always taken
unconditionally - the calling code can decide whether or not to take a
snapshot, but the data used to make that decision is kept separately
from the snapshot itself.

This change adds the ability to associate with each trace instance
some user data, along with an 'update' function that can use that data
to determine whether or not to actually take a snapshot.  The update
function can then update that data along with any other state (as part
of the data presumably), if warranted.

Because snapshots are 'global' per-instance, only one user can enable
and use a conditional snapshot for any given trace instance.  To
enable a conditional snapshot (see details in the function and data
structure comments), the user calls tracing_snapshot_cond_enable().
Similarly, to disable a conditional snapshot and free it up for other
users, tracing_snapshot_cond_disable() should be called.

To actually initiate a conditional snapshot, tracing_snapshot_cond()
should be called.  tracing_snapshot_cond() will invoke the update()
callback, allowing the user to decide whether or not to actually take
the snapshot and update the user-defined data associated with the
snapshot.  If the callback returns 'true', tracing_snapshot_cond()
will then actually take the snapshot and return.

This scheme allows for flexibility in snapshot implementations - for
example, by implementing slightly different update() callbacks,
snapshots can be taken in situations where the user is only interested
in taking a snapshot when a new maximum in hit versus when a value
changes in any way at all.  Future patches will demonstrate both
cases.

Link: http://lkml.kernel.org/r/1bea07828d5fd6864a585f83b1eed47ce097eb45.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:06 -05:00
Tom Zanussi 466f4528fb tracing: Generalize hist trigger onmax and save action
The action refactor code allowed actions and handlers to be separated,
but the existing onmax handler and save action code is still not
flexible enough to handle arbitrary coupling.  This change generalizes
them and in the process makes additional handlers and actions easier
to implement.

The onmax action can be broken up and thought of as two separate
components - a variable to be tracked (the parameter given to the
onmax($var_to_track) function) and an invisible variable created to
save the ongoing result of doing something with that variable, such as
saving the max value of that variable so far seen.

Separating it out like this and renaming it appropriately allows us to
use the same code for similar tracking functions such as
onchange($var_to_track), which would just track the last value seen
rather than the max seen so far, which is useful in some situations.

Additionally, because different handlers and actions may want to save
and access data differently e.g. save and retrieve tracking values as
local variables vs something more global, save_val() and get_val()
interface functions are introduced and max-specific implementations
are used instead.

The same goes for the code that checks whether a maximum has been hit
- a generic check_val() interface and max-checking implementation is
used instead, which allows future patches to make use of he same code
using their own implemetations of similar functionality.

Link: http://lkml.kernel.org/r/980ea73dd8e3f36db3d646f99652f8fed42b77d4.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:06 -05:00
Tom Zanussi c3e49506a0 tracing: Split up onmatch action data
Currently, the onmatch action data binds the onmatch action to data
related to synthetic event generation.  Since we want to allow the
onmatch handler to potentially invoke a different action, and because
we expect other handlers to generate synthetic events, we need to
separate the data related to these two functions.

Also rename the onmatch data to something more descriptive, and create
and use common action data destroy function.

Link: http://lkml.kernel.org/r/b9abbf9aae69fe3920cdc8ddbcaad544dd258d78.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:06 -05:00
Tom Zanussi 7d18a10c31 tracing: Refactor hist trigger action code
The hist trigger action code currently implements two essentially
hard-coded pairs of 'actions' - onmax(), which tracks a variable and
saves some event fields when a max is hit, and onmatch(), which is
hard-coded to generate a synthetic event.

These hardcoded pairs (track max/save fields and detect match/generate
synthetic event) should really be decoupled into separate components
that can then be arbitrarily combined.  The first component of each
pair (track max/detect match) is called a 'handler' in the new code,
while the second component (save fields/generate synthetic event) is
called an 'action' in this scheme.

This change refactors the action code to reflect this split by adding
two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two
actions, ACTION_SAVE and ACTION_TRACE.

The new code combines them to produce the existing ONMATCH/TRACE and
ONMAX/SAVE functionality, but doesn't implement the other combinations
now possible.  Future patches will expand these to further useful
cases, such as ONMAX/TRACE, as well as add additional handlers and
actions such as ONCHANGE and SNAPSHOT.

Also, add abbreviated documentation for handlers and actions to
README.

Link: http://lkml.kernel.org/r/98bfdd48c1b4ff29fc5766442f99f5bc3c34b76b.1550100284.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:51:06 -05:00
zhangyi (F) e7f0c424d0 tracing: Do not free iter->trace in fail path of tracing_open_pipe()
Commit d716ff71dd ("tracing: Remove taking of trace_types_lock in
pipe files") use the current tracer instead of the copy in
tracing_open_pipe(), but it forget to remove the freeing sentence in
the error path.

There's an error path that can call kfree(iter->trace) after the iter->trace
was assigned to tr->current_trace, which would be bad to free.

Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com

Cc: stable@vger.kernel.org
Fixes: d716ff71dd ("tracing: Remove taking of trace_types_lock in pipe files")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-20 13:47:08 -05:00
Linus Torvalds 10f4902173 Two more tracing fixes
- Have kprobes not use copy_from_user() to access kernel addresses,
    because kprobes can legitimately poke at bad kernel memory, which
    will fault. Copy from user code should never fault in kernel space.
    Using probe_mem_read() can handle kernel address space faulting.
 
  - Put back the entries counter in the tracing output that was accidentally
    removed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXGb7BxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qqvaAQC66gQ79frSW7xPjJ4Y+qLIm0YDV18i
 aCHowAXxDeK3qAEA3sDeELAPVupacPrzZc6zejI+bf0HArPe08n3vlHwAQw=
 =uUQj
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.0-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Two more tracing fixes

   - Have kprobes not use copy_from_user() to access kernel addresses,
     because kprobes can legitimately poke at bad kernel memory, which
     will fault. Copy from user code should never fault in kernel space.
     Using probe_mem_read() can handle kernel address space faulting.

   - Put back the entries counter in the tracing output that was
     accidentally removed"

* tag 'trace-v5.0-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix number of entries in trace header
  kprobe: Do not use uaccess functions to access kernel memory that can fault
2019-02-18 09:40:16 -08:00
Steven Rostedt (VMware) f79b3f3385 ftrace: Allow enabling of filters via index of available_filter_functions
Enabling of large number of functions by echoing in a large subset of the
functions in available_filter_functions can take a very long time. The
process requires testing all functions registered by the function tracer
(which is in the 10s of thousands), and doing a kallsyms lookup to convert
the ip address into a name, then comparing that name with the string passed
in.

When a function causes the function tracer to crash the system, a binary
bisect of the available_filter_functions can be done to find the culprit.
But this requires passing in half of the functions in
available_filter_functions over and over again, which makes it basically a
O(n^2) operation. With 40,000 functions, that ends up bing 1,600,000,000
opertions! And enabling this can take over 20 minutes.

As a quick speed up, if a number is passed into one of the filter files,
instead of doing a search, it just enables the function at the corresponding
line of the available_filter_functions file. That is:

 # echo 50 > set_ftrace_filter
 # cat set_ftrace_filter
 x86_pmu_commit_txn

 # head -50 available_filter_functions | tail -1
 x86_pmu_commit_txn

This allows setting of half the available_filter_functions to take place in
less than a second!

 # time seq 20000 > set_ftrace_filter
 real    0m0.042s
 user    0m0.005s
 sys     0m0.015s

 # wc -l set_ftrace_filter
 20000 set_ftrace_filter

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-15 13:10:09 -05:00
Quentin Perret 9e7382153f tracing: Fix number of entries in trace header
The following commit

  441dae8f2f ("tracing: Add support for display of tgid in trace output")

removed the call to print_event_info() from print_func_help_header_irq()
which results in the ftrace header not reporting the number of entries
written in the buffer. As this wasn't the original intent of the patch,
re-introduce the call to print_event_info() to restore the orginal
behaviour.

Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com

Acked-by: Joel Fernandes <joelaf@google.com>
Cc: stable@vger.kernel.org
Fixes: 441dae8f2f ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-15 12:42:26 -05:00
Changbin Du 2c4f1fcbef kprobe: Do not use uaccess functions to access kernel memory that can fault
The userspace can ask kprobe to intercept strings at any memory address,
including invalid kernel address. In this case, fetch_store_strlen()
would crash since it uses general usercopy function, and user access
functions are no longer allowed to access kernel memory.

For example, we can crash the kernel by doing something as below:

$ sudo kprobe 'p:do_sys_open +0(+0(%si)):string'

[  103.620391] BUG: GPF in non-whitelisted uaccess (non-canonical address?)
[  103.622104] general protection fault: 0000 [#1] SMP PTI
[  103.623424] CPU: 10 PID: 1046 Comm: cat Not tainted 5.0.0-rc3-00130-gd73aba1-dirty #96
[  103.625321] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-2-g628b2e6-dirty-20190104_103505-linux 04/01/2014
[  103.628284] RIP: 0010:process_fetch_insn+0x1ab/0x4b0
[  103.629518] Code: 10 83 80 28 2e 00 00 01 31 d2 31 ff 48 8b 74 24 28 eb 0c 81 fa ff 0f 00 00 7f 1c 85 c0 75 18 66 66 90 0f ae e8 48 63
 ca 89 f8 <8a> 0c 31 66 66 90 83 c2 01 84 c9 75 dc 89 54 24 34 89 44 24 28 48
[  103.634032] RSP: 0018:ffff88845eb37ce0 EFLAGS: 00010246
[  103.635312] RAX: 0000000000000000 RBX: ffff888456c4e5a8 RCX: 0000000000000000
[  103.637057] RDX: 0000000000000000 RSI: 2e646c2f6374652f RDI: 0000000000000000
[  103.638795] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  103.640556] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  103.642297] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  103.644040] FS:  0000000000000000(0000) GS:ffff88846f000000(0000) knlGS:0000000000000000
[  103.646019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.647436] CR2: 00007ffc79758038 CR3: 0000000463360006 CR4: 0000000000020ee0
[  103.649147] Call Trace:
[  103.649781]  ? sched_clock_cpu+0xc/0xa0
[  103.650747]  ? do_sys_open+0x5/0x220
[  103.651635]  kprobe_trace_func+0x303/0x380
[  103.652645]  ? do_sys_open+0x5/0x220
[  103.653528]  kprobe_dispatcher+0x45/0x50
[  103.654682]  ? do_sys_open+0x1/0x220
[  103.655875]  kprobe_ftrace_handler+0x90/0xf0
[  103.657282]  ftrace_ops_assist_func+0x54/0xf0
[  103.658564]  ? __call_rcu+0x1dc/0x280
[  103.659482]  0xffffffffc00000bf
[  103.660384]  ? __ia32_sys_open+0x20/0x20
[  103.661682]  ? do_sys_open+0x1/0x220
[  103.662863]  do_sys_open+0x5/0x220
[  103.663988]  do_syscall_64+0x60/0x210
[  103.665201]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  103.666862] RIP: 0033:0x7fc22fadccdd
[  103.668034] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff
 ff 0f 05 <48> 3d 00 f0 ff ff 77 33 f3 c3 66 0f 1f 84 00 00 00 00 00 48 8d 44
[  103.674029] RSP: 002b:00007ffc7972c3a8 EFLAGS: 00000287 ORIG_RAX: 0000000000000101
[  103.676512] RAX: ffffffffffffffda RBX: 0000562f86147a21 RCX: 00007fc22fadccdd
[  103.678853] RDX: 0000000000080000 RSI: 00007fc22fae1428 RDI: 00000000ffffff9c
[  103.681151] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[  103.683489] R10: 0000000000000000 R11: 0000000000000287 R12: 00007fc22fce90a8
[  103.685774] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[  103.688056] Modules linked in:
[  103.689131] ---[ end trace 43792035c28984a1 ]---

This can be fixed by using probe_mem_read() instead, as it can handle faulting
kernel memory addresses, which kprobes can legitimately do.

Link: http://lkml.kernel.org/r/20190125151051.7381-1-changbin.du@gmail.com

Cc: stable@vger.kernel.org
Fixes: 9da3f2b740 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-15 12:41:23 -05:00
Linus Torvalds b6ea7bcf77 This fixes kprobes/uprobes dynamic processing of strings, where
it processes the args but does not update the remaining length
 of the buffer that the string arguments will be placed in. It
 constantly passes in the total size of buffer used instead of
 passing in the remaining size of the buffer used. This could cause
 issues if the strings are larger than the max size of an event
 which could cause the strings to be written beyond what was reserved
 on the buffer.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXGN7BRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoa1AQD7+6O0DncGwk5aWqRHESXKlmOWteW6
 eMFbEw3KDcvs2gEAvNLB1i2yVH6Enn50M0KpmYJMbyZK/LVn2QsPZfU/LgQ=
 =KMBZ
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "This fixes kprobes/uprobes dynamic processing of strings, where it
  processes the args but does not update the remaining length of the
  buffer that the string arguments will be placed in. It constantly
  passes in the total size of buffer used instead of passing in the
  remaining size of the buffer used.

  This could cause issues if the strings are larger than the max size of
  an event which could cause the strings to be written beyond what was
  reserved on the buffer"

* tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: probeevent: Correctly update remaining space in dynamic area
2019-02-13 10:28:17 -08:00
Masami Hiramatsu eeeb080bae kprobes: Prohibit probing on hardirq tracers
Since kprobes breakpoint handling involves hardirq tracer,
probing these functions cause breakpoint recursion problem.

Prohibit probing on those functions.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154998802073.31052.17255044712514564153.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13 08:16:40 +01:00
Andreas Ziegler f6675872db tracing: probeevent: Correctly update remaining space in dynamic area
Commit 9178412ddf ("tracing: probeevent: Return consumed
bytes of dynamic area") improved the string fetching
mechanism by returning the number of required bytes after
copying the argument to the dynamic area. However, this
return value is now only used to increment the pointer
inside the dynamic area but misses updating the 'maxlen'
variable which indicates the remaining space in the dynamic
area.

This means that fetch_store_string() always reads the *total*
size of the dynamic area from the data_loc pointer instead of
the *remaining* size (and passes it along to
strncpy_from_{user,unsafe}) even if we're already about to
copy data into the middle of the dynamic area.

Link: http://lkml.kernel.org/r/20190206190013.16405-1-andreas.ziegler@fau.de

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 9178412ddf ("tracing: probeevent: Return consumed bytes of dynamic area")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-11 15:58:30 -05:00
Changbin Du 85acbb21b9 tracing: Change the function format to display function names by perf
Here is an example for this change.

$ sudo perf record -e 'ftrace:function' --filter='ip==schedule'
$ sudo perf report

The output of perf before this patch:

\# Samples: 100  of event 'ftrace:function'
\# Event count (approx.): 100
\#
\# Overhead  Trace output
\# ........  ......................................
\#
    51.00%   ffffffff81f6aaa0 <-- ffffffff81158e8d
    29.00%   ffffffff81f6aaa0 <-- ffffffff8116ccb2
     8.00%   ffffffff81f6aaa0 <-- ffffffff81f6f2ed
     4.00%   ffffffff81f6aaa0 <-- ffffffff811628db
     4.00%   ffffffff81f6aaa0 <-- ffffffff81f6ec5b
     2.00%   ffffffff81f6aaa0 <-- ffffffff81f6f21a
     1.00%   ffffffff81f6aaa0 <-- ffffffff811b04af
     1.00%   ffffffff81f6aaa0 <-- ffffffff8143ce17

After this patch:

\# Samples: 36  of event 'ftrace:function'
\# Event count (approx.): 36
\#
\# Overhead  Trace output
\# ........  ............................................
\#
    38.89%   schedule <-- schedule_hrtimeout_range_clock
    27.78%   schedule <-- worker_thread
    13.89%   schedule <-- schedule_timeout
    11.11%   schedule <-- smpboot_thread_fn
     5.56%   schedule <-- rcu_gp_kthread
     2.78%   schedule <-- exit_to_usermode_loop

Link: http://lkml.kernel.org/r/20190209161919.32350-1-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-11 14:53:43 -05:00
Linus Torvalds 27b4ad621e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This pull request is dedicated to the upcoming snowpocalypse parts 2
  and 3 in the Pacific Northwest:

   1) Drop profiles are broken because some drivers use dev_kfree_skb*
      instead of dev_consume_skb*, from Yang Wei.

   2) Fix IWLWIFI kconfig deps, from Luca Coelho.

   3) Fix percpu maps updating in bpftool, from Paolo Abeni.

   4) Missing station release in batman-adv, from Felix Fietkau.

   5) Fix some networking compat ioctl bugs, from Johannes Berg.

   6) ucc_geth must reset the BQL queue state when stopping the device,
      from Mathias Thore.

   7) Several XDP bug fixes in virtio_net from Toshiaki Makita.

   8) TSO packets must be sent always on queue 0 in stmmac, from Jose
      Abreu.

   9) Fix socket refcounting bug in RDS, from Eric Dumazet.

  10) Handle sparse cpu allocations in bpf selftests, from Martynas
      Pumputis.

  11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
      Feitkau.

  12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
      Greg Kroah-Hartman.

  13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
      ccid, from Eric Dumazet.

  14) Need to reload WoL password into bcmsysport device after deep
      sleeps, from Florian Fainelli.

  15) Remove filter from mask before freeing in cls_flower, from Petr
      Machata.

  16) Missing release and use after free in error paths of s390 qeth
      code, from Julian Wiedmann.

  17) Fix lockdep false positive in dsa code, from Marc Zyngier.

  18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.

  19) Fix EQ firmware assert in qed driver, from Manish Chopra.

  20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net: dsa: b53: Fix for failure when irq is not defined in dt
  sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
  geneve: should not call rt6_lookup() when ipv6 was disabled
  net: Don't default Cavium PTP driver to 'y'
  net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net/mlx5e: Don't overwrite pedit action when multiple pedit used
  net/mlx5e: Update hw flows when encap source mac changed
  qed*: Advance drivers version to 8.37.0.20
  qed: Change verbosity for coalescing message.
  qede: Fix system crash on configuring channels.
  qed: Consider TX tcs while deriving the max num_queues for PF.
  ...
2019-02-08 11:21:54 -08:00
Miroslav Benes d325c40296 ring-buffer: Remove unused function ring_buffer_page_len()
Commit 6b7e633fe9 ("tracing: Remove extra zeroing out of the ring
buffer page") removed the only caller of ring_buffer_page_len(). The
function is now unused and may be removed.

Link: http://lkml.kernel.org/r/20181228133847.106177-1-mbenes@suse.cz

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:58:33 -05:00
Changbin Du f52d569f3d tracing: Show stacktrace for wakeup tracers
This align the behavior of wakeup tracers with irqsoff latency tracer
that we record stacktrace at the beginning and end of waking up. The
stacktrace shows us what is happening in the kernel.

Link: http://lkml.kernel.org/r/20190116160249.7554-1-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:19 -05:00
Changbin Du afbab501c6 tracing: Put a margin between flags and duration for wakeup tracers
Don't mix context flags with function duration info.

Instead of this:

 # tracer: wakeup_rt
 #
 # wakeup_rt latency trace v1.1.5 on 5.0.0-rc1-test+
 # --------------------------------------------------------------------
 # latency: 177 us, #545/545, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
 #    -----------------
 #    | task: migration/0-11 (uid:0 nice:0 policy:1 rt_prio:99)
 #    -----------------
 #
 #                                       _-----=> irqs-off
 #                                      / _----=> need-resched
 #                                     | / _---=> hardirq/softirq
 #                                     || / _--=> preempt-depth
 #                                     ||| /
 #   REL TIME      CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
 #      |          |     |    |        ||||   |   |                     |   |   |   |
         0 us |   0)    <idle>-0    |  dNh5              |  /*      0:120:R   + [000]    11:  0:R migration/0 */
         2 us |   0)    <idle>-0    |  dNh5  0.000 us    |            (null)();
         4 us |   0)    <idle>-0    |  dNh4              |  _raw_spin_unlock() {
         4 us |   0)    <idle>-0    |  dNh4  0.304 us    |    preempt_count_sub();
         5 us |   0)    <idle>-0    |  dNh3  1.063 us    |  }
         5 us |   0)    <idle>-0    |  dNh3  0.266 us    |  ttwu_stat();
         6 us |   0)    <idle>-0    |  dNh3              |  _raw_spin_unlock_irqrestore() {
         6 us |   0)    <idle>-0    |  dNh3  0.273 us    |    preempt_count_sub();
         6 us |   0)    <idle>-0    |  dNh2  0.818 us    |  }

Show this:

 # tracer: wakeup
 #
 # wakeup latency trace v1.1.5 on 4.20.0+
 # --------------------------------------------------------------------
 # latency: 593 us, #674/674, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: kworker/0:1H-339 (uid:0 nice:-20 policy:0 rt_prio:0)
 #    -----------------
 #
 #                                      _-----=> irqs-off
 #                                     / _----=> need-resched
 #                                    | / _---=> hardirq/softirq
 #                                    || / _--=> preempt-depth
 #                                    ||| /
 #  REL TIME      CPU  TASK/PID       ||||     DURATION                  FUNCTION CALLS
 #     |          |     |    |        ||||      |   |                     |   |   |   |
        0 us |   0)    <idle>-0    |  dNs. |               |  /*      0:120:R   + [000]   339💯R kworker/0:1H */
        3 us |   0)    <idle>-0    |  dNs. |   0.000 us    |            (null)();
       67 us |   0)    <idle>-0    |  dNs. |   0.721 us    |  ttwu_stat();
       69 us |   0)    <idle>-0    |  dNs. |   0.607 us    |  _raw_spin_unlock_irqrestore();
       71 us |   0)    <idle>-0    |  .Ns. |   0.598 us    |  _raw_spin_lock_irq();
       72 us |   0)    <idle>-0    |  .Ns. |   0.584 us    |  _raw_spin_lock_irq();
       73 us |   0)    <idle>-0    |  dNs. | + 11.118 us   |  __next_timer_interrupt();
       75 us |   0)    <idle>-0    |  dNs. |               |  call_timer_fn() {
       76 us |   0)    <idle>-0    |  dNs. |               |    delayed_work_timer_fn() {
       76 us |   0)    <idle>-0    |  dNs. |               |      __queue_work() {
       ...

Link: http://lkml.kernel.org/r/20190101154614.8887-4-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:19 -05:00
Changbin Du 97f0a3bcdf tracing: Show more info for funcgraph wakeup tracers
Add these info fields to funcgraph wakeup tracers:
  o Show CPU info since the waker could be on a different CPU.
  o Show function duration and overhead.
  o Show IRQ markers.

Link: http://lkml.kernel.org/r/20190101154614.8887-3-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:19 -05:00
Steven Rostedt (VMware) 6c6dbce196 tracing: Add comment to predicate_parse() about "&&" or "||"
As the predicat_parse() code is rather complex, commenting subtleties is
important. The switch case statement should be commented to describe that it
is only looking for two '&' or '|' together, which is why the fall through
to an error is after the check.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:19 -05:00
Mathieu Malaterre 9399ca21d2 tracing: Annotate implicit fall through in predicate_parse()
There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit remove the following warning:

  kernel/trace/trace_events_filter.c:494:8: warning: this statement may fall through [-Wimplicit-fallthrough=]

Link: http://lkml.kernel.org/r/20190114203039.16535-2-malat@debian.org

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:18 -05:00
Mathieu Malaterre 91457c018f tracing: Annotate implicit fall through in parse_probe_arg()
There is a plan to build the kernel with -Wimplicit-fallthrough and
this place in the code produced a warning (W=1).

This commit remove the following warning:

  kernel/trace/trace_probe.c:302:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Link: http://lkml.kernel.org/r/20190114203039.16535-1-malat@debian.org

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:18 -05:00
Changbin Du 9acd8de69d function_graph: Support displaying relative timestamp
When function_graph is used for latency tracers, relative timestamp
is more straightforward than absolute timestamp as function trace
does. This change adds relative timestamp support to function_graph
and applies to latency tracers (wakeup and irqsoff).

Instead of:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test
 # --------------------------------------------------------------------
 # latency: 521 us, #1125/1125, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
 #    -----------------
 #    | task: swapper/2-0 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: __schedule
 #  => ended at:   _raw_spin_unlock_irq
 #
 #
 #                                       _-----=> irqs-off
 #                                      / _----=> need-resched
 #                                     | / _---=> hardirq/softirq
 #                                     || / _--=> preempt-depth
 #                                     ||| /
 #     TIME        CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
 #      |          |     |    |        ||||   |   |                     |   |   |   |
   124.974306 |   2)  systemd-693   |  d..1  0.000 us    |  __schedule();
   124.974307 |   2)  systemd-693   |  d..1              |    rcu_note_context_switch() {
   124.974308 |   2)  systemd-693   |  d..1  0.487 us    |      rcu_preempt_deferred_qs();
   124.974309 |   2)  systemd-693   |  d..1  0.451 us    |      rcu_qs();
   124.974310 |   2)  systemd-693   |  d..1  2.301 us    |    }
[..]
   124.974826 |   2)    <idle>-0    |  d..2              |  finish_task_switch() {
   124.974826 |   2)    <idle>-0    |  d..2              |    _raw_spin_unlock_irq() {
   124.974827 |   2)    <idle>-0    |  d..2  0.000 us    |  _raw_spin_unlock_irq();
   124.974828 |   2)    <idle>-0    |  d..2  0.000 us    |  tracer_hardirqs_on();
   <idle>-0       2d..2  552us : <stack trace>
  => __schedule
  => schedule_idle
  => do_idle
  => cpu_startup_entry
  => start_secondary
  => secondary_startup_64

Show:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 5.0.0-rc1-test+
 # --------------------------------------------------------------------
 # latency: 511 us, #1053/1053, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
 #    -----------------
 #    | task: swapper/7-0 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: __schedule
 #  => ended at:   _raw_spin_unlock_irq
 #
 #
 #                                       _-----=> irqs-off
 #                                      / _----=> need-resched
 #                                     | / _---=> hardirq/softirq
 #                                     || / _--=> preempt-depth
 #                                     ||| /
 #   REL TIME      CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
 #      |          |     |    |        ||||   |   |                     |   |   |   |
         0 us |   7)   sshd-1704    |  d..1  0.000 us    |  __schedule();
         1 us |   7)   sshd-1704    |  d..1              |    rcu_note_context_switch() {
         1 us |   7)   sshd-1704    |  d..1  0.611 us    |      rcu_preempt_deferred_qs();
         2 us |   7)   sshd-1704    |  d..1  0.484 us    |      rcu_qs();
         3 us |   7)   sshd-1704    |  d..1  2.599 us    |    }
[..]
       509 us |   7)    <idle>-0    |  d..2              |  finish_task_switch() {
       510 us |   7)    <idle>-0    |  d..2              |    _raw_spin_unlock_irq() {
       510 us |   7)    <idle>-0    |  d..2  0.000 us    |  _raw_spin_unlock_irq();
       512 us |   7)    <idle>-0    |  d..2  0.000 us    |  tracer_hardirqs_on();
   <idle>-0       7d..2  543us : <stack trace>
  => __schedule
  => schedule_idle
  => do_idle
  => cpu_startup_entry
  => start_secondary
  => secondary_startup_64

Link: http://lkml.kernel.org/r/20190101154614.8887-2-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:56:18 -05:00
Jann Horn 01e7187b41 pipe: stop using ->can_merge
Al Viro pointed out that since there is only one pipe buffer type to which
new data can be appended, it isn't necessary to have a ->can_merge field in
struct pipe_buf_operations, we can just check for a magic type.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-01 02:01:45 -05:00
Alexei Starovoitov e16ec34039 bpf: fix potential deadlock in bpf_prog_register
Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
[   13.007000] WARNING: possible circular locking dependency detected
[   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[   13.008124] ------------------------------------------------------
[   13.008624] test_progs/246 is trying to acquire lock:
[   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[   13.009770]
[   13.009770] but task is already holding lock:
[   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.010877]
[   13.010877] which lock already depends on the new lock.
[   13.010877]
[   13.011532]
[   13.011532] the existing dependency chain (in reverse order) is:
[   13.012129]
[   13.012129] -> #4 (bpf_event_mutex){+.+.}:
[   13.012582]        perf_event_query_prog_array+0x9b/0x130
[   13.013016]        _perf_ioctl+0x3aa/0x830
[   13.013354]        perf_ioctl+0x2e/0x50
[   13.013668]        do_vfs_ioctl+0x8f/0x6a0
[   13.014003]        ksys_ioctl+0x70/0x80
[   13.014320]        __x64_sys_ioctl+0x16/0x20
[   13.014668]        do_syscall_64+0x4a/0x180
[   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.015469]
[   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[   13.015910]        perf_event_init_cpu+0x5a/0x90
[   13.016291]        perf_event_init+0x1b2/0x1de
[   13.016654]        start_kernel+0x2b8/0x42a
[   13.016995]        secondary_startup_64+0xa4/0xb0
[   13.017382]
[   13.017382] -> #2 (pmus_lock){+.+.}:
[   13.017794]        perf_event_init_cpu+0x21/0x90
[   13.018172]        cpuhp_invoke_callback+0xb3/0x960
[   13.018573]        _cpu_up+0xa7/0x140
[   13.018871]        do_cpu_up+0xa4/0xc0
[   13.019178]        smp_init+0xcd/0xd2
[   13.019483]        kernel_init_freeable+0x123/0x24f
[   13.019878]        kernel_init+0xa/0x110
[   13.020201]        ret_from_fork+0x24/0x30
[   13.020541]
[   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[   13.021051]        static_key_slow_inc+0xe/0x20
[   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
[   13.021891]        perf_trace_event_init+0x11f/0x250
[   13.022297]        perf_trace_init+0x6b/0xa0
[   13.022644]        perf_tp_event_init+0x25/0x40
[   13.023011]        perf_try_init_event+0x6b/0x90
[   13.023386]        perf_event_alloc+0x9a8/0xc40
[   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
[   13.024173]        do_syscall_64+0x4a/0x180
[   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.024968]
[   13.024968] -> #0 (tracepoints_mutex){+.+.}:
[   13.025434]        __mutex_lock+0x86/0x970
[   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
[   13.026215]        bpf_probe_register+0x40/0x60
[   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.027042]        __do_sys_bpf+0x94f/0x1a90
[   13.027389]        do_syscall_64+0x4a/0x180
[   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.028171]
[   13.028171] other info that might help us debug this:
[   13.028171]
[   13.028807] Chain exists of:
[   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[   13.028807]
[   13.029666]  Possible unsafe locking scenario:
[   13.029666]
[   13.030140]        CPU0                    CPU1
[   13.030510]        ----                    ----
[   13.030875]   lock(bpf_event_mutex);
[   13.031166]                                lock(&cpuctx_mutex);
[   13.031645]                                lock(bpf_event_mutex);
[   13.032135]   lock(tracepoints_mutex);
[   13.032441]
[   13.032441]  *** DEADLOCK ***
[   13.032441]
[   13.032911] 1 lock held by test_progs/246:
[   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.033909]
[   13.033909] stack backtrace:
[   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   13.035657] Call Trace:
[   13.035859]  dump_stack+0x5f/0x8b
[   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
[   13.036526]  __lock_acquire+0x1158/0x1350
[   13.036852]  ? lock_acquire+0x98/0x190
[   13.037154]  lock_acquire+0x98/0x190
[   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.037876]  __mutex_lock+0x86/0x970
[   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.039028]  ? __mutex_lock+0x86/0x970
[   13.039337]  ? __mutex_lock+0x24a/0x970
[   13.039649]  ? bpf_probe_register+0x1d/0x60
[   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
[   13.041325]  bpf_probe_register+0x40/0x60
[   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.042068]  ? __might_fault+0x3e/0x90
[   13.042374]  __do_sys_bpf+0x94f/0x1a90
[   13.042678]  do_syscall_64+0x4a/0x180
[   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.043382] RIP: 0033:0x7f23b10a07f9
[   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000

Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.

Fixes: c4f6699dfc ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31 23:18:21 +01:00
Arnaldo Carvalho de Melo 5620196951 perf: Make perf_event_output() propagate the output() return
For the original mode of operation it isn't needed, since we report back
errors via PERF_RECORD_LOST records in the ring buffer, but for use in
bpf_perf_event_output() it is convenient to return the errors, basically
-ENOSPC.

Currently bpf_perf_event_output() returns an error indication, the last
thing it does, which is to push it to the ring buffer is that can fail
and if so, this failure won't be reported back to its users, fix it.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20190118150938.GN5823@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21 17:00:57 -03:00