Commit Graph

933724 Commits

Author SHA1 Message Date
Masami Hiramatsu 80a6e707dd kprobes: Remove show_registers() function prototype
Remove the show_registers() function prototype: the function was renamed
by commit 57da8b960b ("x86: Avoid double stack traces with show_regs()"),
and commit 80006dbee6 ("kprobes/x86: Remove jprobe implementation")
removed its caller in kprobes. So the prototype refers to a function that
no longer exists.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 16:21:12 -04:00
Peng Fan 231621d0c5 tracing/uprobe: Remove dead code in trace_uprobe_register()
In trace_uprobe_register(), the "return 0;" statement after the switch
is dead code; remove it.

Link: https://lkml.kernel.org/r/1595561064-29186-1-git-send-email-fanpeng@loongson.cn

Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 16:16:46 -04:00
Muchun Song 0cb2f1372b kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler
We found a case of kernel panic on our server. The stack trace is as
follows (some irrelevant information omitted):

  BUG: kernel NULL pointer dereference, address: 0000000000000080
  RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
  RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
  RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
  RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <IRQ>
   ftrace_ops_assist_func+0x56/0xe0
   ftrace_call+0x5/0x34
   tcpa_statistic_send+0x5/0x130 [ttcp_engine]

tcpa_statistic_send is the function being kprobed. After analysis, the
root cause is that the fourth parameter, regs, of kprobe_ftrace_handler
is NULL. Why is regs NULL? We used the crash tool to analyze the kdump.

  crash> dis tcpa_statistic_send -r
         <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller>

tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller, so
it is expected that the fourth parameter regs of kprobe_ftrace_handler is
NULL. In theory, ftrace_regs_caller should be called instead of
ftrace_caller. After in-depth analysis, we found a reproducible path.

  Write a simple kernel module that starts a periodic timer. The
  timer's handler is named 'kprobe_test_timer_handler', and the
  module is named kprobe_test.ko.

  1) insmod kprobe_test.ko
  2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
  3) echo 0 > /proc/sys/kernel/ftrace_enabled
  4) rmmod kprobe_test
  5) stop the kprobe registered in step 2)
  6) insmod kprobe_test.ko
  7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'

In step 4) the kprobe is marked as GONE but is not disarmed. Step 5) also
does not disarm the kprobe when unregistering it, so the ip is never
removed from the ftrace filter. In this case, when the module is loaded
again in step 6), ftrace_module_enable() replaces the call site code with
a call to ftrace_caller. When the kprobe is registered again, we do not
replace ftrace_caller with ftrace_regs_caller, because ftrace was
disabled in step 3). So step 7) triggers the kernel panic. Fix this
problem by disarming the kprobe when the module is going away.
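
A sketch of the idea (abbreviated; the exact guards in kernel/kprobes.c
may differ):

	static void kill_kprobe(struct kprobe *p)
	{
		p->flags |= KPROBE_FLAG_GONE;

		/*
		 * The module is going away, but ftrace is still alive at the
		 * MODULE_STATE_GOING notification, so disarm an ftrace-based
		 * kprobe here. This removes the ip from the ftrace filter,
		 * so a later insmod + register_kprobe() arms
		 * ftrace_regs_caller again instead of reusing ftrace_caller.
		 */
		if (kprobe_ftrace(p) && !kprobe_disabled(p))
			disarm_kprobe_ftrace(p);
	}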

Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com

Cc: stable@vger.kernel.org
Fixes: ae6aa16fdc ("kprobes: introduce ftrace based optimization")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 16:14:54 -04:00
Josef Bacik c58b6b0372 ftrace: Fix ftrace_trace_task return value
I was attempting to use pid filtering with function_graph, but it wasn't
allowing anything through. It turns out ftrace_trace_task returns false
if ftrace_ignore_pid is not empty, which is no longer correct. We now set
it to FTRACE_PID_IGNORE if we need to ignore that pid; otherwise it is
set to the pid itself (which is odd considering the name) or to
FTRACE_PID_TRACE. Fix the check to test for != FTRACE_PID_IGNORE. With
this we can now use function_graph with pid filtering.
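
Roughly, the corrected helper looks like this (a sketch of the check, not
the exact diff):

	static inline bool ftrace_trace_task(struct trace_array *tr)
	{
		/* the per-cpu value is now a pid, FTRACE_PID_TRACE, or
		 * FTRACE_PID_IGNORE -- only the last one means "skip" */
		return this_cpu_read(tr->array_buffer.data->ftrace_ignore_pid) !=
			FTRACE_PID_IGNORE;
	}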

Link: https://lkml.kernel.org/r/20200725005048.1790-1-josef@toxicpanda.com

Fixes: 717e3f5ebc ("ftrace: Make function trace pid filtering a bit more exact")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 16:12:31 -04:00
Nick Desaulniers 1c39d761ff tracepoint: Use __used attribute definitions from compiler_attributes.h
Just a small cleanup while I was touching this header:
compiler_attributes.h does feature detection of these __attribute__(())s
and provides more concise ways to invoke them.

Link: https://lkml.kernel.org/r/20200730224555.2142154-3-ndesaulniers@google.com

Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 13:34:37 -04:00
Nick Desaulniers f3751ad011 tracepoint: Mark __tracepoint_string's __used
Strings defined with __tracepoint_string have their data stored in .rodata, and an
address to that data stored in the "__tracepoint_str" section. Functions
that refer to those strings refer to the symbol of the address. Compiler
optimization can replace those address references with references
directly to the string data. If the address doesn't appear to have other
uses, then it appears dead to the compiler and is removed. This can
break the /tracing/printk_formats sysfs node which iterates the
addresses stored in the "__tracepoint_str" section.

Like other strings stored in custom sections in this header, mark these
__used to inform the compiler that there are other non-obvious users of
the address, so they should still be emitted.
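
The change amounts to adding "used" to the attribute list (a sketch; the
exact macro spelling in the header may differ):

	/* before: the section entry looks dead and can be dropped */
	#define __tracepoint_string	__attribute__((section("__tracepoint_str")))

	/* after: __used keeps the address in the section even when no
	 * other reference to the symbol survives optimization */
	#define __tracepoint_string	__attribute__((section("__tracepoint_str"), used))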

Link: https://lkml.kernel.org/r/20200730224555.2142154-2-ndesaulniers@google.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 102c9323c3 ("tracing: Add __tracepoint_string() to export string pointers")
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Simon MacMullen <simonmacm@google.com>
Suggested-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 13:34:04 -04:00
Zhaoyang Huang 0f69dae4d1 trace: Have tracing buffer info use kvzalloc instead of kzalloc
High-order memory allocations within tracing can trigger OOM; use kvzalloc instead.

Below is the call stack we ran across on an Android system. The scenario
happens when traced_probes is woken up to fetch a large quantity of trace
data even though free memory is higher than watermark_low.

traced_probes invoked oom-killer: gfp_mask=0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),  order=2, oom_score_adj=-1

traced_probes cpuset=system-background mems_allowed=0
CPU: 3 PID: 588 Comm: traced_probes Tainted: G        W  O    4.14.181 #1
Hardware name: Generic DT based system
(unwind_backtrace) from [<c010d824>] (show_stack+0x20/0x24)
(show_stack) from [<c0b2e174>] (dump_stack+0xa8/0xec)
(dump_stack) from [<c027d584>] (dump_header+0x9c/0x220)
(dump_header) from [<c027cfe4>] (oom_kill_process+0xc0/0x5c4)
(oom_kill_process) from [<c027cb94>] (out_of_memory+0x220/0x310)
(out_of_memory) from [<c02816bc>] (__alloc_pages_nodemask+0xff8/0x13a4)
(__alloc_pages_nodemask) from [<c02a6a1c>] (kmalloc_order+0x30/0x48)
(kmalloc_order) from [<c02a6a64>] (kmalloc_order_trace+0x30/0x118)
(kmalloc_order_trace) from [<c0223d7c>] (tracing_buffers_open+0x50/0xfc)
(tracing_buffers_open) from [<c02e6f58>] (do_dentry_open+0x278/0x34c)
(do_dentry_open) from [<c02e70d0>] (vfs_open+0x50/0x70)
(vfs_open) from [<c02f7c24>] (path_openat+0x5fc/0x169c)
(path_openat) from [<c02f75c4>] (do_filp_open+0x94/0xf8)
(do_filp_open) from [<c02e7650>] (do_sys_open+0x168/0x26c)
(do_sys_open) from [<c02e77bc>] (SyS_openat+0x34/0x38)
(SyS_openat) from [<c0108bc0>] (ret_fast_syscall+0x0/0x28)
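
The change itself is mechanical; a sketch of the pattern in
tracing_buffers_open() (abbreviated):

	struct ftrace_buffer_info *info;

	/* kvzalloc() falls back to vmalloc() when contiguous pages
	 * (an order-2 allocation here) are scarce, so the open no
	 * longer has to invoke the OOM killer */
	info = kvzalloc(sizeof(*info), GFP_KERNEL);
	if (!info)
		return -ENOMEM;

	/* ... and the matching release becomes kvfree(info) */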

Link: https://lkml.kernel.org/r/1596155265-32365-1-git-send-email-zhaoyang.huang@unisoc.com

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03 11:52:20 -04:00
Vincent Whitchurch ee896ee805 tracing: Remove outdated comment in stack handling
This comment describes the behaviour before commit 2a820bf749
("tracing: Use percpu stack trace buffer more intelligently").  Since
that commit, interrupts and NMIs do use the per-cpu stacks so the
comment is no longer correct.  Remove it.

(Note that the FTRACE_STACK_SIZE mentioned in the comment has never
existed, it probably should have said FTRACE_STACK_ENTRIES.)

Link: https://lkml.kernel.org/r/20200727092840.18659-1-vincent.whitchurch@axis.com

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-30 22:54:50 -04:00
Chengming Zhou c5f51572a7 ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines
When inserting a module, we find all ftrace_ops referencing it on the
ftrace_ops_list. But the FTRACE_OPS_FL_DIRECT and FTRACE_OPS_FL_IPMODIFY
flags are special and should not be set automatically, so warn about and
skip ftrace_ops that have these two flags set when adding the new code.
Also check if only one ftrace_ops references the module, in which case
we can use a trampoline as an optimization.
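
Roughly, in referenced_filters() (a sketch, not the exact diff):

	if (ops_references_rec(ops, rec)) {
		/* direct and ipmodify ops must only ever be attached
		 * explicitly, never picked up at module load time */
		if (WARN_ON_ONCE(ops->flags & FTRACE_OPS_FL_DIRECT))
			continue;
		if (WARN_ON_ONCE(ops->flags & FTRACE_OPS_FL_IPMODIFY))
			continue;
		cnt++;
	}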

Link: https://lkml.kernel.org/r/20200728180554.65203-2-zhouchengming@bytedance.com

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-30 22:45:31 -04:00
Chengming Zhou 8a224ffb3f ftrace: Setup correct FTRACE_FL_REGS flags for module
When a module is loaded and enabled, __ftrace_replace_code is used for
the module if any ftrace_ops referencing it is found. But we get the
wrong ftrace_addr for the module rec in ftrace_get_addr_new, because
rec->flags has not been set up correctly. This can cause the callback
function of an ftrace_ops that has FTRACE_OPS_FL_SAVE_REGS to be called
with pt_regs set to NULL.
So set up the correct FTRACE_FL_REGS flags for rec when calling
referenced_filters to find the ftrace_ops that reference it.
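
A sketch of the idea, in the same referenced_filters() loop (names from
kernel/trace/ftrace.c, not the exact diff):

	if (ops_references_rec(ops, rec)) {
		cnt++;
		/* propagate the REGS requirement into the rec so that
		 * ftrace_get_addr_new() picks ftrace_regs_caller */
		if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
			rec->flags |= FTRACE_FL_REGS;
	}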

Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com

Cc: stable@vger.kernel.org
Fixes: 8c4f3c3fa9 ("ftrace: Check module functions being traced on reload")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-30 19:35:19 -04:00
Kevin Hao 96b4833b68 tracing/hwlat: Honor the tracing_cpumask
In the calculation of the cpu mask for the hwlat kernel thread, the wrong
cpu mask is used instead of the tracing_cpumask, which makes
tracing/tracing_cpumask useless for the hwlat tracer. Fix it.
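
A sketch of the corrected mask calculation (the mask previously used in
place of tr->tracing_cpumask may differ):

	/* restrict the hwlat kthread to the CPUs the user selected */
	cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask);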

Link: https://lkml.kernel.org/r/20200730082318.42584-2-haokexin@gmail.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0330f7aa8e ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-30 19:35:04 -04:00
Kevin Hao a9d0ba6772 tracing/hwlat: Drop the duplicate assignment in start_kthread()
We have already set 'current_mask' to '&save_cpumask' in its declaration,
so there is no need to assign it again.

Link: https://lkml.kernel.org/r/20200730082318.42584-1-haokexin@gmail.com

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-30 19:35:04 -04:00
Wei Yang 36b8aacf2a tracing: Save one trace_event->type by using __TRACE_LAST_TYPE
Statically defined trace_event->type values stop at (__TRACE_LAST_TYPE - 1)
and dynamically assigned ones start from (__TRACE_LAST_TYPE + 1).

To save one trace_event->type index, let's use __TRACE_LAST_TYPE.

Link: https://lkml.kernel.org/r/20200703020612.12930-3-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-09 18:14:58 -04:00
Wei Yang 746cf3459f tracing: Simplify defining of the next event id
The value to be used and compared in trace_search_list() is "last + 1".
Let's just define next to be "last + 1" instead of doing the addition
each time.

Link: https://lkml.kernel.org/r/20200703020612.12930-2-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-09 18:00:47 -04:00
Steven Rostedt (VMware) 29ce24519c ring-buffer: Do not trigger a WARN if clock going backwards is detected
After tweaking the ring buffer to be a bit faster, a warning started
triggering on one of my machines, causing my tests to fail. The warning
fires when the delta (current time stamp minus previous time stamp) is
larger than the max time that the ring buffer can hold (59 bits).

If the clock were to go backwards slightly, this would easily trigger the
warning. On the machine where it triggered, the clock went backwards by
around 450 nanoseconds after a recalibration of the TSC clock. Now that
the ring buffer is faster, it detects this: the delta used is larger than
the max, the warning is triggered, and my test fails.

To handle the clock going backwards, look at the saved before and after
time stamps. If they are the same, it means that the current event did not
interrupt another event, and those time stamps are of a previous event
that was recorded. If the max delta is triggered, look at those time
stamps, make sure they are the same, then use them to compare with the
current timestamp. If the current timestamp is less than the before/after
time stamps, then the clock being used went backwards.

Print out a message that this has happened, but do not warn about it (and
only print the message once).

Still do the warning if the delta is indeed larger than what can be used.

Also remove the unneeded KERN_WARNING from the WARN_ONCE() print.
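
A sketch of the resulting check (rb_add_timestamp()-style logic, names
approximate):

	if (unlikely(info->delta > (1ULL << 59))) {
		/* matching before/after stamps belong to the previous,
		 * uninterrupted event; if they are newer than ts, the
		 * clock went backwards */
		if (info->before == info->after && info->before > info->ts) {
			static int once;

			if (!once) {
				once++;
				pr_warn("Ring buffer clock went backwards: %llu -> %llu\n",
					info->before, info->ts);
			}
		} else {
			/* a delta that is genuinely too big still warns */
			WARN_ONCE(1, "Delta way too big! %llu\n", info->delta);
		}
		info->delta = 0;
	}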

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-01 22:12:07 -04:00
Steven Rostedt (VMware) bbeba3e58f ring-buffer: Call trace_clock_local() directly for RETPOLINE kernels
After doing some benchmarks and examining the code, I found that the ring
buffer clock calls were quite expensive, and noticed that it uses
retpolines. This is because the ring buffer clock is programmable, and can
be set. But in most cases it simply uses the fastest ns unit clock which is
the trace_clock_local(). For RETPOLINE builds, checking if the ring buffer
clock is set to trace_clock_local() and then calling it directly has brought
the time of an event on my i7 box from an average of 93 nanoseconds an event
down to 83 nanoseconds an event, and the minimum time from 81 nanoseconds to
68 nanoseconds!
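
A sketch of the shortcut (assuming the clock callback lives on the
buffer):

	static inline u64 rb_time_stamp(struct trace_buffer *buffer)
	{
		/* bypass the retpoline for the common, default clock */
		if (IS_ENABLED(CONFIG_RETPOLINE) &&
		    likely(buffer->clock == trace_clock_local))
			return trace_clock_local();

		return buffer->clock();
	}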

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-01 22:12:07 -04:00
Steven Rostedt (VMware) 74e879373b ring-buffer: Move the add_timestamp into its own function
Make a helper function rb_add_timestamp() that moves the adding of the
extended time stamps into its own function. Also, remove the noinline and
inline annotations from the functions it calls, as recent benchmarks
suggest they do not make a difference (just let gcc decide).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-01 22:12:06 -04:00
Steven Rostedt (VMware) 58fbc3c632 ring-buffer: Consolidate add_timestamp to remove some branches
Reorganize the logic for adding the absolute, extended, and forced time
stamps a little, in such a way as to remove a branch or two. This is just
a micro optimization.

Also add before and after time stamps to the rb_event_info structure to
display those values in the rb_check_timestamps() code, if something were to
go wrong.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-01 22:11:22 -04:00
Steven Rostedt (VMware) 75b21c6dfa ring-buffer: Mark the !tail (crossing a page) as unlikely
It is the uncommon case where an event crosses a sub buffer (page)
boundary, so mark that check at the end of reserving an event as unlikely.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 17:18:56 -04:00
Nicholas Piggin b23d7a5f4a ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU
On a 144 thread system, `perf ftrace` takes about 20 seconds to start
up, due to calling synchronize_rcu() for each CPU.

  cat /proc/108560/stack
    0xc0003e7eb336f470
    __switch_to+0x2e0/0x480
    __wait_rcu_gp+0x20c/0x220
    synchronize_rcu+0x9c/0xc0
    ring_buffer_reset_cpu+0x88/0x2e0
    tracing_reset_online_cpus+0x84/0xe0
    tracing_open+0x1d4/0x1f0

On a system with 10x more threads, it starts to become an annoyance.

Batch these up so we disable all the per-cpu buffers first, then
synchronize_rcu() once, then reset each of the buffers. This brings
the time down to about 0.5s.
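
A sketch of the batched flow (helper names approximate):

	/* disable every online per-cpu buffer first */
	for_each_online_buffer_cpu(buffer, cpu)
		atomic_inc(&buffer->buffers[cpu]->record_disabled);

	/* one grace period for all CPUs instead of one per CPU */
	synchronize_rcu();

	/* now it is safe to reset and re-enable each buffer */
	for_each_online_buffer_cpu(buffer, cpu) {
		rb_reset_cpu(buffer->buffers[cpu]);
		atomic_dec(&buffer->buffers[cpu]->record_disabled);
	}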

Link: https://lkml.kernel.org/r/20200625053403.2386972-1-npiggin@gmail.com

Tested-by: Anton Blanchard <anton@ozlabs.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 17:18:56 -04:00
Steven Rostedt (VMware) 10464b4aa6 ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit
A discussion about the new time algorithm, which lets nested events still
have proper time keeping, concluded that it requires local64_t atomic
operations. Mathieu was concerned about the performance this would have
on 32 bit machines, as in most cases atomic 64 bit operations on them can
be expensive.

As the ring buffer's timing needs do not require full features of local64_t,
a wrapper is made to implement a new rb_time_t operation that uses two longs
on 32 bit machines but still uses the local64_t operations on 64 bit
machines. There's a switch that can be made in the file to force 64 bit to
use the 32 bit version just for testing purposes.

Not all reads need to succeed: a read may fail if it happened while the
stamp being read was in the process of being updated. The requirement is
that all reads must succeed if they were done by an interrupting event
(where this event was interrupted by another event that did the write),
or if the event itself did the write first. That is: rb_time_set(t, x)
followed by rb_time_read(t) will always succeed (even if it gets
interrupted by another event that writes to t). The result of the read
will be either the previous set, or a set performed by an interrupting
event.

If the read is done by an event that interrupted another event that was in
the process of setting the time stamp, and no other event came along to
write to that time stamp, it will fail and the rb_time_read() will return
that it failed (the value to read will be undefined).

A set will always write to the time stamp and return with a valid time
stamp, such that any read after it will be valid.

A cmpxchg may fail if it interrupted an event that was in the process of
updating the time stamp just like the reads do. Other than that, it will act
like a normal cmpxchg.

The way this works is that the rb_time_t is made up of three fields: a
cnt that gets updated atomically every time a modification is made; a top
that represents the most significant 30 bits of the time; and a bottom to
represent the least significant 30 bits of the time. Notice that the time
value is only 60 bits long (where the ring buffer only uses 59 bits, which
gives us 18 years of nanoseconds!).

The top two bits of both the top and bottom fields are a 2 bit counter
that gets set from the two least significant bits of cnt. A read where
both the top and the bottom have the same most significant 2 bits is
considered a match, and a valid 60 bit number can be created from it. If
they do not match, then the number is considered invalid, and this must
only happen if an event interrupted another event in the midst of
updating the time stamp.

This is only used on 32 bit machines, as 64 bit machines get better
performance out of the local64_t. It has been tested heavily by forcing
64 bit builds to use this logic.
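
A sketch of the 32 bit representation and a read (names and bit masks
approximate):

	#define RB_TIME_SHIFT	30
	#define RB_TIME_VAL_MASK ((1UL << RB_TIME_SHIFT) - 1)

	typedef struct {
		local_t	cnt;	/* bumped atomically on every update */
		local_t	top;	/* bits 30..59 of the stamp + 2 bit tag */
		local_t	bottom;	/* bits 0..29 of the stamp + 2 bit tag */
	} rb_time_t;

	static bool rb_time_read(rb_time_t *t, u64 *ret)
	{
		unsigned long top = local_read(&t->top);
		unsigned long bottom = local_read(&t->bottom);

		/* the 2 bit tags (copied from cnt) must match, else a
		 * writer was interrupted mid-update and the read fails */
		if ((top >> RB_TIME_SHIFT) != (bottom >> RB_TIME_SHIFT))
			return false;

		*ret = ((u64)(top & RB_TIME_VAL_MASK) << RB_TIME_SHIFT) |
		       (bottom & RB_TIME_VAL_MASK);
		return true;
	}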

Link: https://lore.kernel.org/r/20200625225345.18cf5881@oasis.local.home
Link: http://lkml.kernel.org/r/20200629025259.309232719@goodmis.org

Inspired-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 17:18:51 -04:00
Steven Rostedt (VMware) 7c4b4a5164 ring-buffer: Incorporate absolute timestamp into add_timestamp logic
Instead of performing the absolute test each time to check whether the
ring buffer wants absolute time stamps for all its recording, incorporate
it into the add_timestamp field and turn it into flags, for faster
processing between wanting an absolute tag and needing to force one.
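
A sketch of the flag encoding (values illustrative):

	enum {
		RB_ADD_STAMP_NONE	= 0,
		RB_ADD_STAMP_EXTEND	= BIT(1), /* extended time delta */
		RB_ADD_STAMP_ABSOLUTE	= BIT(2), /* buffer wants absolute */
		RB_ADD_STAMP_FORCE	= BIT(3), /* must inject full stamp */
	};

	/* one test up front instead of re-testing at each use */
	if (ring_buffer_time_stamp_abs(cpu_buffer->buffer))
		info->add_timestamp |= RB_ADD_STAMP_ABSOLUTE;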

Link: http://lkml.kernel.org/r/20200629025259.154892368@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 16:16:14 -04:00
Steven Rostedt (VMware) a389d86f7f ring-buffer: Have nested events still record running time stamp
Up until now, if an event is interrupted while it is being recorded by an
interrupt, and that interrupt records events, the time of those events
will all be the same. This is because events only record the delta of the
time since the previous event (or the beginning of a page), and updating
the time keeping for nested events is extremely racy. After years of
thinking about this and several failed attempts, I finally have a
solution to solve this puzzle.

The problem is that you need to atomically calculate the delta and then
update the time stamp you made the delta from, as well as then record it
into the buffer, all this while at any time an interrupt can come in and
do the same thing. This is easy to solve with heavy weight atomics, but that
would be detrimental to the performance of the ring buffer. The current
state of affairs sacrificed the time deltas for nested events for
performance.

The reason for previous failed attempts at solving this puzzle was that I
was trying to completely avoid slow atomic operations like cmpxchg. I
finally came to the conclusion that always avoiding cmpxchg is not
possible, which is why those previous attempts always failed. But it is
possible to pick one path (the most common case) and avoid cmpxchg in
that path, which is the "fast path". The most common case is that an
event will not be interrupted and have other events added into it. An
event can detect if it has interrupted another event, and for these cases
we can make it the slow path and use the heavy operations like cmpxchg.

One more player was added to the game that made this possible, and that is
the "absolute timestamp" (by Tom Zanussi) that allows us to inject a full 59
bit time stamp. (Of course this breaks if a machine is running for more than
18 years without a reboot!).

There's barrier() placements around for being paranoid, even when they
are not needed because of other atomic functions near by. But those
should not hurt, as if they are not needed, they basically become a nop.

Note, this also makes the race window much smaller, which means there
are less slow paths to slow down the performance.

The basic idea is that there's two main paths taken.

 1) Not being interrupted between time stamps and reserving buffer space.
    In this case, the time stamps taken are true to the location in the
    buffer.

 2) Was interrupted by another path between taking time stamps and reserving
    buffer space.

The objective is to know what the delta is from the last reserved location
in the buffer.

As it is possible to detect if an event is interrupting another event before
reserving data, space is added to the length to be reserved to inject a full
time stamp along with the event being reserved.

When an event is not interrupted, the write stamp is always the time of the
last event written to the buffer.

In path 1, there's two sub paths we care about:

 a) The event did not interrupt another event.
 b) The event interrupted another event.

In case a, as the write stamp was read and known to be correct, the delta
between the current time stamp and the write stamp is the delta between the
current event and the previously recorded event.

In case b, extra space was reserved to just put the full time stamp into
the buffer, which is done. As stated, in this path the time stamp taken
is known to match the location in the buffer.

In path 2, there's also two sub paths we care about:

 a) The event was not interrupted by another event between reserving space
    on the buffer and re-reading the write stamp.
 b) The event was interrupted by another event.

In case a, the write stamp is that of the last event that interrupted this
event between taking the time stamps and reserving. As no event came in
after re-reading the write stamp, that event is known to be the event
directly before this one, and the delta can be taken between the new time
stamp and the write stamp.

In case b, one or more events came in between reserving the event and
re-reading the write stamp. Since this event's buffer reservation is
between other events at this path, there's no way to know what the delta
is. But because an event interrupted this event after it started, it's
fine to just give a zero delta, and take the same time stamp as the
events that happened within the event being recorded.

Here's the implementation of the design of this solution:

 All this is per cpu, and only needs to worry about nested events (not
 parallel events).

The players:

 write_tail: The index in the buffer where new events can be written to.
     It is incremented via local_add() to reserve space for a new event.

 before_stamp: A time stamp set by all events before reserving space.

 write_stamp: A time stamp updated by events after it has successfully
     reserved space.

	/* Save the current position of write */
 [A]	w = local_read(write_tail);
	barrier();
	/* Read both before and write stamps before touching anything */
	before = local_read(before_stamp);
	after = local_read(write_stamp);
	barrier();

	/*
	 * If before and after are the same, then this event is not
	 * interrupting a time update. If it is, then reserve space for adding
	 * a full time stamp (this can turn into a time extend which is
	 * just an extended time delta but fill up the extra space).
	 */
	if (after != before)
		abs = true;

	ts = clock();

	/* Now update the before_stamp (everyone does this!) */
 [B]	local_set(before_stamp, ts);

	/* Now reserve space on the buffer */
 [C]	write = local_add_return(len, write_tail);

	/* Set tail to be where this event's data is */
	tail = write - len;

 	if (w == tail) {

		/* Nothing interrupted this between A and C */
 [D]		local_set(write_stamp, ts);
		barrier();
 [E]		save_before = local_read(before_stamp);

 		if (!abs) {
			/* This did not interrupt a time update */
			delta = ts - after;
		} else {
			delta = ts; /* The full time stamp will be in use */
		}
		if (ts != save_before) {
			/* slow path - Was interrupted between C and E */
			/* The update to write_stamp could have overwritten the update to
			 * it by the interrupting event, but before and after should be
			 * the same for all completed top events */
			after = local_read(write_stamp);
			if (save_before > after)
				local_cmpxchg(write_stamp, after, save_before);
		}
	} else {
		/* slow path - Interrupted between A and C */

		after = local_read(write_stamp);
		temp_ts = clock();
		barrier();
 [F]		if (write == local_read(write_tail) && after < temp_ts) {
			/* This was not interrupted since C and F
			 * The last write_stamp is still valid for the previous event
			 * in the buffer. */
			delta = temp_ts - after;
			/* OK to keep this new time stamp */
			ts = temp_ts;
		} else {
			/* Interrupted between C and F
			 * Well, there's no use to try to know what the time stamp
			 * is for the previous event. Just set delta to zero and
			 * be the same time as that event that interrupted us before
			 * the reservation of the buffer. */

			delta = 0;
		}
		/* No need to use full timestamps here */
		abs = 0;
	}

Link: https://lkml.kernel.org/r/20200625094454.732790f7@oasis.local.home
Link: https://lore.kernel.org/r/20200627010041.517736087@goodmis.org
Link: http://lkml.kernel.org/r/20200629025258.957440797@goodmis.org

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 14:29:33 -04:00
Steven Rostedt (VMware) 7ef282e051 tracing: Move pipe reference to trace array instead of current_tracer
If a process has the trace_pipe open on a trace_array, the current tracer
for that trace array should not be changed. This was originally enforced by a
global lock, but when instances were introduced, it was moved to the
current_trace. But this structure is shared by all instances, and a
trace_pipe is for a single instance. There's no reason that a process that
has trace_pipe open on one instance should prevent another instance from
changing its current tracer. Move the reference counter to the trace_array
instead.
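
A sketch of the move (field name approximate):

	struct trace_array {
		/* ... */
		int	trace_ref;	/* trace_pipe opens on this instance */
	};

	static int tracing_open_pipe(struct inode *inode, struct file *filp)
	{
		struct trace_array *tr = inode->i_private;

		/* was: tr->current_trace->ref++; shared by all instances */
		tr->trace_ref++;
		/* ... */
	}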

This is marked as "Fixes" but is more of a clean up than a true fix.
Backport if you want, but its not critical.

Fixes: cf6ab6d914 ("tracing: Add ref count to tracer for when they are being read by pipe")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 14:29:33 -04:00
Wei Yang e6bc5b3f42 tracing: not necessary to define DEFINE_EVENT_PRINT to be empty again
After the previous cleanup, DEFINE_EVENT_PRINT's definition has no
relationship with DEFINE_EVENT. So after we re-define DEFINE_EVENT, it
is not necessary to define DEFINE_EVENT_PRINT to be empty again.

Link: http://lkml.kernel.org/r/20200612092844.56107-5-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 14:29:32 -04:00
Wei Yang 61df16fcaf tracing: define DEFINE_EVENT_PRINT not related to DEFINE_EVENT
The current definition defines DEFINE_EVENT_PRINT to be DEFINE_EVENT.
But at this point, DEFINE_EVENT is already an empty macro. Let's cut
the relationship between DEFINE_EVENT_PRINT and DEFINE_EVENT.

Link: http://lkml.kernel.org/r/20200612092844.56107-4-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 14:29:32 -04:00
Wei Yang b6f9eb8707 tracing: not necessary re-define DEFINE_EVENT_PRINT
The definition of DEFINE_EVENT_PRINT has not changed since the previous
one, so it is not necessary to re-define it in the same form.

Link: http://lkml.kernel.org/r/20200612092844.56107-3-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 14:29:32 -04:00
Wei Yang e8cf9c8c4c tracing: not necessary to undefine DEFINE_EVENT again
After DEFINE_EVENT is undefined in Stage 2, it is not defined to a
specific form, so it is not necessary to undefine it again.

Let's skip this.

Link: http://lkml.kernel.org/r/20200612092844.56107-2-richard.weiyang@linux.alibaba.com

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-30 14:29:32 -04:00
Steven Rostedt (VMware) fe58acefd5 x86/ftrace: Do not jump to direct code in created trampolines
When creating a trampoline based on the ftrace_regs_caller code, nop out
the jnz test that would jump to the code that returns to a direct caller
(stored in the ORIG_RAX field) rather than back to the function that
called it.

Link: http://lkml.kernel.org/r/20200422162750.638839749@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-29 11:42:48 -04:00
Steven Rostedt (VMware) 5da7cd11d0 x86/ftrace: Only have the builtin ftrace_regs_caller call direct hooks
If a direct hook is attached to a function that ftrace also has a function
attached to it, then it is required that the ftrace_ops_list_func() is used
to iterate over the registered ftrace callbacks. This will also include the
direct ftrace_ops helper, that tells ftrace_regs_caller where to return to
(the direct callback and not the function that called it).

As this direct helper is only to handle the case of ftrace callbacks
attached to the same function as the direct callback, the ftrace callback
allocated trampolines (used to only call them), should never be used to
return back to a direct callback.

Only copy the portion of the ftrace_regs_caller that will return back to
what called it, and not the portion that returns back to the direct caller.

The direct ftrace_ops must then pick the ftrace_regs_caller builtin function
as its own trampoline to ensure that it will never have one allocated for
it (which would not include the handling of direct callbacks).

Link: http://lkml.kernel.org/r/20200422162750.495903799@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-29 11:42:47 -04:00
Steven Rostedt (VMware) 0b4f8ddc0c x86/ftrace: Make non direct case the default in ftrace_regs_caller
If a direct function is hooked along with one of the ftrace registered
functions, then the ftrace_regs_caller is attached to the function that
shares the direct hook as well as the ftrace hook. The ftrace_regs_caller
will call ftrace_ops_list_func() that iterates through all the registered
ftrace callbacks, and if there's a direct callback attached to that
function, the direct ftrace_ops callback is called to notify that
ftrace_regs_caller to return to the direct caller instead of going back to
the function that called it.

But this is a very uncommon case. Currently, the code has it as the
default case. Modify ftrace_regs_caller to make the default case (the
non-jump) just return normally, and have the jump go to the handling of
the direct caller.

Link: http://lkml.kernel.org/r/20200422162750.350373278@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-29 11:42:47 -04:00
Steven Rostedt (VMware) c791cc4b1f tracing: Only allow trace_array_printk() to be used by instances
To prevent default "trace_printks()" from spamming the top level tracing
ring buffer, only allow trace instances to use trace_array_printk() (which
can be used without the trace_printk() start up warning).
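
A sketch of the guard (placement in trace_array_printk() assumed):

	/* only created instances may use this; refuse the top level
	 * array so default trace_printk()s keep their startup warning */
	if (tr == &global_trace)
		return 0;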

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-29 09:01:02 -04:00
Linus Torvalds 9ebcfadb06 Linux 5.8-rc3 2020-06-28 15:00:24 -07:00
Linus Torvalds f7db192b2d ARM: OMAP fixes for v5.8

Merge tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM OMAP fixes from Arnd Bergmann:
 "The OMAP developers are particularly active at hunting down
  regressions, so this is a separate branch with OMAP specific
  fixes for v5.8:

  As Tony explains
    "The recent display subsystem (DSS) related platform data changes
     caused display related regressions for suspend and resume. Looks
     like I only tested suspend and resume before dropping the legacy
     platform data, and forgot to test it after dropping it. Turns out
     the main issue was that we no longer have platform code calling
     pm_runtime_suspend for DSS like we did for the legacy platform data
     case, and that fix is still being discussed on the dri-devel list
     and will get merged separately. The DSS related testing exposed a
     pile of other display related issues that also need fixing
     though":

   - Fix ti-sysc optional clock handling and reset status checks for
     devices that reset automatically in idle like DSS

   - Ignore ti-sysc clockactivity bit unless separately requested to
     avoid unexpected performance issues

   - Init ti-sysc framedonetv_irq to true and disable for am4

   - Avoid duplicate DSS reset for legacy mode with dts data

   - Remove LCD timings for am4 as they cause warnings now that we're
     using generic panels

  Other OMAP changes from Tony include:

   - Fix omap_prm reset deassert as we still have drivers setting the
     pm_runtime_irq_safe() flag

   - Flush posted write for ti-sysc enable and disable

   - Fix droid4 spi related errors with spi flags

   - Fix am335x USB range and a typo for softreset

   - Fix dra7 timer nodes for clocks for IPU and DSP

   - Drop duplicate mailboxes after mismerge for dra7

   - Prevent pocketbeagle header line signal from accidentally setting
     micro-SD write protection signal by removing the default mux

   - Fix NFSroot flakiness after resume for duovero by switching the
     smsc911x gpio interrupt back to level sensitive

   - Fix regression for omap4 clockevent source after recent system
     timer changes

   - Yet another ethernet regression fix for the "rgmii" vs "rgmii-rxid"
     phy-mode

   - One patch to convert am3/am4 DT files to use the regular sdhci-omap
     driver instead of the old hsmmc driver, this was meant for the
     merge window but got lost in the process"

* tag 'arm-omap-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
  ARM: dts: am5729: beaglebone-ai: fix rgmii phy-mode
  ARM: dts: Fix omap4 system timer source clocks
  ARM: dts: Fix duovero smsc interrupt for suspend
  ARM: dts: am335x-pocketbeagle: Fix mmc0 Write Protect
  Revert "bus: ti-sysc: Increase max softreset wait"
  ARM: dts: am437x-epos-evm: remove lcd timings
  ARM: dts: am437x-gp-evm: remove lcd timings
  ARM: dts: am437x-sk-evm: remove lcd timings
  ARM: dts: dra7-evm-common: Fix duplicate mailbox nodes
  ARM: dts: dra7: Fix timer nodes properly for timer_sys_ck clocks
  ARM: dts: Fix am33xx.dtsi ti,sysc-mask wrong softreset flag
  ARM: dts: Fix am33xx.dtsi USB ranges length
  bus: ti-sysc: Increase max softreset wait
  ARM: OMAP2+: Fix legacy mode dss_reset
  bus: ti-sysc: Fix uninitialized framedonetv_irq
  bus: ti-sysc: Ignore clockactivity unless specified as a quirk
  bus: ti-sysc: Use optional clocks on for enable and wait for softreset bit
  ARM: dts: omap4-droid4: Fix spi configuration and increase rate
  bus: ti-sysc: Flush posted write on enable and disable
  soc: ti: omap-prm: use atomic iopoll instead of sleeping one
  ...
2020-06-28 14:57:14 -07:00
Linus Torvalds e44b59cd75 ARM: SoC fixes for v5.8

Merge tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here are a couple of bug fixes, mostly for devicetree files

  NXP i.MX:
   - Use correct voltage on some i.MX8M board device trees to avoid
     hardware damage
   - Code fixes for a compiler warning and incorrect reference counting,
     both harmless.
   - Fix the i.MX8M SoC driver to correctly identify imx8mp
   - Fix watchdog configuration in imx6ul-kontron device tree.

  Broadcom:
   - A small regression fix for the Raspberry-Pi firmware driver
   - A Kconfig change to use the correct timer driver on Northstar
   - A DT fix for the Luxul XWC-2000 machine
   - Two more DT fixes for NSP SoCs

  STmicroelectronics STI
   - Revert one broken patch for L2 cache configuration

  ARM Versatile Express:
   - Fix a regression by reverting a broken DT cleanup

  TEE drivers:
   - MAINTAINERS: change tee mailing list"

* tag 'arm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  Revert "ARM: sti: Implement dummy L2 cache's write_sec"
  soc: imx8m: fix build warning
  ARM: imx6: add missing put_device() call in imx6q_suspend_init()
  ARM: imx5: add missing put_device() call in imx_suspend_alloc_ocram()
  soc: imx8m: Correct i.MX8MP UID fuse offset
  ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain
  ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM
  arm64: dts: imx8mm-beacon: Fix voltages on LDO1 and LDO2
  arm64: dts: imx8mn-ddr4-evk: correct ldo1/ldo2 voltage range
  arm64: dts: imx8mm-evk: correct ldo1/ldo2 voltage range
  ARM: dts: NSP: Correct FA2 mailbox node
  ARM: bcm2835: Fix integer overflow in rpi_firmware_print_firmware_revision()
  MAINTAINERS: change tee mailing list
  ARM: dts: NSP: Disable PL330 by default, add dma-coherent property
  ARM: bcm: Select ARM_TIMER_SP804 for ARCH_BCM_NSP
  ARM: dts: BCM5301X: Add missing memory "device_type" for Luxul XWC-2000
  arm: dts: vexpress: Move mcc node back into motherboard node
2020-06-28 14:55:18 -07:00
Linus Torvalds 668f532da4 A single DocBook fix.

Merge tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "A single DocBook fix"

* tag 'timers-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix kerneldoc system_device_crosststamp & al
2020-06-28 11:59:08 -07:00
Linus Torvalds ae71d4bf00 A single Kbuild dependency fix.

Merge tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
 "A single Kbuild dependency fix"

* tag 'perf-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/rapl: Fix RAPL config variable bug
2020-06-28 11:58:14 -07:00
Linus Torvalds bc53f67d24 - Fix build regression on v4.8 and older

Merge tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:

 - Fix build regression on v4.8 and older

 - Robustness fix for TPM log parsing code

 - kobject refcount fix for the ESRT parsing code

 - Two efivarfs fixes to make it behave more like an ordinary file
   system

 - Style fixup for zero length arrays

 - Fix a regression in path separator handling in the initrd loader

 - Fix a missing prototype warning

 - Add some kerneldoc headers for newly introduced stub routines

 - Allow support for SSDT overrides via EFI variables to be disabled

 - Report CPU mode and MMU state upon entry for 32-bit ARM

 - Use the correct stack pointer alignment when entering from mixed mode

* tag 'efi-urgent-2020-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: arm: Print CPU boot mode and MMU state at boot
  efi/libstub: arm: Omit arch specific config table matching array on arm64
  efi/x86: Setup stack correctly for efi_pe_entry
  efi: Make it possible to disable efivar_ssdt entirely
  efi/libstub: Descriptions for stub helper functions
  efi/libstub: Fix path separator regression
  efi/libstub: Fix missing-prototype warning for skip_spaces()
  efi: Replace zero-length array and use struct_size() helper
  efivarfs: Don't return -EINTR when rate-limiting reads
  efivarfs: Update inode modification time for successful writes
  efi/esrt: Fix reference count leak in esre_create_sysfs_entry.
  efi/tpm: Verify event log header before parsing
  efi/x86: Fix build with gcc 4
2020-06-28 11:42:16 -07:00
Linus Torvalds 91a9a90d04 Peter Zijlstra says:

Merge tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:
 "The most anticipated fix in this pull request is probably the horrible
  build fix for the RANDSTRUCT fail that didn't make -rc2. Also included
  is the cleanup that removes those BUILD_BUG_ON()s and replaces it with
  ugly unions.

  Also included is the try_to_wake_up() race fix that was first
  triggered by Paul's RCU-torture runs, but was independently hit by
  Dave Chinner's fstest runs as well"

* tag 'sched_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cfs: change initial value of runnable_avg
  smp, irq_work: Continue smp_call_function*() and irq_work*() integration
  sched/core: s/WF_ON_RQ/WQ_ON_CPU/
  sched/core: Fix ttwu() race
  sched/core: Fix PI boosting between RT and DEADLINE tasks
  sched/deadline: Initialize ->dl_boosted
  sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption
  sched/core: Fix CONFIG_GCC_PLUGIN_RANDSTRUCT build fail
2020-06-28 10:37:39 -07:00
Linus Torvalds 098c793821 * AMD Memory bandwidth counter width fix, by Babu Moger.

Merge tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - AMD Memory bandwidth counter width fix, by Babu Moger.

 - Use the proper length type in the 32-bit truncate() syscall variant,
   by Jiri Slaby.

 - Reinit IA32_FEAT_CTL during wakeup to fix the case where after
   resume, VMXON would #GP due to VMX not being properly enabled, by
   Sean Christopherson.

 - Fix a static checker warning in the resctrl code, by Dan Carpenter.

 - Add a CR4 pinning mask for bits which cannot change after boot, by
   Kees Cook.

 - Align the start of the loop of __clear_user() to 16 bytes, to improve
   performance on AMD zen1 and zen2 microarchitectures, by Matt Fleming.

* tag 'x86_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/64: Align start of __clear_user() loop to 16-bytes
  x86/cpu: Use pinning mask for CR4 bits needing to be 0
  x86/resctrl: Fix a NULL vs IS_ERR() static checker warning in rdt_cdp_peer_get()
  x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup
  syscalls: Fix offset type of ksys_ftruncate()
  x86/resctrl: Fix memory bandwidth counter width for AMD
2020-06-28 10:35:01 -07:00
Linus Torvalds c141b30e99 Merge tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU-vs-KCSAN fixes from Borislav Petkov:
 "A single commit that uses "arch_" atomic operations to avoid the
  instrumentation that comes with the non-"arch_" versions.

  In preparation for that commit, it also has another commit that makes
  these "arch_" atomic operations available to generic code.

  Without these commits, KCSAN users can see pointless errors"

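The mechanism, roughly (a simplified sketch in the spirit of the
asm-generic instrumented wrappers; not a verbatim excerpt):

  /* The instrumented wrapper is what normal kernel code calls; it
   * tells KASAN/KCSAN about the access, then does the raw operation. */
  static __always_inline int atomic_read(const atomic_t *v)
  {
          instrument_atomic_read(v, sizeof(*v));  /* sanitizer hook      */
          return arch_atomic_read(v);             /* raw, uninstrumented */
  }

  /* Code that must not be instrumented (e.g. noinstr/RCU internals)
   * calls arch_atomic_read() directly and skips the hook. */
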
* tag 'rcu_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Fixup noinstr warnings
  locking/atomics: Provide the arch_atomic_ interface to generic code
2020-06-28 10:29:38 -07:00
Linus Torvalds 7ecb59a566 Merge tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Borislav Petkov:
 "Three fixes from Peter Zijlstra suppressing KCOV instrumentation in
  noinstr sections.

  Peter Zijlstra says:
    "Address KCOV vs noinstr. There is no function attribute to
     selectively suppress KCOV instrumentation, instead teach objtool
     to NOP out the calls in noinstr functions"

  This cures a bunch of KCOV crashes (as used by syzkaller)"

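Conceptually, the objtool side looks something like this (a sketch of
the idea with hypothetical helper names such as is_kcov_helper(); not
objtool's actual code):

  /* While checking a .noinstr.text section: any call the compiler
   * emitted to the KCOV helper is overwritten with a same-length NOP. */
  if (insn->type == INSN_CALL && insn->sec->noinstr &&
      is_kcov_helper(insn->call_dest))
          elf_write_insn(file->elf, insn->sec, insn->offset,
                         insn->len, arch_nop_insn(insn->len));
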
* tag 'objtool_urgent_for_5.8_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix noinstr vs KCOV
  objtool: Provide elf_write_{insn,reloc}()
  objtool: Clean up elf_write() condition
2020-06-28 10:16:15 -07:00
Linus Torvalds a358505d8a Merge tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry fixes from Borislav Petkov:
 "This is the x86/entry urgent pile which has accumulated since the
  merge window.

  It is not the smallest, but considering the almost complete entry core
  rewrite, the number of fixes to follow is somewhat higher than usual,
  which is to be expected.

  Peter Zijlstra says:
   'These patches address a number of instrumentation issues that were
    found after the x86/entry overhaul. When combined with rcu/urgent
    and objtool/urgent, these patches make UBSAN/KASAN/KCSAN happy
    again.

    Part of making this all work is bumping the minimum GCC version for
    KASAN builds to gcc-8.3, the reason for this is that the
    __no_sanitize_address function attribute is broken in GCC releases
    before that.

    No known GCC version has a working __no_sanitize_undefined, however
    because the only noinstr violation that results from this happens
    when UB is found, we treat it like WARN. That is, we allow it to
    violate the noinstr rules in order to get the warning out'"

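The net effect of the attribute fixes is that noinstr now opts out of
every sanitizer explicitly. Roughly (a condensed sketch of the
compiler_types.h definition; not a verbatim excerpt):

  /* noinstr code lives in its own section and must be invisible to
   * KASAN, KCSAN and (where the attribute works) UBSAN. */
  #define noinstr                                                 \
          noinline notrace                                        \
          __attribute__((__section__(".noinstr.text")))           \
          __no_kcsan __no_sanitize_address __no_sanitize_undefined
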
* tag 'x86_entry_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Fix #UD vs WARN more
  x86/entry: Increase entry_stack size to a full page
  x86/entry: Fixup bad_iret vs noinstr
  objtool: Don't consider vmlinux a C-file
  kasan: Fix required compiler version
  compiler_attributes.h: Support no_sanitize_undefined check with GCC 4
  x86/entry, bug: Comment the instrumentation_begin() usage for WARN()
  x86/entry, ubsan, objtool: Whitelist __ubsan_handle_*()
  x86/entry, cpumask: Provide non-instrumented variant of cpu_is_offline()
  compiler_types.h: Add __no_sanitize_{address,undefined} to noinstr
  kasan: Bump required compiler version
  x86, kcsan: Add __no_kcsan to noinstr
  kcsan: Remove __no_kcsan_or_inline
  x86, kcsan: Remove __no_kcsan_or_inline usage
2020-06-28 09:42:47 -07:00
Vincent Guittot e21cf43406 sched/cfs: change initial value of runnable_avg
Some performance regressions on the reaim benchmark have been reported with
  commit 070f5e860e ("sched/fair: Take into account runnable_avg to classify group")

The problem comes from the initial value of runnable_avg, which is set to
the maximum. This is a problem if the newly forked task turns out to be a
short-running task, because the group of CPUs is wrongly classified as
overloaded and tasks are pulled less aggressively.

Set the initial value of runnable_avg equal to util_avg, to reflect that
there is no waiting time so far.

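In code, the change amounts to one assignment where the new task's PELT
signals are set up (simplified sketch of post_init_entity_util_avg()):

  /* A new task has accumulated no waiting time yet, so start
   * runnable_avg at util_avg rather than at the maximum. */
  sa->runnable_avg = sa->util_avg;
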
Fixes: 070f5e860e ("sched/fair: Take into account runnable_avg to classify group")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200624154422.29166-1-vincent.guittot@linaro.org
2020-06-28 17:01:20 +02:00
Peter Zijlstra 8c4890d1c3 smp, irq_work: Continue smp_call_function*() and irq_work*() integration
Instead of relying on BUG_ON() to ensure the various data structures
line up, use a bunch of horrible unions to make it all automatic.

Much of the union magic is to ensure irq_work and smp_call_function do
not (yet) see the members of their respective data structures change
name.

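A condensed sketch of the union layout (simplified from the patch):

  /* Shared node layout used by both smp_call_function and irq_work. */
  struct __call_single_node {
          struct llist_node llist;
          union {
                  unsigned int u_flags;
                  atomic_t     a_flags;
          };
  };

  struct irq_work {
          union {
                  struct __call_single_node node; /* the shared layout */
                  struct {                        /* ...the old names  */
                          struct llist_node llnode;
                          atomic_t flags;
                  };
          };
          void (*func)(struct irq_work *);
  };
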
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20200622100825.844455025@infradead.org
2020-06-28 17:01:20 +02:00
Peter Zijlstra 739f70b476 sched/core: s/WF_ON_RQ/WF_ON_CPU/
Use a better name for this poorly named flag, to avoid confusion...

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200622100825.785115830@infradead.org
2020-06-28 17:01:20 +02:00
Peter Zijlstra b6e13e8582 sched/core: Fix ttwu() race
Paul reported rcutorture occasionally hitting a NULL deref:

  sched_ttwu_pending()
    ttwu_do_wakeup()
      check_preempt_curr() := check_preempt_wakeup()
        find_matching_se()
          is_same_group()
            if (se->cfs_rq == pse->cfs_rq) <-- *BOOM*

Debugging showed that this only appears to happen when we take the new
code-path from commit:

  2ebb177175 ("sched/core: Offload wakee task activation if it the wakee is descheduling")

and only when @cpu == smp_processor_id(). Something which should not
be possible, because p->on_cpu can only be true for remote tasks.
Similarly, without the new code-path from commit:

  c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")

this would've unconditionally hit:

  smp_cond_load_acquire(&p->on_cpu, !VAL);

and if: 'cpu == smp_processor_id() && p->on_cpu' is possible, this
would result in an instant live-lock (with IRQs disabled), something
that hasn't been reported.

The NULL deref can be explained, however, if the task_cpu(p) load at the
beginning of try_to_wake_up() returns an old value, and this old value
happens to be smp_processor_id(). Further assume that the p->on_cpu
load accurately returns 1: it really is still running, just not here.

Then, when we enqueue the task locally, we can crash in exactly the
observed manner because p->se.cfs_rq != rq->cfs_rq: p's cfs_rq is from
the wrong CPU, so we iterate into the non-existent parents and NULL
deref.

The closest semi-plausible scenario I've managed to contrive is
somewhat elaborate (then again, actual reproduction takes many CPU
hours of rcutorture, so it can't be anything obvious):

					X->cpu = 1
					rq(1)->curr = X

	CPU0				CPU1				CPU2

					// switch away from X
					LOCK rq(1)->lock
					smp_mb__after_spinlock
					dequeue_task(X)
					  X->on_rq = 0
					switch_to(Z)
					  X->on_cpu = 0
					UNLOCK rq(1)->lock

									// migrate X to cpu 0
									LOCK rq(1)->lock
									dequeue_task(X)
									set_task_cpu(X, 0)
									  X->cpu = 0
									UNLOCK rq(1)->lock

									LOCK rq(0)->lock
									enqueue_task(X)
									  X->on_rq = 1
									UNLOCK rq(0)->lock

	// switch to X
	LOCK rq(0)->lock
	smp_mb__after_spinlock
	switch_to(X)
	  X->on_cpu = 1
	UNLOCK rq(0)->lock

	// X goes sleep
	X->state = TASK_UNINTERRUPTIBLE
	smp_mb();			// wake X
					ttwu()
					  LOCK X->pi_lock
					  smp_mb__after_spinlock

					  if (p->state)

					  cpu = X->cpu; // =? 1

					  smp_rmb()

	// X calls schedule()
	LOCK rq(0)->lock
	smp_mb__after_spinlock
	dequeue_task(X)
	  X->on_rq = 0

					  if (p->on_rq)

					  smp_rmb();

					  if (p->on_cpu && ttwu_queue_wakelist(..)) [*]

					  smp_cond_load_acquire(&p->on_cpu, !VAL)

					  cpu = select_task_rq(X, X->wake_cpu, ...)
					  if (X->cpu != cpu)
	switch_to(Y)
	  X->on_cpu = 0
	UNLOCK rq(0)->lock

However I'm having trouble convincing myself that's actually possible
on x86_64 -- after all, every LOCK implies an smp_mb() there, so if ttwu
observes ->state != RUNNING, it must also observe ->cpu != 1.

(Most of the previous ttwu() races were found on very large PowerPC systems.)

Nevertheless, this fully explains the observed failure case.

Fix it by ordering the task_cpu(p) load after the p->on_cpu load,
which is easy since nothing actually uses @cpu before this.

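A condensed sketch of the resulting try_to_wake_up() ordering
(simplified; not the verbatim diff):

  /* Read ->on_cpu first (acquire), and only derive the CPU from the
   * task afterwards, so a stale task_cpu(p) can never be enqueued on. */
  if (smp_load_acquire(&p->on_cpu) &&
      ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
          goto unlock;

  smp_cond_load_acquire(&p->on_cpu, !VAL);

  cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
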
Fixes: c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200622125649.GC576871@hirez.programming.kicks-ass.net
2020-06-28 17:01:20 +02:00
Juri Lelli 740797ce3a sched/core: Fix PI boosting between RT and DEADLINE tasks
syzbot reported the following warning:

 WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
 enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504

At deadline.c:628 we have:

 623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 624 {
 625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 626 	struct rq *rq = rq_of_dl_rq(dl_rq);
 627
 628 	WARN_ON(dl_se->dl_boosted);
 629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
        [...]
     }

Which means that setup_new_dl_entity() has been called on a task that is
currently boosted. This shouldn't happen though, as setup_new_dl_entity()
is only called when the 'dynamic' deadline of the new entity is in the
past w.r.t. rq_clock, and boosted tasks shouldn't satisfy this condition.

Digging through the PI code, I noticed that the above might in fact
happen if an RT task blocks on an rt_mutex held by a DEADLINE task. In
the first branch of the boosting conditions we check only whether the
pi_task's 'dynamic' deadline is earlier than the mutex holder's, and in
this case we mark the mutex holder as dl_boosted. However, since RT
'dynamic' deadlines are only initialized if such tasks get boosted at
some point (or if they become DEADLINE, of course), in general RT
'dynamic' deadlines are usually equal to 0, which satisfies the
aforementioned condition.

Fix it by checking that the potential donor task is actually running at
DEADLINE priority (even if only temporarily, because it is in turn
boosted) before using its 'dynamic' deadline value.

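The fix, in context (a simplified sketch of the boosting condition in
rt_mutex_setprio(); not the verbatim diff):

  if (dl_prio(prio)) {
          if (!dl_prio(p->normal_prio) ||
              (pi_task && dl_prio(pi_task->prio) &&  /* new: donor must be DL */
               dl_entity_preempt(&pi_task->dl, &p->dl))) {
                  p->dl.dl_boosted = 1;
                  queue_flag |= ENQUEUE_REPLENISH;
          } else {
                  p->dl.dl_boosted = 0;
          }
  }
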
Fixes: 2d3d891d33 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
2020-06-28 17:01:20 +02:00
Juri Lelli ce9bc3b27f sched/deadline: Initialize ->dl_boosted
syzbot reported the following warning triggered via SYSC_sched_setattr():

  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 setup_new_dl_entity /kernel/sched/deadline.c:594 [inline]
  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_dl_entity /kernel/sched/deadline.c:1370 [inline]
  WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_task_dl+0x1c17/0x2ba0 /kernel/sched/deadline.c:1441

This happens because the ->dl_boosted flag is currently not initialized by
__dl_clear_params() (unlike the other flags) and setup_new_dl_entity()
rightfully complains about it.

Initialize dl_boosted to 0.

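The one-line fix, in context (a simplified, elided sketch of
__dl_clear_params()):

  void __dl_clear_params(struct task_struct *p)
  {
          struct sched_dl_entity *dl_se = &p->dl;
          ...
          dl_se->dl_throttled      = 0;
          dl_se->dl_yielded        = 0;
          dl_se->dl_non_contending = 0;
          dl_se->dl_overrun        = 0;
          dl_se->dl_boosted        = 0;  /* previously left uninitialized */
  }
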
Fixes: 2d3d891d33 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+5ac8bac25f95e8b221e7@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20200617072919.818409-1-juri.lelli@redhat.com
2020-06-28 17:01:20 +02:00
Scott Wood fd844ba9ae sched/core: Check cpus_mask, not cpus_ptr in __set_cpus_allowed_ptr(), to fix mask corruption
This function is concerned with the long-term CPU mask, not the
transitory mask the task might have while migrate-disabled.  Before
this patch, if a task was migrate-disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the CPU that the task was running on, then the mask update
would be lost.

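The change itself is a single line in __set_cpus_allowed_ptr() (shown
as a simplified diff):

  -	if (cpumask_equal(p->cpus_ptr, new_mask))
  +	if (cpumask_equal(&p->cpus_mask, new_mask))
  		goto out;
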
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200617121742.cpxppyi7twxmpin7@linutronix.de
2020-06-28 17:01:20 +02:00