Commit Graph

826 Commits

Author SHA1 Message Date
Paul E. McKenney ad4e25a3a1 Merge branches 'doc.2017.10.20a', 'fixes.2017.10.19a', 'stall.2017.10.09a' and 'torture.2017.10.09a' into HEAD
doc.2017.10.20a: Documentation updates.
fixes.2017.10.19a: Miscellaneous fixes.
stall.2017.10.09a: RCU CPU stall-warning updates.
torture.2017.10.09a: Torture-test updates.
2017-10-20 11:11:15 -07:00
Paul E. McKenney e4d0b679a8 srcu: Add parameters to SRCU docbook comments
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-20 11:09:33 -07:00
Paul E. McKenney 27fdb35fe9 doc: Fix various RCU docbook comment-header problems
Because many of RCU's files have not been included into docbook, a
number of errors have accumulated.  This commit fixes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19 22:26:11 -04:00
Sebastian Andrzej Siewior 56628a7fc8 rcu/segcblist: Include rcupdate.h
The RT build on ARM complains about non-existing ULONG_CMP_LT.
This commit therefore includes rcupdate.h into rcu_segcblist.c.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Paul E. McKenney c0da313e09 rcu: Add extended-quiescent-state testing advice
If you add or remove calls to rcu_idle_enter(), rcu_user_enter(),
rcu_irq_exit(), rcu_irq_exit_irqson(), rcu_idle_exit(), rcu_user_exit(),
rcu_irq_enter(), rcu_irq_enter_irqson(), rcu_nmi_enter(), or
rcu_nmi_exit(), you should run a full set of tests on a kernel built
with CONFIG_RCU_EQS_DEBUG=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Paul E. McKenney 02a7c234e5 rcu: Suppress lockdep false-positive ->boost_mtx complaints
RCU priority boosting uses rt_mutex_init_proxy_locked() to initialize an
rt_mutex structure in locked state held by some other task.  When that
other task releases it, lockdep complains (quite accurately, but a bit
uselessly) that the other task never acquired it.  This complaint can
suppress other, more helpful, lockdep complaints, and in any case it is
a false positive.

This commit therefore switches from rt_mutex_unlock() to
rt_mutex_futex_unlock(), thereby avoiding the lockdep annotations.
Of course, if lockdep ever learns about rt_mutex_init_proxy_locked(),
addtional adjustments will be required.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:13:36 -07:00
Sebastian Andrzej Siewior b88697810d rcu: Do not include rtmutex_common.h unconditionally
This commit adjusts include files and provides definitions in preparation
for suppressing lockdep false-positive ->boost_mtx complaints.  Without
this preparation, architectures not supporting rt_mutex will get build
failures.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-19 12:12:06 -07:00
Paul E. McKenney 0032f4e889 rcutorture: Dump writer stack if stalled
Right now, rcutorture warns if an rcu_torture_writer() kthread stalls,
but this warning is not always all that helpful.  This commit therefore
makes the first such warning include a stack dump.

This in turn requires that sched_show_task() be exported to GPL modules,
so this commit makes that change as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:26:09 -07:00
Paul E. McKenney 2b1516e55f rcutorture: Add interrupt-disable capability to stall-warning tests
When rcutorture sees the rcutorture.stall_cpu kernel boot parameter,
it loops with preemption disabled, which does in fact normally
generate an RCU CPU stall warning message.  However, there are test
scenarios that need the stalling CPU to have interrupts disabled.
This commit therefore adds an rcutorture.stall_cpu_irqsoff kernel
boot parameter that causes the stalling CPU to disable interrupts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:26:08 -07:00
Paul E. McKenney f22ce09157 rcu: Suppress RCU CPU stall warnings while dumping trace
Currently, RCU emits Suppress RCU CPU stall warnings during its
automatically initiated ftrace_dump() calls after detecting an error
condition, which can result in excessively excessive console output
and lost trace events.  This commit therefore suppresses RCU CPU stall
warnings across any of these ftrace_dump() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney 83b6ca1fed rcu: Turn off tracing before dumping trace
Currently, RCU allows tracing to continue when it automatically does
ftrace_dump() after detecting an error condition, which can result in
excessively large traces and lost trace events.  This commit therefore
does a tracing_off() before any of these ftrace_dump() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney 9b9500da81 rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
One common question upon seeing an RCU CPU stall warning is "did
the stalled CPUs have interrupts disabled?"  However, the current
stall warnings are silent on this point.  This commit therefore
uses irq_work to check whether stalled CPUs still respond to IPIs,
and flags this state in the RCU CPU stall warning console messages.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:25:17 -07:00
Paul E. McKenney f79c3ad618 sched,rcu: Make cond_resched() provide RCU quiescent state
There is some confusion as to which of cond_resched() or
cond_resched_rcu_qs() should be added to long in-kernel loops.
This commit therefore eliminates the decision by adding RCU quiescent
states to cond_resched().  This commit also simplifies the code that
used to interact with cond_resched_rcu_qs(), and that now interacts with
cond_resched(), to reduce its overhead.  This reduction is necessary to
allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
everywhere that cond_resched() is invoked.

Part of that reduction in overhead converts the jiffies_till_sched_qs
kernel parameter to read-only at runtime, thus eliminating the need for
bounds checking.

Reported-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
[ paulmck: Keep PREEMPT=n cond_resched a no-op, per Peter Zijlstra. ]
2017-10-09 14:25:17 -07:00
Paul E. McKenney c63eb17ff0 rcu: Create call_rcu_tasks() kthread at boot time
Currently the call_rcu_tasks() kthread is created upon first
invocation of call_rcu_tasks().  This has the advantage of avoiding
creation if there are never any invocations of call_rcu_tasks() and of
synchronize_rcu_tasks(), but it requires an unreliable heuristic to
determine when it is safe to create the kthread.  For example, it is
not safe to create the kthread when call_rcu_tasks() is invoked with
a spinlock held, but there is no good way to detect this in !PREEMPT
kernels.

This commit therefore creates this kthread unconditionally at
core_initcall() time.  If you don't want this kthread created, then
build with CONFIG_TASKS_RCU=n.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-10-09 14:24:14 -07:00
Neeraj Upadhyay 135bd1a230 rcu: Fix up pending cbs check in rcu_prepare_for_idle
The pending-callbacks check in rcu_prepare_for_idle() is backwards.
It should accelerate if there are pending callbacks, but the check
rather uselessly accelerates only if there are no callbacks.  This commit
therefore inverts this check.

Fixes: 15fecf89e4 ("srcu: Abstract multi-tail callback list handling")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.12.x
2017-10-09 14:24:14 -07:00
Linus Torvalds 013a8ee628 Two updates.
- A memory fix with left over code from spliting out ftrace_ops
    and function graph tracer, where the function graph tracer could
    reset the trampoline pointer, leaving the old trampoline not to
    be freed (memory leak).
 
  - The update to Paul's patch that added the unnecessary READ_ONCE().
    This removes the unnecessary READ_ONCE() instead of having to rebase
    the branch to update the patch that added it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEQEw9Eu0DdyUUkuUUybkF8mrZjcsFAlnU++sUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQybkF8mrZjcujzgf/ebIzGKe5vQKNrL4ITAcIz0T7Hvzl
 pWw4uJp8kqO9x9EHMnztAkltQigvjvgDKZozJpUGgtNsFLuvdgQSBMK24YV8vLHs
 UmXEnQ2tSB/2Sg2ccEnpjVXaMzL9aqlbeTmACbdd9UgZnvPiUYPejq2jFfECFQjb
 k/gZT911ukBtx4mXYKzGFbTEZHdc/YUs6Y/wzB1ox5BBIUh71ZDZXxQTUHfXHlwS
 Cst69/9dKl4nBEGDGas6/95iR+ORVv85osI/pqPtjSj4EkRnWfVRotaH1kNuSQil
 gDIHSoy35NfXJx77/5IFHfrjFBAkr0IYRNL/jZaWazwM7rdqfAN8TwMQuA==
 =4CtF
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixlets from Steven Rostedt:
 "Two updates:

   - A memory fix with left over code from spliting out ftrace_ops and
     function graph tracer, where the function graph tracer could reset
     the trampoline pointer, leaving the old trampoline not to be freed
     (memory leak).

   - The update to Paul's patch that added the unnecessary READ_ONCE().
     This removes the unnecessary READ_ONCE() instead of having to
     rebase the branch to update the patch that added it"

* tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
  ftrace: Fix kmemleak in unregister_ftrace_graph
2017-10-04 08:34:01 -07:00
Paul E. McKenney f39b536ce9 rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit()
is currently protected with READ_ONCE().  However, this protection is
unnecessary because (1) ->dynticks_nmi_nesting is updated only by the
current CPU, (2) Although NMI handlers can update this field, they reset
it back to its old value before return, and (3) Interrupts are disabled,
so nothing else can modify it.  The value of ->dynticks_nmi_nesting is
thus effectively constant, and so no protection is required.

This commit therefore removes the READ_ONCE() protection from these
two accesses.

Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-03 10:27:32 -04:00
Linus Torvalds ac0a36461f Stack tracing and RCU has been having issues with each other and lockdep
has been pointing out constant problems. The changes have been going into
 the stack tracer, but it has been discovered that the problem isn't
 with the stack tracer itself, but it is with calling save_stack_trace()
 from within the internals of RCU. The stack tracer is the one that
 can trigger the issue the easiest, but examining the problem further,
 it could also happen from a WARN() in the wrong place, or even if
 an NMI happened in this area and it did an rcu_read_lock().
 
 The critical area is where RCU is not watching. Which can happen while
 going to and from idle, or bringing up or taking down a CPU.
 
 The final fix was to put the protection in kernel_text_address() as it
 is the one that requires RCU to be watching while doing the stack trace.
 
 To make this work properly, Paul had to allow rcu_irq_enter() happen after
 rcu_nmi_enter(). This should have been done anyway, since an NMI can
 page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter().
 
 One patch is just a consolidation of code so that the fix only needed
 to be done in one location.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEQEw9Eu0DdyUUkuUUybkF8mrZjcsFAlnGyXoUHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQybkF8mrZjctKtwf8CeKGqOdlqkZEafIpWaIASXmAVMO/
 WE+hQK+rCydWFvzADgb/rOmsR0ou8WGEXcuUPxVxmvMyqhKhZ6AU1hE/7Y8P0pMq
 F4bev+j3lAJC65ezFAh+ZQcIjaRIH4MFVPsUTaibSPSN7xziMNIpbf9VOVfpUm8A
 jf9p6YAmyhFVi6DstCc29SWnywEVwC2ZWRVKRPXKry8/dPxjfVcLclGX680Eqi9I
 EnYaOdC/mGbtvHPOUSs/P0cfxExHmyEErQHeOV8FPymj6KJ6+KoYIiELNlTHUBj/
 eeKzrHc/b3j+lz0RPlA8WxYmpmEm4SE5cV3vRebdBNUBrABSN1RxeOozyQ==
 =1KkS
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Stack tracing and RCU has been having issues with each other and
  lockdep has been pointing out constant problems.

  The changes have been going into the stack tracer, but it has been
  discovered that the problem isn't with the stack tracer itself, but it
  is with calling save_stack_trace() from within the internals of RCU.

  The stack tracer is the one that can trigger the issue the easiest,
  but examining the problem further, it could also happen from a WARN()
  in the wrong place, or even if an NMI happened in this area and it did
  an rcu_read_lock().

  The critical area is where RCU is not watching. Which can happen while
  going to and from idle, or bringing up or taking down a CPU.

  The final fix was to put the protection in kernel_text_address() as it
  is the one that requires RCU to be watching while doing the stack
  trace.

  To make this work properly, Paul had to allow rcu_irq_enter() happen
  after rcu_nmi_enter(). This should have been done anyway, since an NMI
  can page fault (reading vmalloc area), and a page fault triggers
  rcu_irq_enter().

  One patch is just a consolidation of code so that the fix only needed
  to be done in one location"

* tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove RCU work arounds from stack tracer
  extable: Enable RCU if it is not watching in kernel_text_address()
  extable: Consolidate *kernel_text_address() functions
  rcu: Allow for page faults in NMI handlers
2017-09-25 15:22:31 -07:00
Paul E. McKenney 28585a8326 rcu: Allow for page faults in NMI handlers
A number of architecture invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU.  This works, at
least unless the exception happens in an NMI handler.  In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state.  This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.

This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state.  This in turn should
make the kernel safe for on-demand RCU voyeurism.

Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-23 16:49:42 -04:00
Alexey Dobriyan 9b130ad5bb treewide: make "nr_cpu_ids" unsigned
First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Paul E. McKenney 656e7c0c0a Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
doc.2017.08.17a: Documentation updates.
fixes.2017.08.17a: RCU fixes.
hotplug.2017.07.25b: CPU-hotplug updates.
misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
srcu.2017.07.27c: SRCU updates.
torture.2017.07.24c: Torture-test updates.
2017-08-17 08:10:04 -07:00
Paul E. McKenney 16c0b10607 rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
The rcu_idle_exit() and rcu_idle_enter() functions are exported because
they were originally used by RCU_NONIDLE(), which was intended to
be usable from modules.  However, RCU_NONIDLE() now instead uses
rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not
exported, and there have been no complaints.

This commit therefore removes the exports from rcu_idle_exit() and
rcu_idle_enter().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Paul E. McKenney d4db30af51 rcu: Add warning to rcu_idle_enter() for irqs enabled
All current callers of rcu_idle_enter() have irqs disabled, and
rcu_idle_enter() relies on this, but doesn't check.  This commit
therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust.
While we are there, pass "true" rather than "1" to rcu_eqs_enter().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:25 -07:00
Peter Zijlstra (Intel) 3a60799269 rcu: Make rcu_idle_enter() rely on callers disabling irqs
All callers to rcu_idle_enter() have irqs disabled, so there is no
point in rcu_idle_enter disabling them again.  This commit therefore
replaces the irq disabling with a RCU_LOCKDEP_WARN().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:24 -07:00
Paul E. McKenney 2dee9404fa rcu: Add assertions verifying blocked-tasks list
This commit adds assertions verifying the consistency of the rcu_node
structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and
->boost_tasks pointers.  In particular, the ->blkd_tasks lists must be
empty except for leaf rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Masami Hiramatsu 35fe723bda rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also
rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was
fixed for rcu_eqs_enter_common() by commit 03ecd3f48e ("rcu/tracing:
Add rcu_disabled to denote when rcu_irq_enter() will not work").

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:23 -07:00
Paul E. McKenney d8db2e86d8 rcu: Add TPS() protection for _rcu_barrier_trace strings
The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(),
which needs TPS() protection for strings passed through the second
argument.  However, it has escaped prior TPS()-ification efforts because
it _rcu_barrier_trace() does not start with "trace_".  This commit
therefore adds the needed TPS() protection

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:22 -07:00
Luis R. Rodriguez d5374226c3 rcu: Use idle versions of swait to make idle-hack clear
These RCU waits were set to use interruptible waits to avoid the kthreads
contributing to system load average, even though they are not interruptible
as they are spawned from a kthread. Use the new TASK_IDLE swaits which makes
our goal clear, and removes confusion about these paths possibly being
interruptible -- they are not.

When the system is idle the RCU grace-period kthread will spend all its time
blocked inside the swait_event_interruptible(). If the interruptible() was
not used, then this kthread would contribute to the load average. This means
that an idle system would have a load average of 2 (or 3 if PREEMPT=y),
rather than the load average of 0 that almost fifty years of UNIX has
conditioned sysadmins to expect.

The same argument applies to swait_event_interruptible_timeout() use. The
RCU grace-period kthread spends its time blocked inside this call while
waiting for grace periods to complete. In particular, if there was only one
busy CPU, but that CPU was frequently invoking call_rcu(), then the RCU
grace-period kthread would spend almost all its time blocked inside the
swait_event_interruptible_timeout(). This would mean that the load average
would be 2 rather than the expected 1 for the single busy CPU.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:15 -07:00
Paul E. McKenney c5ebe66ce7 rcu: Add event tracing to ->gp_tasks update at GP start
There is currently event tracing to track when a task is preempted
within a preemptible RCU read-side critical section, and also when that
task subsequently reaches its outermost rcu_read_unlock(), but none
indicating when a new grace period starts when that grace period must
wait on pre-existing readers that have been been preempted at least once
since the beginning of their current RCU read-side critical sections.

This commit therefore adds an event trace at grace-period start in
the case where there are such readers.  Note that only the first
reader in the list is traced.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:06 -07:00
Paul E. McKenney 7414fac050 rcu: Move rcu.h to new trivial-function style
This commit saves a few lines in kernel/rcu/rcu.h by moving to single-line
definitions for trivial functions, instead of the old style where the
two curly braces each get their own line.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:06 -07:00
Paul E. McKenney bedbb648ef rcu: Add TPS() to event-traced strings
Strings used in event tracing need to be specially handled, for example,
using the TPS() macro.  Without the TPS() macro, although output looks
fine from within a running kernel, extracting traces from a crash dump
produces garbage instead of strings.  This commit therefore adds the TPS()
macro to some unadorned strings that were passed to event-tracing macros.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17 07:26:05 -07:00
Paul E. McKenney ccdd29ffff rcu: Create reasonable API for do_exit() TASKS_RCU processing
Currently, the exit-time support for TASKS_RCU is open-coded in do_exit().
This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish()
APIs for do_exit() use.  This has the benefit of confining the use of the
tasks_rcu_exit_srcu variable to one file, allowing it to become static.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17 07:26:05 -07:00
Paul E. McKenney 7e42776d5e rcu: Drive TASKS_RCU directly off of PREEMPT
The actual use of TASKS_RCU is only when PREEMPT, otherwise RCU-sched
is used instead.  This commit therefore makes synchronize_rcu_tasks()
and call_rcu_tasks() available always, but mapped to synchronize_sched()
and call_rcu_sched(), respectively, when !PREEMPT.  This approach also
allows some #ifdefs to be removed from rcutorture.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 07:26:04 -07:00
Paul E. McKenney 35732cf9dd srcu: Provide ordering for CPU not involved in grace period
Tree RCU guarantees that every online CPU has a memory barrier between
any given grace period and any of that CPU's RCU read-side sections that
must be ordered against that grace period.  Since RCU doesn't always
know where read-side critical sections are, the actual implementation
guarantees order against prior and subsequent non-idle non-offline code,
whether in an RCU read-side critical section or not.  As a result, there
does not need to be a memory barrier at the end of synchronize_rcu()
and friends because the ordering internal to the grace period has
ordered every CPU's post-grace-period execution against each CPU's
pre-grace-period execution, again for all non-idle online CPUs.

In contrast, SRCU can have non-idle online CPUs that are completely
uninvolved in a given SRCU grace period, for example, a CPU that
never runs any SRCU read-side critical sections and took no part in
the grace-period processing.  It is in theory possible for a given
synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
uninvolved in the prior SRCU grace period, which could mean that the
code following that synchronize_srcu() would end up being unordered with
respect to both the grace period and any pre-existing SRCU read-side
critical sections.

This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
which prevents this scenario from occurring.

Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lance Roy <ldr709@gmail.com>
Cc: <stable@vger.kernel.org> # 4.12.x
2017-07-27 15:53:04 -07:00
Paul E. McKenney 09efeeee17 rcu: Move callback-list warning to irq-disable region
After adopting callbacks from a newly offlined CPU, the adopting CPU
checks to make sure that its callback list's count is zero only if the
list has no callbacks and vice versa.  Unfortunately, it does so after
enabling interrupts, which means that false positives are possible due to
interrupt handlers invoking call_rcu().  Although these false positives
are improbable, rcutorture did make it happen once.

This commit therefore moves this check to an irq-disabled region of code,
thus suppressing the false positive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:50 -07:00
Paul E. McKenney aed4e04686 rcu: Remove unused RCU list functions
Given changes to callback migration, rcu_cblist_head(),
rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(),
rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are
no longer used.  This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:49 -07:00
Paul E. McKenney f2dbe4a562 rcu: Localize rcu_state ->orphan_pend and ->orphan_done
Given that the rcu_state structure's >orphan_pend and ->orphan_done
fields are used only during migration of callbacks from the recently
offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
rcu_adopt_orphan_cbs() are combined, these fields can become local
variables in the combined function.  This commit therefore combines
rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
rcu_segcblist_merge() function and removes the ->orphan_pend and
->orphan_done fields.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:49 -07:00
Paul E. McKenney 21cc248384 rcu: Advance callbacks after migration
When migrating callbacks from a newly offlined CPU, we are already
holding the root rcu_node structure's lock, so it costs almost nothing
to advance and accelerate the newly migrated callbacks.  This patch
therefore makes this advancing and acceleration happen.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:48 -07:00
Paul E. McKenney 537b85c870 rcu: Eliminate rcu_state ->orphan_lock
The ->orphan_lock is acquired and released only within the
rcu_migrate_callbacks() function, which now acquires the root rcu_node
structure's ->lock.  This commit therefore eliminates the ->orphan_lock
in favor of the root rcu_node structure's ->lock.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:48 -07:00
Paul E. McKenney 9fa46fb8c9 rcu: Advance outgoing CPU's callbacks before migrating them
It is possible that the outgoing CPU is unaware of recent grace periods,
and so it is also possible that some of its pending callbacks are actually
ready to be invoked.  The current callback-migration code would needlessly
force these callbacks to pass through another grace period.  This commit
therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
order to give them full credit for having passed through any recent
grace periods.

This also fixes an odd theoretical bug where there are no callbacks in
the system except for those on the outgoing CPU, none of those callbacks
have yet been associated with a grace-period number, there is never again
another callback registered, and the surviving CPU never again takes a
scheduling-clock interrupt, never goes idle, and never enters nohz_full
userspace execution.  Yes, this is (just barely) possible.  It requires
that the surviving CPU be a nohz_full CPU, that its scheduler-clock
interrupt be shut off, and that it loop forever in the kernel.  You get
bonus points if you can make this one happen!  ;-)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:47 -07:00
Paul E. McKenney b1a2d79fe7 rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU
RCU's CPU-hotplug callback-migration code first moves the outgoing
CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
moves them to the NOCB callback list.  This commit avoids the
extra step (and simplifies the code) by moving the callbacks directly
from the outgoing CPU's callback list to the NOCB callback list.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:47 -07:00
Paul E. McKenney 95335c0355 rcu: Check for NOCB CPUs and empty lists earlier in CB migration
The current CPU-hotplug RCU-callback-migration code checks
for the source (newly offlined) CPU being a NOCBs CPU down in
rcu_send_cbs_to_orphanage().  This commit simplifies callback migration a
bit by moving this check up to rcu_migrate_callbacks().  This commit also
adds a check for the source CPU having no callbacks, which eases analysis
of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:46 -07:00
Paul E. McKenney c47e067a3c rcu: Remove orphan/adopt event-tracing fields
The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
are updated, but never read.  This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:46 -07:00
Paul E. McKenney 313517fc44 rcu: Make expedited GPs correctly handle hardware CPU insertion
The update of the ->expmaskinitnext and of ->ncpus are unsynchronized,
with the value of ->ncpus being incremented long before the corresponding
->expmaskinitnext mask is updated.  If an RCU expedited grace period
sees ->ncpus change, it will update the ->expmaskinit masks from the new
->expmaskinitnext masks.  But it is possible that ->ncpus has already
been updated, but the ->expmaskinitnext masks still have their old values.
For the current expedited grace period, no harm done.  The CPU could not
have been online before the grace period started, so there is no need to
wait for its non-existent pre-existing readers.

But the next RCU expedited grace period is in a world of hurt.  The value
of ->ncpus has already been updated, so this grace period will assume
that the ->expmaskinitnext masks have not changed.  But they have, and
they won't be taken into account until the next never-been-online CPU
comes online.  This means that RCU will be ignoring some CPUs that it
should be paying attention to.

The solution is to update ->ncpus and ->expmaskinitnext while holding
the ->lock for the rcu_node structure containing the ->expmaskinitnext
mask.  Because smp_store_release() is now used to update ->ncpus and
smp_load_acquire() is now used to locklessly read it, if the expedited
grace period sees ->ncpus change, then the updating CPU has to
already be holding the corresponding ->lock.  Therefore, when the
expedited grace period later acquires that ->lock, it is guaranteed
to see the new value of ->expmaskinitnext.

On the other hand, if the expedited grace period loads ->ncpus just
before an update, earlier full memory barriers guarantee that
the incoming CPU isn't far enough along to be running any RCU readers.

This commit therefore makes the required change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25 13:04:45 -07:00
Paul E. McKenney a58163d8ca rcu: Migrate callbacks earlier in the CPU-offline timeline
RCU callbacks must be migrated away from an outgoing CPU, and this is
done near the end of the CPU-hotplug operation, after the outgoing CPU is
long gone.  Unfortunately, this means that other CPU-hotplug callbacks
can execute while the outgoing CPU's callbacks are still immobilized
on the long-gone CPU's callback lists.  If any of these CPU-hotplug
callbacks must wait, either directly or indirectly, for the invocation
of any of the immobilized RCU callbacks, the system will hang.

This commit avoids such hangs by migrating the callbacks away from the
outgoing CPU immediately upon its departure, shortly after the return
from __cpu_die() in takedown_cpu().  Thus, RCU is able to advance these
callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
callbacks to wait on these RCU callbacks without risk of a hang.

While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
dead code on the one hand and to avoid define-without-use warnings on the
other hand.

Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
2017-07-25 13:03:43 -07:00
Paul E. McKenney 8be6e1b15c rcu: Use timer as backstop for NOCB deferred wakeups
The handling of RCU's no-CBs CPUs has a maintenance headache, namely
that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
wakeup must be defered to a point where we can be sure that scheduler
locks are not held.  Of course, there are a lot of code paths leading
from an interrupts-disabled invocation of call_rcu(), and missing any
one of these can result in excessive callback-invocation latency, and
potentially even system hangs.

This commit therefore uses a timer to guarantee that the wakeup will
eventually occur.  If one of the deferred-wakeup points kicks in, then
the timer is simply cancelled.

This commit also fixes up an incomplete removal of commits that were
intended to plug remaining exit paths, which should have the added
benefit of reducing the overhead of RCU's context-switch hooks.  In
addition, it simplifies leader-to-follower callback-list handoff by
introducing locking.  The call_rcu()-to-leader handoff continues to
use atomic operations in order to maintain good real-time latency for
common-case use of call_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
2017-07-25 09:53:09 -07:00
Paul E. McKenney f34c8585ed rcutorture: Invoke call_rcu() from timer handler
The Linux kernel invokes call_rcu() from various interrupt/softirq
handlers, but rcutorture does not.  This commit therefore adds this
behavior to rcutorture's repertoire.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:19 -07:00
Paul E. McKenney 96036c4306 rcu: Add last-CPU to GP-kthread starvation messages
This commit augments the grace-period-kthread starvation debugging
messages by adding the last CPU that ran the kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:18 -07:00
Paul E. McKenney a3b7b6c273 rcutorture: Eliminate unused ts_rem local from rcu_trace_clock_local()
This commit removes an unused local variable named ts_rem that is
marked __maybe_unused.  Yes, the variable was assigned to, but it
was never used beyond that point, hence not needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:17 -07:00
Paul E. McKenney 808de39cf4 rcutorture: Add task's CPU for rcutorture writer stalls
It appears that at least some of the rcutorture writer stall messages
coincide with unusually long CPU-online operations, for example, no
fewer than 205 seconds in a recent test.  It is of course possible that
the writer stall is not unrelated to this unusually long CPU-hotplug
operation, and so this commit adds the rcutorture writer task's CPU to
the stall message to gain more information about this possible connection.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:17 -07:00
Paul E. McKenney b3c983142d rcutorture: Place event-traced strings into trace buffer
Strings used in event tracing need to be specially handled, for example,
being copied to the trace buffer instead of being pointed to by the trace
buffer.  Although the TPS() macro can be used to "launder" pointed-to
strings, this might not be all that effective within a loadable module.
This commit therefore copies rcutorture's strings to the trace buffer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2017-07-24 16:04:12 -07:00
Paul E. McKenney 5e741fa9e9 rcutorture: Enable SRCU readers from timer handler
Now that it is legal to invoke srcu_read_lock() and srcu_read_unlock()
for a given srcu_struct from both process context and {soft,}irq
handlers, it is time to test it.  This commit therefore enables
testing of SRCU readers from rcutorture's timer handler, using in_task()
to determine whether or not it is safe to sleep in the SRCU read-side
critical sections.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:11 -07:00
Paul E. McKenney f1dbc54b92 rcu: Remove CONFIG_TASKS_RCU ifdef from rcuperf.c
The synchronize_rcu_tasks() and call_rcu_tasks() APIs are now available
regardless of kernel configuration, so this commit removes the
CONFIG_TASKS_RCU ifdef from rcuperf.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:09 -07:00
Paul E. McKenney ac3748c604 rcutorture: Print SRCU lock/unlock totals
This commit adds printing of SRCU lock/unlock totals, which are just
the sums of the per-CPU counts.  Saves a bit of mental arithmetic.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:08 -07:00
Paul E. McKenney 115a1a5285 rcutorture: Move SRCU status printing to SRCU implementations
This commit gets rid of some ugly #ifdefs in rcutorture.c by moving
the SRCU status printing to the SRCU implementations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:08 -07:00
Paul E. McKenney 0d8a1e831e srcu: Make process_srcu() be static
The function process_srcu() is not invoked outside of srcutree.c, so
this commit makes it static and drops the EXPORT_SYMBOL_GPL().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:03:23 -07:00
Paul E. McKenney 825c5bd2fd srcu: Move rcu_scheduler_starting() from Tiny RCU to Tiny SRCU
Other than lockdep support, Tiny RCU has no need for the
scheduler status.  However, Tiny SRCU will need this to control
boot-time behavior independent of lockdep.  Therefore, this commit
moves rcu_scheduler_starting() from kernel/rcu/tiny_plugin.h to
kernel/rcu/srcutiny.c.  This in turn allows the complete removal of
kernel/rcu/tiny_plugin.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:03:22 -07:00
Paul E. McKenney 6d48152eaf rcu: Remove RCU CPU stall warnings from Tiny RCU
Tiny RCU's job is to be tiny, so this commit removes its RCU CPU
stall warning code.  After this, there is no longer any need for
rcu_sched_ctrlblk and rcu_bh_ctrlblk to be in tiny_plugin.h, so this
commit also moves them to tiny.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:45 -07:00
Paul E. McKenney c23484f0e7 rcu: Remove event tracing from Tiny RCU
This commit saves a few lines by getting rid of Tiny RCU's event tracing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:45 -07:00
Paul E. McKenney 43a0a2a7d7 rcu: Move RCU debug Kconfig options to kernel/rcu
RCU's debugging Kconfig options are in the unintuitive location
lib/Kconfig.debug, and there are enough of them that it would be good for
them to be more centralized.  This commit therefore extracts RCU's Kconfig
options from init/Kconfig into a new kernel/rcu/Kconfig.debug file.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:44 -07:00
Paul E. McKenney 0af92d4609 rcu: Move RCU non-debug Kconfig options to kernel/rcu
RCU's Kconfig options are scattered, and there are enough of them
that it would be good for them to be more centralized.  This commit
therefore extracts RCU's Kconfig options from init/Kconfig into a new
kernel/rcu/Kconfig file.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:44 -07:00
Paul E. McKenney 44c65ff2e3 rcu: Eliminate NOCBs CPU-state Kconfig options
The CONFIG_RCU_NOCB_CPU_ALL, CONFIG_RCU_NOCB_CPU_NONE, and
CONFIG_RCU_NOCB_CPU_ZERO Kconfig options are used only in testing and
are redundant with the rcu_nocbs= boot parameter.  This commit therefore
removes these three Kconfig options and adjusts the rcutorture scripts
to use the boot parameter instead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:43 -07:00
Paul E. McKenney ae91aa0adb rcu: Remove debugfs tracing
RCU's debugfs tracing used to be the only reasonable low-level debug
information available, but ftrace and event tracing has since surpassed
the RCU debugfs level of usefulness.  This commit therefore removes
RCU's debugfs tracing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:43 -07:00
Paul E. McKenney bd8cc5a062 srcu: Remove Classic SRCU
Classic SRCU was only ever intended to be a fallback in case of issues
with Tree/Tiny SRCU, and the latter two are doing quite well in testing.
This commit therefore removes Classic SRCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:42 -07:00
Paul E. McKenney 7f0cd63330 srcu: Fix rcutorture-statistics typo
The function srcutorture_get_gp_data() duplicated the check for
sp->batch_check0.head instead of also checking sp->batch_check1.head.
The only effect of this typo would be for rcutorture statistics to
understate the fraction of time that an SRCU grace period was in flight,
and only for Classic SRCU.  This commit fixes this typo.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:42 -07:00
Paul E. McKenney c4a09ff752 rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
The PROVE_RCU_REPEATEDLY Kconfig option was initially added due to
the volume of messages from PROVE_RCU: Doing just one per boot would
have required excessive numbers of boots to locate them all.  However,
PROVE_RCU messages are now relatively rare, so there is no longer any
reason to need more than one such message per boot.  This commit therefore
removes the PROVE_RCU_REPEATEDLY Kconfig option.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
2017-06-08 18:52:41 -07:00
Paul E. McKenney 4e4bea7427 rcu: Remove typecheck() from RCU locking wrapper functions
Because raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore()
both do typecheck() on their flags argument, there is no point in
duplicating this check in raw_spin_lock_irqsave_rcu_node() and
raw_spin_unlock_irqrestore_rcu_node().  This commit therefore saves
a few lines by removing this duplicated check.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:40 -07:00
Paul E. McKenney fe5ac724d8 rcu: Remove nohz_full full-system-idle state machine
The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
by commit 0edd1b1784 ("nohz_full: Add full-system-idle state machine"),
but has not been used.  This commit therefore removes it.

If it turns out to be needed later, this commit can always be reverted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-08 18:52:39 -07:00
Paul E. McKenney f7a10a9750 rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
Anything that can be done with the RCU_KTHREAD_PRIO Kconfig option can
also be done with the rcutree.kthread_prio kernel boot parameter.
This commit therefore removes this Kconfig option.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
2017-06-08 18:52:39 -07:00
Paul E. McKenney 90040c9e30 rcu: Remove *_SLOW_* Kconfig options
The RCU_TORTURE_TEST_SLOW_PREINIT, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY,
RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_INIT,
RCU_TORTURE_TEST_SLOW_INIT_DELAY, RCU_TORTURE_TEST_SLOW_CLEANUP,
and RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig options are only
useful for torture testing, and there are the rcutree.gp_cleanup_delay,
rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot parameters
that rcutorture can use instead.  The effect of these parameters is to
artificially slow down grace period initialization and cleanup in order
to make some types of race conditions happen more often.

This commit therefore simplifies Tree RCU a bit by removing the Kconfig
options and adding the corresponding kernel parameters to rcutorture's
.boot files instead.  However, this commit also leaves out the kernel
parameters for TREE02, TREE04, and TREE07 in order to have about the
same number of tests slowed as not slowed.  TREE01, TREE03, TREE05,
and TREE06 are slowed, and the rest are not slowed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney a3883df393 srcu: Use rnp->lock wrappers to replace explicit memory barriers
This commit uses TREE RCU's rnp->lock wrappers to replace a few explicit
memory barriers.  This change also has the advantage of making SRCU's
memory-ordering properties be implemented in roughly the same way as they
are in Tree RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney 83d40bd3bc rcu: Move rnp->lock wrappers for SRCU use
This commit moves the now-generic rnp->lock wrapper macros from
kernel/rcu/tree.h to kernel/rcu/rcu.h, thus allowing SRCU to use them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney bf32c76540 rcu: Convert rnp->lock wrappers to macros for SRCU use
Use of smp_mb__after_unlock_lock() would allow SRCU to omit a full
memory barrier during callback execution, so this commit converts
raw_spin_lock_rcu_node() from inline functions to type-generic macros
to allow them to handle locks in srcu_node structures as well as
rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:37 -07:00
Paul E. McKenney 2464dd940e srcu: Apply trivial callback lists to shrink Tiny SRCU
The rcu_segcblist structure provides quite a bit of functionality, and
Tiny SRCU needs almost none of it.  So this commit replaces Tiny SRCU's
uses of rcu_segcblist with a simple singly linked list with tail pointer.
This change significantly reduces Tiny SRCU's memory footprint, more
than making up for the growth caused by the creation of rcu_segcblist.c

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:35 -07:00
Paul E. McKenney 5a0465e17a srcu: Shrink srcu.h by moving docbook and private function
The call_srcu() docbook entry is currently in include/linux/srcu.h,
which causes needless processing for each include point.  This commit
therefore moves this entry to kernel/rcu/srcutree.c, which the compiler
reads only once.  In addition, the srcu_batches_completed() function is
used only within RCU and its torture-test suites.  This commit therefore
also moves this function's declaration from include/linux/srcutiny.h,
include/linux/srcutree.h, and include/linux/srcuclassic.h to
kernel/rcu/rcu.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:35 -07:00
Paul E. McKenney c350c00829 srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
If a given CPU never happens to ever start an SRCU grace period, the
grace-period sequence counter might wrap.  If this CPU were to decide to
finally start a grace period, the state of its sdp->srcu_gp_seq_needed
might make it appear that it has already requested this grace period,
which would prevent starting the grace period.  If no other CPU ever started
a grace period again, this would look like a grace-period hang.  Even
if some other CPU took pity and started the needed grace period, the
leaf rcu_node structure's ->srcu_data_have_cbs field won't have record
of the fact that this CPU has a callback pending, which would look like
a very localized grace-period hang.

This might seem very unlikely, but SRCU grace periods can take less than
a microsecond on small systems, which means that overflow can happen
in much less than an hour on a 32-bit embedded system.  And embedded
systems are especially likely to have long-term idle CPUs.  Therefore,
it makes sense to prevent this scenario from happening.

This commit therefore scans each srcu_data structure occasionally,
with frequency controlled by the srcutree.counter_wrap_check kernel
boot parameter.  This parameter can be set to something like 255
in order to exercise the counter-wrap-prevention code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:34 -07:00
Paul E. McKenney fe21a27e8c rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
The rcu_request_urgent_qs_task() function is used only within RCU,
so there is no point in exporting it to the rest of the kernel from
nclude/linux/rcutiny.h and include/linux/rcutree.h.  This commit therefore
moves this function to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:33 -07:00
Paul E. McKenney e3c8d51e1a rcu: Move torture-related functions out of rcutiny.h and rcutree.h
The various functions similar to rcu_batches_started(), the
function show_rcu_gp_kthreads(), the various functions similar to
rcu_force_quiescent_state(), and the variables rcutorture_testseq and
rcutorture_vernum are used only within RCU.  There is therefore no point
in exporting them to the kernel at large from include/linux/rcutiny.h
and include/linux/rcutree.h.  This commit therefore moves all of these
to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:33 -07:00
Paul E. McKenney b8989b7605 rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
The rcu_ftrace_dump() function is used only internally to RCU.  This
commit therefore moves its declaration from include/linux/rcupdate.h
to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:32 -07:00
Paul E. McKenney 3d54f7983f rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
The rcu_is_nocb_cpu() function is used only internally to RCU.  This
commit therefore moves its declaration from include/linux/rcupdate.h
to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:31 -07:00
Paul E. McKenney fa3c664769 rcu: Improve __call_rcu() debug-objects error message
The "__call_rcu(): Leaked duplicate callback" error message from
__call_rcu() has proven to be unhelpful.  This commit therefore changes
it to "__call_rcu(): Double-freed CB" and adds the value of the pointer
passed in.  The value of the pointer improves debuggability by allowing
correlation with tracing output, for example, the rcu:rcu_callback trace
event.

Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:31 -07:00
Paul E. McKenney 82118249d0 rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
The RCU_SCHEDULER_INACTIVE, RCU_SCHEDULER_INIT, and RCU_SCHEDULER_RUNNING
definitions are used only within RCU, so this commit moves them from
include/linux/rcupdate.h to kernel/rcu/rcu.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:30 -07:00
Paul E. McKenney 791875d16e rcu: Eliminate the unused __rcu_is_watching() function
The __rcu_is_watching() function is currently not used, aside from
to implement the rcu_is_watching() function.  This commit therefore
eliminates __rcu_is_watching(), which has the beneficial side-effect
of shrinking include/linux/rcupdate.h a bit.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:30 -07:00
Paul E. McKenney cad7b38972 rcu: Move torture-related definitions from rcupdate.h to rcu.h
The include/linux/rcupdate.h file contains a number of definitions that
are used only to communicate between rcutorture, rcuperf, and the RCU code
itself.  There is no point in having these definitions exposed globally
throughout the kernel, so this commit moves them to kernel/rcu/rcu.h.
This change has the added benefit of shrinking rcupdate.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney 25c36329a3 rcu: Move expediting-related access/control out of rcupdate.h
The rcu_gp_is_normal(), rcu_gp_is_expedited(), rcu_expedite_gp(), and
rcu_unexpedite_gp() functions are intended only for use within the
RCU implementation itself -- the sysfs access is what should be used
outside of RCU.  This commit therefore moves the declarations for
these functions to kernel/rcu/rcu.h, and also includes this file into
kernel/rcu/rcutorture.c and kernel/rcu/rcuperf.c.  This also has the
beneficial effect of shrinking rcupdate.c a bit.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney 3caec62fbb rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
The rcu_expedited and rcu_normal variables are used only by sysctl
and kernel/rcu/update.c, so it does not make sense to their extern
declarations in rcupdate.h.  This commit therefore moves these
extern declarations to update.c.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:27 -07:00
Paul E. McKenney a68a2bb28b rcu: Move docbook comments out of rcupdate.h
The include/linux/rcupdate.h file is included by more than 200
files, so shrinking it should provide some build-time benefits.
This commit therefore moves several docbook comments from rcupdate.h to
kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus
reducing the number of times that the compiler has to scan these comments.
This likely provides only a small benefit, but every little bit helps.

This commit also fixes a malformed bulleted list noted by the 0day
Test Robot.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:27 -07:00
Paul E. McKenney 6b5fc3a133 rcu: Add memory barriers for NOCB leader wakeup
Wait/wakeup operations do not guarantee ordering on their own.  Instead,
either locking or memory barriers are required.  This commit therefore
adds memory barriers to wake_nocb_leader() and nocb_leader_wait().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: <stable@vger.kernel.org> # 4.6.x
2017-06-08 18:51:59 -07:00
Paul E. McKenney 511324e462 rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags
are used to mediate wakeups for the no-CBs CPU kthreads.  The "NOGP"
really doesn't make any sense, so this commit does s/NOGP/NOCB/.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:40 -07:00
Paul E. McKenney 68ab0b4263 rcu: Make synchronize_rcu_mult() check for duplicates
Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might
(or might not) wait for two RCU grace periods.  One approach is
of course "don't do that!", but in CONFIG_PREEMPT=n kernels,
synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that.
This results in an ugly #ifdef in sched_cpu_deactivate().

This commit therefore makes __wait_rcu_gp() check for duplicates,
which in turn allows duplicates to be passed to synchronize_rcu_mult()
without risk of waiting twice on the same type of grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:39 -07:00
Paul E. McKenney a602538e46 srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect call_srcu()
counterparts to double-free bugs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:39 -07:00
Paul E. McKenney d4efe6c5ad srcu: Shrink Tiny SRCU a bit
In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by
its EXPORT_SYMBOL_GPL(), and on many architectures, its call sequence.
This commit therefore moves it to srcutiny.h so that it can be inlined.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:38 -07:00
Paul E. McKenney ea9b0c8a26 rcu: Add lockdep_assert_held() teeth to tree_plugin.h
Comments can be helpful, but assertions carry more force.  This commit
therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to
enforce lock-held and interrupt-disabled preconditions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:37 -07:00
Paul E. McKenney c0b334c5bf rcu: Add lockdep_assert_held() teeth to tree.c
Comments can be helpful, but assertions carry more force.  This
commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN()
calls to enforce lock-held and interrupt-disabled preconditions.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:37 -07:00
Paul E. McKenney 0c8e0e3c37 srcu: Print non-default exp_holdoff values at boot time
This commit makes srcu_bootup_announce() check for non-default values
of the auto-expedite holdoff time exp_holdoff and print a message if so.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:36 -07:00
Paul E. McKenney b5815e6cd3 srcu: Make exp_holdoff module parameter be static
Because exp_holdoff is not used outside of srcutree.c, it can be static.
This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:36 -07:00
Paul E. McKenney 17c7798bea rcu: Update rcu_bootup_announce_oddness()
This commit updates rcu_bootup_announce_oddness() to check additional
Kconfig options and module/boot parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:35 -07:00
Paul E. McKenney 59d80fd835 rcu: Print out rcupdate.c non-default boot-time settings
This commit adds a rcupdate_announce_bootup_oddness() function to
print out non-default values of significant kernel boot parameter
settings to aid in debugging.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:35 -07:00
Paul E. McKenney f4687d2637 rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
This commit adds WARN_ON_ONCE() calls that trigger if either
rcu_sched_qs() or rcu_bh_qs() are invoked with preemption enabled.
In the immortal words of Peter Zijlstra: "these are much harder to ignore
than comments".

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:34 -07:00
Paul E. McKenney 820687a7b9 rcuperf: Add writer_holdoff boot parameter
This commit adds a writer_holdoff boot parameter to rcuperf, which is
intended to be used to test Tree SRCU's auto-expediting.  This
boot parameter is in microseconds, and defaults to zero (that is,
disabled).  Set it to a bit larger than srcutree.exp_holdoff,
keeping the nanosecond/microsecond conversion, to force Tree SRCU
to auto-expedite more aggressively.

This commit also adds documentation for this parameter, and fixes some
alphabetization while in the neighborhood.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:32 -07:00
Paul E. McKenney 492b95e597 rcuperf: Set more user-friendly defaults
Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
as a module, rcuperf.shutdown.  This commit therefore sets the default
for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
to zero if rcuperf is built as a module and to one otherwise.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:31 -07:00
Paul E. McKenney 3ddf20c953 srcu: Shrink Tiny SRCU a bit more
This commit rearranges Tiny SRCU's srcu_struct structure, substitutes
u8 for bool, and shrinks counters down to short.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:31 -07:00
Paul E. McKenney 1f4f6da1c8 srcu: Make Classic and Tree SRCU announce themselves at bootup
Currently, the only way to tell whether a given kernel is running
Classic, Tiny, or Tree SRCU is to look at the .config file, which
can easily be lost or associated with the wrong kernel.  This commit
therefore has Classic and Tree SRCU identify themselves at boot time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:30 -07:00
Paul E. McKenney f60cb4d4c8 rcuperf: Add test for dynamically initialized srcu_struct
This commit adds a perf_type of "srcud", which species that rcuperf
test SRCU on a dynamically initialized srcu_struct.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:28 -07:00
Paul E. McKenney dcfc315b7b rcu: Make sync_rcu_preempt_exp_done() return bool
The sync_rcu_preempt_exp_done() function returns a logical expression,
but its return type is nevertheless int.  This commit therefore changes
the return type to bool.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:27 -07:00
Paul E. McKenney 881ed593a3 rcuperf: Add ability to performance-test call_rcu() and friends
This commit upgrades rcuperf so that it can do performance testing on
asynchronous grace-period primitives such as call_srcu().  There is
a new rcuperf.gp_async module parameter that specifies this new behavior,
with the pre-existing rcuperf.gp_exp testing expedited grace periods such as
synchronize_rcu_expedited, and with the default being to test synchronous
non-expedited grace periods such as synchronize_rcu().

There is also a new rcuperf.gp_async_max module parameter that specifies
the maximum number of outstanding callbacks per writer kthread, defaulting
to 1,000.  When this limit is exceeded, the writer thread invokes the
appropriate flavor of rcu_barrier() to wait for callbacks to drain.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
2017-06-08 08:25:26 -07:00
Paul E. McKenney e28371c891 rcu: Remove obsolete reference to synchronize_kernel()
The synchronize_kernel() primitive was removed in favor of
synchronize_sched() more than a decade ago, and it seems likely that
rather few kernel hackers are familiar with it.  Its continued presence
is therefore providing more confusion than enlightenment.  This commit
therefore removes the reference from the synchronize_sched() header
comment, and adds the corresponding information to the synchronize_rcu(0
header comment.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:25 -07:00
Paul E. McKenney 9683937df9 rcuperf: Defer expedited/normal check to end of test
Current rcuperf startup checks to see if the user asked to measure
only expedited grace periods, yet constrained all grace periods to be
normal, or if the user asked to measure only normal grace periods, yet
constrained all grace periods to be expedited.  Useless tests of this
sort are aborted.

Unfortunately, making RCU work through the mid-boot dead zone [1] puts
RCU into expedited-only mode during that zone.  Which happens to also
be the exact time that rcuperf carries out the aforementioned check.
So if the user asks rcuperf to measure only normal grace periods (the
default), rcuperf will now always complain and terminate the test.

This commit therefore moves the checks to rcu_perf_cleanup().  This has
the disadvantage of failing to abort useless tests, but avoids the need to
create yet another kthread and the need to do fiddly checks involving the
holdoff time.  (Yes, another approach is to do the checks in a late-stage
init function, but that would require some way to communicate badness
to rcuperf's kthreads, and seems not worth the bother.)

[1] https://lwn.net/Articles/716148/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:24 -07:00
Paul E. McKenney 5b72f9643b rcu: Complain if blocking in preemptible RCU read-side critical section
Although preemptible RCU allows its read-side critical sections to be
preempted, general blocking is forbidden.  The reason for this is that
excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
voluntarily blocked task doesn't care how high you boost its priority.
Because preemptible RCU is a global mechanism, one ill-behaved reader
hurts everyone.  Hence the prohibition against general blocking in
RCU-preempt read-side critical sections.  Preemption yes, blocking no.

This commit enforces this prohibition.

There is a special exception for the -rt patchset (which they kindly
volunteered to implement):  It is OK to block (as opposed to merely being
preempted) within an RCU-preempt read-side critical section, but only if
the blocking is subject to priority inheritance.  This exception permits
CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.

Why doesn't this exception also apply to mainline's rt_mutex?  Because
of the possibility that someone does general blocking while holding
an rt_mutex.  Yes, the priority boosting will affect the rt_mutex,
but it won't help with the task doing general blocking while holding
that rt_mutex.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:24 -07:00
Paul E. McKenney 881ec9d209 srcu: Eliminate possibility of destructive counter overflow
Earlier versions of Tree SRCU were subject to a counter overflow bug that
could theoretically result in too-short grace periods.  This commit
eliminates this problem by adding an update-side memory barrier.
The short explanation is that if the updater sums the unlock counts
too late to see a given __srcu_read_unlock() increment, that CPU's
next __srcu_read_lock() must see the new value of ->srcu_idx, thus
incrementing the other bank of counters.  This eliminates the possibility
of destructive counter overflow as long as the srcu_read_lock() nesting
level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an
eminently reasonable nesting limit, especially on 64-bit systems.

Reported-by: Lance Roy <ldr709@gmail.com>
Suggested-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:23 -07:00
Paul E. McKenney f92c734f02 rcu: Prevent rcu_barrier() from starting needless grace periods
Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
on each CPU with a non-empty callback list.  This works, but means
that rcu_barrier() forces grace periods that are not otherwise needed.
The key point is that rcu_barrier() never needs to wait for a grace
period, but instead only for all pre-existing callbacks to be invoked.
This means that rcu_barrier()'s new callbacks should be placed in
the callback-list segment containing the last pre-existing callback.

This commit makes this change using the new rcu_segcblist_entrain()
function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:22 -07:00
Paolo Bonzini 1123a60416 srcu: Allow use of Classic SRCU from both process and interrupt context
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Cc: stable@vger.kernel.org
Fixes: 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:19 -07:00
Paolo Bonzini cdf7abc461 srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
a single variable.  Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
implementation already supports mixed-context use of srcu_read_lock()
and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
and srcu_read_unlock() in each handler are nested and paired properly.
In other words, it is still illegal to (say) invoke srcu_read_lock()
in an interrupt handler and to invoke the matching srcu_read_unlock()
in a softirq handler.  Therefore, the only change required for Tiny SRCU
is to its comments.

Fixes: 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-08 08:24:26 -07:00
Linus Torvalds de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Paul E. McKenney 933dfbd7c4 rcu: Open-code the rcu_cblist_n_lazy_cbs() function
Because the rcu_cblist_n_lazy_cbs() just samples the ->len_lazy counter,
and because the rcu_cblist structure is quite straightforward, it makes
sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out
a level of indirection.  This commit makes this change.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02 09:22:48 -07:00
Paul E. McKenney 4b27f20b40 rcu: Open-code the rcu_cblist_n_cbs() function
Because the rcu_cblist_n_cbs() just samples the ->len counter, and
because the rcu_cblist structure is quite straightforward, it makes
sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level
of indirection.  This commit makes this change.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02 09:21:59 -07:00
Paul E. McKenney 8ef0f37efb rcu: Open-code the rcu_cblist_empty() function
Because the rcu_cblist_empty() just samples the ->head pointer, and
because the rcu_cblist structure is quite straightforward, it makes
sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a
level of indirection.  This commit makes this change.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-02 08:18:40 -07:00
Paul E. McKenney 98059b9861 rcu: Separately compile large rcu_segcblist functions
This commit creates a new kernel/rcu/rcu_segcblist.c file that
contains non-trivial segcblist functions.  Trivial functions
remain as static inline functions in kernel/rcu/rcu_segcblist.h

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2017-05-02 07:21:02 -07:00
Ingo Molnar 45753c5f31 srcu: Debloat the <linux/rcu_segcblist.h> header
Linus noticed that the <linux/rcu_segcblist.h> has huge inline functions
which should not be inline at all.

As a first step in cleaning this up, move them all to kernel/rcu/ and
only keep an absolute minimum of data type defines in the header:

  before:   -rw-r--r-- 1 mingo mingo 22284 May  2 10:25 include/linux/rcu_segcblist.h
   after:   -rw-r--r-- 1 mingo mingo  3180 May  2 10:22 include/linux/rcu_segcblist.h

More can be done, such as uninlining the large functions, which inlining
is unjustified even if it's an RCU internal matter.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-05-02 06:29:22 -07:00
Paul E. McKenney b5fe223a4b srcu: Adjust default auto-expediting holdoff
The default value for the kernel boot parameter srcutree.exp_holdoff
is 50 microseconds, which is too long for good Tree SRCU performance
(compared to Classic SRCU) on the workloads tested by Mike Galbraith.
This commit therefore sets the default value to 25 microseconds, which
shows excellent results in Mike's testing.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-27 08:35:24 -07:00
Paul E. McKenney 22607d66bb srcu: Specify auto-expedite holdoff time
On small systems, in the absence of readers, expedited SRCU grace
periods can complete in less than a microsecond.  This means that an
eight-CPU system can have all CPUs doing synchronize_srcu() in a tight
loop and almost always expedite.  This might actually be desirable in
some situations, but in general it is a good way to needlessly burn
CPU cycles.  And in those situations where it is desirable, your friend
is the function synchronize_srcu_expedited().

For other situations, this commit adds a kernel parameter that specifies
a holdoff between completing the last SRCU grace period and auto-expediting
the next.  If the next grace period starts before the holdoff expires,
auto-expediting is disabled.  The holdoff is 50 microseconds by default,
and can be tuned to the desired number of nanoseconds.  A value of zero
disables auto-expediting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:17 -07:00
Paul E. McKenney 2da4b2a7fd srcu: Expedite first synchronize_srcu() when idle
Classic SRCU in effect expedites the first synchronize_srcu() when SRCU
is idle, and Mike Galbraith demonstrated that some use cases do in fact
rely on this behavior.  In particular, Mike showed that Steven Rostedt's
hotplug stress script takes 55 seconds with Classic SRCU and more than
16 -minutes- when running Tree SRCU.  Assuming that each Tree SRCU's call
to synchronize_srcu() takes four milliseconds, this implies that Steven's
test invokes synchronize_srcu() in isolation, but more than once per
200 microseconds.  Mike used ftrace to demonstrate that the time between
successive calls to synchronize_srcu() ranged from 118 to 342 microseconds,
with one outlier at 80 milliseconds.  This data clearly indicates that
Tree SRCU needs to expedite the first invocation of synchronize_srcu()
during an SRCU idle period.

This commit therefor introduces a srcu_might_be_idle() function that
probabilistically checks whether or not SRCU is idle.  This function is
used by synchronize_rcu() as an additional criterion in deciding whether
or not to expedite.

(Hat trick to Peter Zijlstra for his earlier suggestion that this might
in fact be a problem.  Which for all I know might have motivated Mike to
look into it.)

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:16 -07:00
Paul E. McKenney 1e9a038b7f srcu: Expedited grace periods with reduced memory contention
Commit f60d231a87 ("srcu: Crude control of expedited grace periods")
introduced a per-srcu_struct atomic counter to track outstanding
requests for grace periods.  This works, but represents a memory-contention
bottleneck.  This commit therefore uses the srcu_node combining tree
to remove this bottleneck.

This commit adds new ->srcu_gp_seq_needed_exp fields to the
srcu_data, srcu_node, and srcu_struct structures, which track the
farthest-in-the-future grace period that must be expedited, which in
turn requires that all nearer-term grace periods also be expedited.
Requests for expediting start with the srcu_data structure, run up
through the srcu_node tree, and end at the srcu_struct structure.
Note that it may be necessary to expedite a grace period that just
now started, and this is handled by a new srcu_funnel_exp_start()
function, which is invoked when the grace period itself is already
in its way, but when that grace period was not marked as expedited.

A new srcu_get_delay() function returns zero if there is at least one
expedited SRCU grace period in flight, or SRCU_INTERVAL otherwise.
This function is used to calculate delays:  Normal grace periods
are allowed to extend in order to cover more requests with a given
grace-period computation, which decreases per-request overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:16 -07:00
Paul E. McKenney 7f6733c3c6 srcu: Make rcutorture writer stalls print SRCU GP state
In the past, SRCU was simple enough that there was little point in
making the rcutorture writer stall messages print the SRCU grace-period
number state.  With the advent of Tree SRCU, this has changed.  This
commit therefore makes Classic, Tiny, and Tree SRCU report this state
to rcutorture as needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 11:23:28 -07:00
Paul E. McKenney c7e88067c1 srcu: Exact tracking of srcu_data structures containing callbacks
The current Tree SRCU implementation schedules a workqueue for every
srcu_data covered by a given leaf srcu_node structure having callbacks,
even if only one of those srcu_data structures actually contains
callbacks.  This is clearly inefficient for workloads that don't feature
callbacks everywhere all the time.  This commit therefore adds an array
of masks that are used by the leaf srcu_node structures to track exactly
which srcu_data structures contain callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 11:23:12 -07:00
Paul E. McKenney f2094107ac Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' into HEAD
doc.2017.04.12a: Documentation updates
fixes.2017.04.19a: Miscellaneous fixes
srcu.2017.04.21a: Parallelize SRCU callback handling
2017-04-21 06:00:13 -07:00
Paul E. McKenney bcbfdd01dc rcu: Make non-preemptive schedule be Tasks RCU quiescent state
Currently, a call to schedule() acts as a Tasks RCU quiescent state
only if a context switch actually takes place.  However, just the
call to schedule() guarantees that the calling task has moved off of
whatever tracing trampoline that it might have been one previously.
This commit therefore plumbs schedule()'s "preempt" parameter into
rcu_note_context_switch(), which then records the Tasks RCU quiescent
state, but only if this call to schedule() was -not- due to a preemption.

To avoid adding overhead to the common-case context-switch path,
this commit hides the rcu_note_context_switch() check under an existing
non-common-case check.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-21 05:59:27 -07:00
Paul E. McKenney 0497b489b8 srcu: Expedite srcu_schedule_cbs_snp() callback invocation
Although Tree SRCU does reduce delays when there is at least one
synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp()
still waits for SRCU_INTERVAL before invoking callbacks.  Since
synchronize_srcu_expedited() now posts a callback and waits for
that callback to do a wakeup, this destroys the expedited nature of
synchronize_srcu_expedited().  This destruction became apparent to
Marc Zyngier in the guise of a guest-OS bootup slowdown from five
seconds to no fewer than forty seconds.

This commit therefore invokes callbacks immediately at the end of the
grace period when there is at least one synchronize_srcu_expedited()
invocation pending.  This brought Marc's guest-OS bootup times back
into the realm of reason.

Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
2017-04-21 05:59:27 -07:00
Paul E. McKenney da915ad5cf srcu: Parallelize callback handling
Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
however, there are workloads that could result in a high volume of
concurrent invocations of call_srcu(), which with current SRCU would
result in excessive lock contention on the srcu_struct structure's
->queue_lock, which protects SRCU's callback lists.  This commit therefore
moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback
list, starting grace periods and invoking callbacks are both more complex
than in the single-list Classic SRCU implementation.  Starting grace
periods and handling callbacks are now handled using an srcu_node tree
that is in some ways similar to the rcu_node trees used by RCU-bh,
RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
controlled by exactly the same Kconfig options and boot parameters that
control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data
and contains an rcu_segcblist structure named ->srcu_cblist for its
callbacks (and a spinlock to protect this).  The srcu_struct gets
an srcu_gp_seq that is used to associate callback segments with the
corresponding completion-time grace-period number.  These completion-time
grace-period numbers are propagated up the srcu_node tree so that the
grace-period workqueue handler can determine whether additional grace
periods are needed on the one hand and where to look for callbacks that
are ready to be invoked.

The srcu_barrier() function must now wait on all instances of the per-CPU
->srcu_cblist.  Because each ->srcu_cblist is protected by ->lock,
srcu_barrier() can remotely add the needed callbacks.  In theory,
it could also remotely start grace periods, but in practice doing so
is complex and racy.  And interestingly enough, it is never necessary
for srcu_barrier() to start a grace period because srcu_barrier() only
enqueues a callback when a callback is already present--and it turns out
that a grace period has to have already been started for this pre-existing
callback.  Furthermore, it is only the callback that srcu_barrier()
needs to wait on, not any particular grace period.  Therefore, a new
rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
callback into the same segment occupied by the last pre-existing callback
in the list.  The special case where all the pre-existing callbacks are
on a different list (because they are in the process of being invoked)
is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
segment, relying on the done-callbacks check that takes place after all
callbacks are inovked.

Note that the readers use the same algorithm as before.  Note that there
is a separate srcu_idx that tells the readers what counter to increment.
This unfortunately cannot be combined with srcu_gp_seq because they
need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture.  These will go
away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf
asynchronous-grace-period capability:

			Callback Queuing Overhead
			-------------------------
	# CPUS		Classic SRCU	Tree SRCU
	------          ------------    ---------
	     2              0.349 us     0.342 us
	    16             31.66  us     0.4   us
	    41             ---------     0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject
the overheads of the occasional srcu_barrier() call needed to avoid OOMing
the test machine.  The rcuperf test hangs when running Classic SRCU at 41
CPUs, hence the line of dashes.  Despite the hacks to both the rcuperf code
and that statistics, this is a convincing demonstration of Tree SRCU's
performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
2017-04-21 05:59:26 -07:00
Paul E. McKenney bfd090be14 rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
This commit just changes a "the the" to "the" to reduce repetition.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:20 -07:00
Nicholas Mc Guire 5455a7f6a8 rcu: Use true/false in assignment to bool
This commit makes the parse_rcu_nocb_poll() function assign true
(rather than the constant 1) to the bool variable rcu_nocb_poll.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:20 -07:00
Nicholas Mc Guire 50dc7def4a rcu: Use bool value directly
The beenonline variable is declared bool so there is no need for an
explicit comparison, especially not against the constant zero.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:19 -07:00
Paul E. McKenney deb34f3643 rcu: Improve comments for hotplug/suspend/hibernate functions
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:18 -07:00
Paul E. McKenney d1e4f01d09 rcu: Remove obsolete comment from rcu_future_gp_cleanup() header
The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this
commit drags this comment into the year 2017.

Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-19 09:29:17 -07:00
Paul E. McKenney dad81a2026 srcu: Introduce CLASSIC_SRCU Kconfig option
The TREE_SRCU rewrite is large and a bit on the non-simple side, so
this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
to be selected using a new CLASSIC_SRCU Kconfig option that depends
on RCU_EXPERT.  The default is to use the new TREE_SRCU and TINY_SRCU
algorithms, in order to help get these the testing that they need.
However, if your users do not require the update-side scalability that
is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
to revert back to the old classic SRCU algorithm.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:23 -07:00
Paul E. McKenney 32071141b2 srcutorture: Print Tiny SRCU reader statistics
The srcu_torture_stats() function is adapted to the specific srcu_struct
layout traditionally used by SRCU.  This commit therefore adds support
for Tiny SRCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney d8be81735a srcu: Create a tiny SRCU
In response to automated complaints about modifications to SRCU
increasing its size, this commit creates a tiny SRCU that is
used in SMP=n && PREEMPT=n builds.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney f60d231a87 srcu: Crude control of expedited grace periods
SRCU's implementation of expedited grace periods has always assumed
that the SRCU instance is idle when the expedited request arrives.
This commit improves this a bit by maintaining a count of the number
of outstanding expedited requests, thus allowing prior non-expedited
grace periods accommodate these requests by shifting to expedited mode.
However, any non-expedited wait already in progress will still wait for
the full duration.

Improved control of expedited grace periods is planned, but one step
at a time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney 80a7956fe3 srcu: Merge ->srcu_state into ->srcu_gp_seq
Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
race conditions given multiple callback queues, so this commit takes
advantage of the two-bit state now available in rcu_seq counters to
store the state in the bottom two bits of ->srcu_gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00
Paul E. McKenney f1ec57a462 srcu: Allow a second bit in rcu_seq for SRCU state
This commit increases the number of reserved bits at the bottom of an
rcu_seq grace-period counter from one to two, as will be needed to
accommodate SRCU's three-state grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney 031aeee000 srcu: Improve rcu_seq grace-period-counter abstraction
The expedited grace-period code contains several open-coded shifts
know the format of an rcu_seq grace-period counter, which is not
particularly good style.  This commit therefore creates a new
rcu_seq_ctr() function that extracts the counter portion of the
counter, and an rcu_seq_state() function that extracts the low-order
state bit.  This commit prepares for SRCU callback parallelization,
which will require two state bits.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney 91e27c356c srcu: Fix bogus try_check_zero() comment
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney e95d68d212 srcu: Make num_rcu_lvl[] array be external
This commit makes the num_rcu_lvl[] array external so that SRCU can
make use of it for initializing its upcoming srcu_node tree.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney efbe451d46 srcu: Move rcu_node traversal macros to rcu.h
This commit moves rcu_for_each_node_breadth_first(),
rcu_for_each_nonleaf_node_breadth_first(), and
rcu_for_each_leaf_node() from kernel/rcu/tree.h to
kernel/rcu/rcu.h so that SRCU can access them.
This commit is code-movement only.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney 41f5c63178 rcu: Remove redundant levelcnt[] array from rcu_init_one()
The levelcnt[] array is identical to num_rcu_lvl[], so this commit
removes levelcnt[].

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:21 -07:00
Paul E. McKenney 2b34c43cc1 srcu: Move rcu_init_levelspread() to rcu_tree_node.h
This commit moves the rcu_init_levelspread() function from
kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it.  This is
another step towards enabling SRCU to create its own combining tree.
This commit is code-movement only, give or take knock-on adjustments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney f2425b4efb srcu: Move combining-tree definitions for SRCU's benefit
This commit moves the C preprocessor code that defines the default shape
of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
file as a first step towards enabling SRCU to create its own combining
tree, which in turn enables SRCU to implement per-CPU callback handling,
thus avoiding contention on the lock currently guarding the single list
of callbacks.  Note that users of SRCU still need to know the size of
the srcu_struct structure, hence include/linux rather than kernel/rcu.

This commit is code-movement only.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney 8660b7d8a5 srcu: Use rcu_segcblist to track SRCU callbacks
This commit switches SRCU from custom-built callback queues to the new
rcu_segcblist structure.  This change associates grace-period sequence
numbers with groups of callbacks, which will be needed for efficient
processing of per-CPU callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney ac367c1c62 srcu: Add grace-period sequence numbers
This commit adds grace-period sequence numbers, which will be used to
handle mid-boot grace periods and per-CPU callback lists.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00
Paul E. McKenney c2a8ec0778 srcu: Move to state-based grace-period sequencing
The current SRCU grace-period processing might never reach the last
portion of srcu_advance_batches().  This is OK given the current
implementation, as the first portion, up to the try_check_zero()
following the srcu_flip() is sufficient to drive grace periods forward.
However, it has the unfortunate side-effect of making it impossible to
determine when a given grace period has ended, and it will be necessary
to efficiently trace ends of grace periods in order to efficiently handle
per-CPU SRCU callback lists.

This commit therefore adds states to the SRCU grace-period processing,
so that the end of a given SRCU grace period is marked by the transition
to the SRCU_STATE_DONE state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:20 -07:00