The RCU CPU stall warnings have morphed significantly since the last
update, so this commit brings the documentation up to date.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
If a fast system has a worst-case grace-period duration of (say) ten
seconds, then running the same workload on a system ten times as slow
will get you an RCU CPU stall warning given default stall-warning
timeout settings. This commit therefore adds this possibility to
stallwarn.txt.
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
If a periodic interrupt's handler takes longer to execute than the period
between successive interrupts, RCU's kthreads and softirq handlers can
be prevented from executing, resulting in otherwise inexplicable RCU
CPU stall warnings. This commit therefore calls out this possibility
in Documentation/RCU/stallwarn.txt.
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit provides text and diagrams showing how Tree RCU implements
its grace-period memory ordering guarantees.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
This commit documents the situations in which RCU needs the
scheduling-clock interrupt to be enabled, along with the consequences
of failing to meet RCU's needs in this area.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There are too many ways for the compiler to optimize (that is, break)
dependencies carried via integer values, so it is now permissible to
carry dependencies only via pointers. This commit catches up some of
the documentation on this point.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
RCU's debugfs tracing used to be the only reasonable low-level debug
information available, but ftrace and event tracing has since surpassed
the RCU debugfs level of usefulness. This commit therefore removes
RCU's debugfs tracing.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The sparse-based checking for non-RCU accesses to RCU-protected pointers
has been around for a very long time, and it is now the only type of
sparse-based checking that is optional. This commit therefore makes
it unconditional.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
by commit 0edd1b1784 ("nohz_full: Add full-system-idle state machine"),
but has not been used. This commit therefore removes it.
If it turns out to be needed later, this commit can always be reverted.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit classifies tail recursion as an alternative way to write
a loop, with similar limitations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit documents the auto-expediting requirement satisfied by
commits 2da4b2a7fd ("srcu: Expedite first synchronize_srcu() when idle")
and 22607d66bb ("srcu: Specify auto-expedite holdoff time").
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section. Of course, that is not the
case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.
However, there is a phrase for this, namely "type safety". This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
Dumazet, in order to help people familiar with the old name find
the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
The rcu_all_qs() and rcu_note_context_switch() do a series of checks,
taking various actions to supply RCU with quiescent states, depending
on the outcomes of the various checks. This is a bit much for scheduling
fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
the rcu_dynticks structure that acts as a global guard for these checks.
Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
check the ->rcu_urgent_qs field, find it false, and simply return.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
that one of them still needs a quiescent state before doing an expensive
atomic operation on the ->dynticks counter. However, this check reduces
overhead only after a rare race condition, and increases complexity. This
commit therefore removes the scan and the mechanism enabling the scan.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_qs_ctr variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
When an RCU-protected pointer is fetched but never dereferenced
rcu_access_pointer() should be used in place of rcu_dereference().
This commit explicitly records this very fact in Documentation/
RCU/rcu_dereference.txt, in order to prevent the usage of
rcu_dereference() in comparisons.
Signed-off-by: Michalis Kokologiannakis <mixaskok@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_assign_pointer() macro has changed over time, and the version
in Documentation/RCU/whatisRCU.txt has not kept up. This commit brings
it into 2017, albeit in a simplified fashion.
Reported-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
These changes include lighter-weight expedited grace periods, the fact
that expedited grace periods and rcu_barrier() no longer block CPU
hotplug, some HTML font fixups, noting that rcu_barrier() need not wait
for a grace period (even if callbacks are posted), the fact that SRCU
read-side critical sections can be used from offline CPUs, and the fact
that SRCU now maintains per-CPU callback lists.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_segcblist data structure, which contains segmented lists
of RCU callbacks, was recently added. This commit updates the
documentation accordingly.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit rearranges the Documentation/RCU/stallwarn.txt file to
put the list of issues that can cause RCU CPU stall warnings near
the beginning of the document.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds a description of how expedited grace periods operate
during the mid-boot "dead zone", which starts when the scheduler spawns
the first kthread and ends when all of RCU's kthreads have been spawned.
In short, before mid-boot, synchronous grace periods can be a no-op.
After the end of mid-boot, workqueues may be used. During mid-boot,
the requesting task drivees the expedited grace period.
For more detail, see https://lwn.net/Articles/716148/.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit updates the "Early Boot" section of the RCU requirements
to describe how synchronous RCU grace periods are now legal throughout
the boot process.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Expedited grace periods no longer fall back to normal grace periods
in response to lock contention, given that expedited grace periods
now use the rcu_node tree so as to avoid contention. This commit
therfore removes the expedited_normal counter.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Now that quick-quiz answers are inline, there is no separate section
containing those answers. This commit therefore removes the dangling
reference from the RCU data-structures design documentation.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit adds design documentation for expedited grace periods.
This documentation is in HTML rather than the new documentation
format because (1) I have prototype documentation already in HTML,
and (2) Attempting to learn the new documentation format while
creating the design documentation seems likely to result in neither
happening in a timely fashion.
Once the design documentation is complete, we can start a conversion effort.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
deference should actually be dereference.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Recent memory-model work deduces the relationships of RCU read-side
critical sections and grace periods based on the relationships of
accesses within a critical section and accesses preceding and following
the grace period. This commit therefore adds this viewpoint.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
A good practice is to prefix the names of functions by the name
of the subsystem.
The kthread worker API is a mix of classic kthreads and workqueues. Each
worker has a dedicated kthread. It runs a generic function that process
queued works. It is implemented as part of the kthread subsystem.
This patch renames the existing kthread worker API to use
the corresponding name from the workqueues API prefixed by
kthread_:
__init_kthread_worker() -> __kthread_init_worker()
init_kthread_worker() -> kthread_init_worker()
init_kthread_work() -> kthread_init_work()
insert_kthread_work() -> kthread_insert_work()
queue_kthread_work() -> kthread_queue_work()
flush_kthread_work() -> kthread_flush_work()
flush_kthread_worker() -> kthread_flush_worker()
Note that the names of DEFINE_KTHREAD_WORK*() macros stay
as they are. It is common that the "DEFINE_" prefix has
precedence over the subsystem names.
Note that INIT() macros and init() functions use different
naming scheme. There is no good solution. There are several
reasons for this solution:
+ "init" in the function names stands for the verb "initialize"
aka "initialize worker". While "INIT" in the macro names
stands for the noun "INITIALIZER" aka "worker initializer".
+ INIT() macros are used only in DEFINE() macros
+ init() functions are used close to the other kthread()
functions. It looks much better if all the functions
use the same scheme.
+ There will be also kthread_destroy_worker() that will
be used close to kthread_cancel_work(). It is related
to the init() function. Again it looks better if all
functions use the same naming scheme.
+ there are several precedents for such init() function
names, e.g. amd_iommu_init_device(), free_area_init_node(),
jump_label_init_type(), regmap_init_mmio_clk(),
+ It is not an argument but it was inconsistent even before.
[arnd@arndb.de: fix linux-next merge conflict]
Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is an assertion in __call_rcu() that checks only the bottom
bit of the rcu_head pointer, rather than the bottom two (as might be
expected for 32-bit systems) or the bottom three (as might be expected
for 64-bit systems). This choice might be a bit surprising in these days
of ubiquitous 32-bit and 64-bit systems. This commit therefore records
the reason for this odd alignment check, namely that m68k guarantees
only two-byte alignment despite being a 32-bit architectures.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CONFIG_RCU_TORTURE_TEST_RUNNABLE was removed by commit 4e9a073f60
("torture: Remove CONFIG_RCU_TORTURE_TEST_RUNNABLE, simplify code"),
but the documentation was not updated accordingly. This commit therefore
updates the documentation to reflect CONFIG_RCU_TORTURE_TEST_RUNNABLE's
removal and to add a description for the alternative module parameter.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
infrastructural work to allow documents to be written using restructured
text. Maybe someday, in a galaxy far far away, we'll be able to eliminate
the DocBook dependency and have a much better integrated set of kernel
docs. Someday.
Beyond that, there's a new document on security hardening from Kees, the
movement of some sample code over to samples/, a number of improvements to
the serial docs from Geert, and the usual collection of corrections, typo
fixes, etc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXPf/VAAoJEI3ONVYwIuV60pkP/3brq+CavbwptWppESoyZaf7
mpVSH7sOKicMcfHYYIXHmmg0K5gM4e22ATl39+izUCRZRwRnObXvroH++G5mARLs
MUDxLvkc/QxDDuCZnUBq5E2gPtuyYpgj1q9fMGB+70ucc/EXYp5cxUhDmbNVrpSG
KBMoZqKaW/Cf8/4fvRQG/glSR0iwyaQuvvoFAWLHgf8uWN/JPM2Cnv9V2zGQCtzP
4B4Jzayu2BGKowBd65WUYdpGnccc7OAJFSJDY/Z9x7kVxKyD+VTn7VgxGnXxs88v
uNmUEMENUpswzuoYEnDHoR0Y2o7jUi2doFKv+eacSmPaMLWL5EMDzcooZ+Vi7HWH
mvp6GtAZ5qs96OGjsi+gFIw4kY8HGdnpzs7qk/uEdAndfAif5v24YLSQRG2rUCJM
LxomnAWOJEIWGKJtuJnl16aZkgOcn6soecXw3PJmpxzhwd8BnQzwyZIdaZ98kwjA
7Enq2Mmw5NBQwGIV2ODUxzoQ3Axj7aJJsDra2n6lPGTGXONGdgNFzk/hGmtQSuIp
Aeatiy66FF0qKomzs2+EACOFP+eH/IId0yvW83Pj0o9nV25YZiPsw0Z1Tae5n3+g
zgTFycalaowIwE3YzyH6BwvnMrluiPpUTjSLsmEaviJxE7/o+zrjOvMvallUIVUn
YkJcia/DtSuc7u7LYkWe
=2O+a
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux
Pull Documentation updates from Jon Corbet:
"A bit busier this time around.
The most interesting thing (IMO) this time around is some beginning
infrastructural work to allow documents to be written using
restructured text. Maybe someday, in a galaxy far far away, we'll be
able to eliminate the DocBook dependency and have a much better
integrated set of kernel docs. Someday.
Beyond that, there's a new document on security hardening from Kees,
the movement of some sample code over to samples/, a number of
improvements to the serial docs from Geert, and the usual collection
of corrections, typo fixes, etc"
* tag 'docs-for-linus' of git://git.lwn.net/linux: (55 commits)
doc: self-protection: provide initial details
serial: doc: Use port->state instead of info
serial: doc: Always refer to tty_port->mutex
Documentation: vm: Spelling s/paltform/platform/g
Documentation/memcg: update kmem limit doc as codes behavior
docproc: print a comment about autogeneration for rst output
docproc: add support for reStructuredText format via --rst option
docproc: abstract terminating lines at first space
docproc: abstract docproc directive detection
docproc: reduce unnecessary indentation
docproc: add variables for subcommand and filename
kernel-doc: use rst C domain directives and references for types
kernel-doc: produce RestructuredText output
kernel-doc: rewrite usage description, remove duplicated comments
Doc: correct the location of sysrq.c
Documentation: fix common spelling mistakes
samples: v4l: from Documentation to samples directory
samples: connector: from Documentation to samples directory
Documentation: xillybus: fix spelling mistake
Documentation: x86: fix spelling mistakes
...
This fixes several spelling mistakes in the Documentation/ tree, which
are caught by checkpatch.pl's spell checking.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This commit adds documentation for RCU's major data structures,
including rcu_state, rcu_node, rcu_data, rcu_dynticks, and rcu_head.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit #cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking")
turns out to be a pessimization at high load because it forces a tree
full of tasks to wait for an expedited grace period that they probably
do not need. This commit therefore removes this optimization.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Although call_rcu()'s fastpath works just fine on an idle CPU,
some branches of the slowpath invoke the scheduler, which uses
RCU. Therefore, this commit emphasizes the fact that call_rcu()
must not be invoked from an idle CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit uses colors to obscure the quick-quiz answers, thus getting
rid of the .htmlx file. Use your mouse to select the answer in order
to see the text. Alternatively, use your favorite scripting language
to remove all occurences of "<font color="ffffff">" from the file.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes a cutesy cartoon and also a diagram that can
just as easily be represented by text.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There is already a blanket statement about no member of RCU's API
being legal from an offline CPU, but add an explicit note where it
states that it is illegal to invoke call_rcu() from an NMI handler.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds a Quick Quiz whose answer explains why the compiler
code reordering enabled by CONFIG_PREEMPT=n's empty rcu_read_lock()
and rcu_read_unlock() functions does not hinder RCU's ability to figure
out which RCU read-side critical sections have completed and not.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit records RCU's responsibility to avoid degrading latencies
of CPUs running tight loops within properly configured workloads,
both in kernel and in userspace.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
In the chapter 'analogy with reader-writer locking', the sample
code uses spinlock_t in reader-writer case. Just correct it so
that we can read the document easily.
Signed-off-by: Yao Dongdong <yaodongdong@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Because RCU-sched expedited grace periods now use IPIs and interact
with rcu_read_unlock(), it is no longer sufficient to disable preemption
across RCU read-side critical sections that acquire and hold scheduler
locks. It is now necessary to instead disable interrupts. This commit
documents this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The RCU requirements do not make it absolutely clear that the
memory-barrier requirements are not intended to replace the fundamental
requirement that all pre-existing RCU readers complete before a grace
period completes. This commit therefore pulls the memory-barrier
requirements into a separate section and explicitly calls out the
relationship between the memory-barrier requirements and the fundamental
requirement.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds a second option for avoiding scheduler/RCU deadlocks,
namely that preemption be disabled across the entire RCU read-side
critical section in question.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit expands on RCU's composability by comparing it to that of
transactional memory and of locking.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds verbiage on boot and sysfs parameters that can be
used to control RCU CPU stall warnings, both to change the timeout
and to suppress these warnings entirely.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit records RCU's guarantee that the bottom bit of the rcu_head
structure's ->next field will remain zero for callbacks posted via
call_rcu(), but not necessarily for <tt>kfree_rcu()</tt> or some
possible future call_rcu_lazy() variant that might one day be created
for energy-efficiency purposese.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updates URLs as suggested by Josh Triplett. ]
This commit adds RCU requirements as published in a 2015 LWN series.
Bringing these requirements in-tree allows them to be updated as changes
are discovered.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updates to charset and URLs as suggested by Josh Triplett. ]
This commit rids the documentation of long-obsolete torture_type options
such as rcu_sync and adds new ones such as tasks. Also add verbiage
noting the fact that rcutorture now concurrently tests the asynchrounous,
synchronous, expedited synchronous, and polling grace-period primitives.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
As there is lots of misinformation and outdated information on the
Internet about nearly all topics related to the kernel, I thought it
would be best if I based my RCU code on the guidelines of the examples
in the Documentation/ tree of the latest kernel. One thing that stuck
out when reading the whatisRCU.txt document was, "interesting how we
don't need any function to dereference rcu protected pointers when doing
updates if a lock is held. I wonder how static analyzers will work with
that." Then, a few weeks later, upon discovering sparse's __rcu support,
I ran it over my code, and lo and behold, things weren't done right.
Examining other RCU usages in the kernel reveal consistent usage of
rcu_dereference_protected, passing in lockdep_is_held as the
conditional. So, this patch adds that idiom to the documentation, so
that others ahead of me won't endure the same exercise.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The Linux kernel outputs copious text during boot, and a slow serial
console can result in stall warnings, particularly when messages are
printed with interrupts disabled. This commit adds this to the list
of causes of RCU CPU stall warning messages.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs. This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros. This also requires
inverting the sense of the conditional, which this commit also does.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
In the common case, there will be only one expedited grace period in
the system at a given time, in which case it is not helpful to use
funnel locking. This commit therefore adds a fastpath that bypasses
funnel locking when the root ->exp_funnel_mutex is not held.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of
releases with no complaints, so it is time to eliminate this Kconfig
option entirely, so that the long-form RCU CPU stall warnings cannot
be disabled. This commit does just that.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The main thing here is Ingo's big subdirectory documenting feature support
for each architecture. Beyond that, it's the usual pile of fixes, tweaks,
and small additions.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVi0g2AAoJEI3ONVYwIuV6Me4QAIfa79z05ABSjlyWaKw46plH
lULR9cyHdR59JVPHKjSOfT9/c+GOdoz6kkXQoe/TgVyj5fRB8seUW5GJXCASndkk
aVd4c6yKFH1NISXsSdVQC0JbpgAURgcSR6x59It++fG3NINvXronFTWGMBHMLKcI
A2hM2jNP914Dy5r4ipWZKzF1KxIlqK9kmLxlNoE6/LoQfBhh1dMdnyfuM11sguAy
s5pr9JeCPbWC0RE7st/qEivXF4lpj6hd3XoYfM2Y+oukj5xEPQevLTLHOgtesnx9
guUAul5Sw27n+Dx8I0Qxf1n+5SkrijoAa72g5vAxTs+ilOey67qba012NaYSy7RK
s15XOIZ/1JTS9JjkO7GR5NbG6AiIIAH5P+Y501ivCIrsWciTOgKj7cOzakIEV8/P
NX4120Lh5lbBrWeYkl8WbgMO0Me8cThbALC+rncF/wjvGyREKyxNlZ9qvBqmHYjG
5Et2DT+rANaDmmblgMK3tX/zI1g3pN51e+CRF+Hzh1jZD3MZ/i+KS4qgfGFDzMIj
uoniO5VfyD4zRbyv4Grg7XMpXiP8xFxKDypglYiXzzwlkarUgbMGOoFE7AkiPOKB
t9gLPetbDsDyU/bSpzHlfObZp+q+pCxHPhyLS7hxEi3gBxYajIMbkpHHJugnE0+H
TfkIhy6QQm1vAPTpRXaE
=ODt8
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"The main thing here is Ingo's big subdirectory documenting feature
support for each architecture. Beyond that, it's the usual pile of
fixes, tweaks, and small additions"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
doc:md: fix typo in md.txt.
Documentation/mic/mpssd: don't build x86 userspace when cross compiling
Documentation/prctl: don't build tsc tests when cross compiling
Documentation/vDSO: don't build tests when cross compiling
Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
Doc: Change wikipedia's URL from http to https
Documentation/kernel-parameters: add missing pciserial to the earlyprintk
Doc:pps: Fix typo in pps.txt
kbuild : Fix documentation of INSTALL_HDR_PATH
Documentation: filesystems: updated struct file_operations documentation in vfs.txt
kbuild: edit explanation of clean-files variable
Doc: ja_JP: Fix typo in HOWTO
Move freefall program from Documentation/ to tools/
Documentation: ARM: EXYNOS: Describe boot loaders interface
Doc:nfc: Fix typo in nfc-hci.txt
vfs: Minor documentation fix
Doc: networking: txtimestamp: fix printf format warning
Documentation, intel_pstate: Improve legacy mode internal governors description
Documentation: extend use case for EXPORT_SYMBOL_GPL()
...
Recently wikipedia announced to secure access to the servers.
Now all http access re-route to https.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Make a note stating that repeated calls of rcu_dereference() may not
return the same pointer if update happens while in critical section.
Reported-by: Jeff Haran <jeff.haran@citrix.com>
Signed-off-by: Milos Vyletel <milos@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit provides another caveat for the care and feeding of pointers
returned by rcu_dereference() that was pointed out in discussions within
the C++ standards committee.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
This commit adds a message that is printed if the relevant grace-period
kthread has not been able to run for the two seconds preceding the
stall warning. (The two seconds is double the maximum interval between
successive bouts of quiescent-state forcing.)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh. This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs. This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.
Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded. (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)
In addition, if one of the RCU flavor's grace period has stalled, this
will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
quiescent state visible from other CPUs.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
was used in preemptible code. ]
The current RCU CPU stall warning code will print "Stall ended before
state dump start" any time that the stall-warning code is triggered on
a CPU that has already reported a quiescent state for the current grace
period and if all quiescent states have been reported for the current
grace period. However, a true stall can result in these symptoms, for
example, by preventing RCU's grace-period kthreads from ever running
This commit therefore checks for this condition, reporting the end of
the stall only if one of the grace-period counters has actually advanced.
Otherwise, it reports the last time that the grace-period kthread made
meaningful progress. (In normal situations, the grace-period kthread
should make meaningful progress at least every jiffies_till_next_fqs
jiffies.)
Reported-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Miroslav Benes <mbenes@suse.cz>
Commit 6bfc09e232 ("rcu: Provide RCU CPU stall warnings for tiny RCU")
moved the rcu_cpu_stall_timeout module parameter from rcutree.c to
rcupdate.c, but failed to update Documentation/RCU/stallwarn.txt. This
commit therefore repairs this omission.
commit 96224daa16 ("documentation: Update sysfs path for rcu_cpu_stall_suppress")
updated the path for rcu_cpu_stall_suppress, but failed to update for
rcu_cpu_stall_timeout.
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after
TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU
and uses PREEMPT_RCU config option in its place.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The CONFIG_RCU_CPU_STALL_VERBOSE Kconfig parameter causes preemptible
RCU's CPU stall warnings to dump out any preempted tasks that are blocking
the current RCU grace period. This information is useful, and the default
has been CONFIG_RCU_CPU_STALL_VERBOSE=y for some years. It is therefore
time for this commit to remove this Kconfig parameter, so that future
kernel builds will always act as if CONFIG_RCU_CPU_STALL_VERBOSE=y.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit documents RCU-tasks stall warning messages and also describes
when to use the new cond_resched_rcu_qs() API.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
All other add functions for lists have the new item as first argument
and the position where it is added as second argument. This was changed
for no good reason in this function and makes using it unnecessary
confusing.
The name was changed to hlist_add_behind() to cause unconverted code to
generate a compile error instead of using the wrong parameter order.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits]
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
The kerneltrap.org site no longer works, so this commit updates it to
a working reference, namely gmane.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
This commit updates the list of RCU API members in whatisRCU.txt.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Recent LKML discussings (see http://lwn.net/Articles/586838/ and
http://lwn.net/Articles/588300/ for the LWN writeups) brought out
some ways of misusing the return value from rcu_dereference() that
are not necessarily completely intuitive. This commit therefore
documents what can and cannot safely be done with these values.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Commit 6bfc09e232 (rcu: Provide RCU CPU stall warnings for tiny RCU)
moved the rcu_cpu_stall_suppress module parameter from rcutree.c to
rcupdate.c, but failed to update Documentation/RCU/stallwarn.txt. This
commit therefore repairs this omission.
Reported-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Some of the early history leaves out some citations and vice versa. This
commit fixes these up.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The call_rcu() family of primitives will take action to accelerate
grace periods when the number of callbacks pending on a given CPU
becomes excessive. Although this safety mechanism can be useful,
it is no substitute for users of call_rcu() having rate-limit controls
in place. This commit adds this nuance to the documentation.
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Reported-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Some of the 00-INDEX files are somewhat outdated and some folders does
not contain 00-INDEX at all. Only outdated (with the notably exception
of spi) indexes are touched here, the 169 folders without 00-INDEX has
not been touched.
New 00-INDEX
- spi/* was added in a series of commits dating back to 2006
Added files (missing in (*/)00-INDEX)
- dmatest.txt was added by commit 851b7e16a0 ("dmatest: run test via
debugfs")
- this_cpu_ops.txt was added by commit a1b2a555d6 ("percpu: add
documentation on this_cpu operations")
- ww-mutex-design.txt was added by commit 040a0a3710 ("mutex: Add
support for wound/wait style locks")
- bcache.txt was added by commit cafe563591 ("bcache: A block layer
cache")
- kernel-per-CPU-kthreads.txt was added by commit 49717cb404
("kthread: Document ways of reducing OS jitter due to per-CPU
kthreads")
- phy.txt was added by commit ff76496347 ("drivers: phy: add generic
PHY framework")
- block/null_blk was added by commit 12f8f4fc03 ("null_blk:
documentation")
- module-signing.txt was added by commit 3cafea3076 ("Add
Documentation/module-signing.txt file")
- assoc_array.txt was added by commit 3cb989501c ("Add a generic
associative array implementation.")
- arm/IXP4xx was part of the initial repo
- arm/cluster-pm-race-avoidance.txt was added by commit 7fe31d28e8
("ARM: mcpm: introduce helpers for platform coherency exit/setup")
- arm/firmware.txt was added by commit 7366b92a77 ("ARM: Add
interface for registering and calling firmware-specific operations")
- arm/kernel_mode_neon.txt was added by commit 2afd0a0524 ("ARM:
7825/1: document the use of NEON in kernel mode")
- arm/tcm.txt was added by commit bc581770cf ("ARM: 5580/2: ARM TCM
(Tightly-Coupled Memory) support v3")
- arm/vlocks.txt was added by commit 9762f12d3e ("ARM: mcpm: Add
baremetal voting mutexes")
- blackfin/gptimers-example.c, Makefile was added by commit
4b60779d5e ("Blackfin: add an example showing how to use the
gptimers API")
- devicetree/usage-model.txt was added by commit 31134efc68 ("dt:
Linux DT usage model documentation")
- fb/api.txt was added by commit fb21c2f428 ("fbdev: Add FOURCC-based
format configuration API")
- fb/sm501.txt was added by commit e6a0498071 ("video, sm501: add
edid and commandline support")
- fb/udlfb.txt was added by commit 96f8d864af ("fbdev: move udlfb out
of staging.")
- filesystems/Makefile was added by commit 1e0051ae48
("Documentation/fs/: split txt and source files")
- filesystems/nfs/nfsd-admin-interfaces.txt was added by commit
8a4c6e19cf ("nfsd: document kernel interfaces for nfsd
configuration")
- ide/warm-plug-howto.txt was added by commit f74c91413e ("ide: add
warm-plug support for IDE devices (take 2)")
- laptops/Makefile was added by commit d49129accc
("Documentation/laptop/: split txt and source files")
- leds/leds-blinkm.txt was added by commit b54cf35a7f ("LEDS: add
BlinkM RGB LED driver, documentation and update MAINTAINERS")
- leds/ledtrig-oneshot.txt was added by commit 5e417281cd ("leds: add
oneshot trigger")
- leds/ledtrig-transient.txt was added by commit 44e1e9f8e7 ("leds:
add new transient trigger for one shot timer activation")
- m68k/README.buddha was part of the initial repo
- networking/LICENSE.(qla3xxx|qlcnic|qlge) was added by commits
40839129f7, c4e84bde1d, 5a4faa8737
- networking/Makefile was added by commit 3794f3e812 ("docsrc: build
Documentation/ sources")
- networking/i40evf.txt was added by commit 105bf2fe6b ("i40evf: add
driver to kernel build system")
- networking/ipsec.txt was added by commit b3c6efbc36 ("xfrm: Add
file to document IPsec corner case")
- networking/mac80211-auth-assoc-deauth.txt was added by commit
3cd7920a2b ("mac80211: add auth/assoc/deauth flow diagram")
- networking/netlink_mmap.txt was added by commit 5683264c39
("netlink: add documentation for memory mapped I/O")
- networking/nf_conntrack-sysctl.txt was added by commit c9f9e0e159
("netfilter: doc: add nf_conntrack sysctl api documentation") lan)
- networking/team.txt was added by commit 3d249d4ca7 ("net: introduce
ethernet teaming device")
- networking/vxlan.txt was added by commit d342894c5d ("vxlan:
virtual extensible lan")
- power/runtime_pm.txt was added by commit 5e928f77a0 ("PM: Introduce
core framework for run-time PM of I/O devices (rev. 17)")
- power/charger-manager.txt was added by commit 3bb3dbbd56
("power_supply: Add initial Charger-Manager driver")
- RCU/lockdep-splat.txt was added by commit d7bd2d68aa ("rcu:
Document interpretation of RCU-lockdep splats")
- s390/kvm.txt was added by 5ecee4b (KVM: s390: API documentation)
- s390/qeth.txt was added by commit b4d72c08b3 ("qeth: bridgeport
support - basic control")
- scheduler/sched-bwc.txt was added by commit 88ebc08ea9 ("sched: Add
documentation for bandwidth control")
- scsi/advansys.txt was added by commit 4bd6d7f356 ("[SCSI] advansys:
Move documentation to Documentation/scsi")
- scsi/bfa.txt was added by commit 1ec90174bd ("[SCSI] bfa: add
readme file")
- scsi/bnx2fc.txt was added by commit 12b8fc10ea ("[SCSI] bnx2fc: Add
driver documentation")
- scsi/cxgb3i.txt was added by commit c3673464eb ("[SCSI] cxgb3i: Add
cxgb3i iSCSI driver.")
- scsi/hpsa.txt was added by commit 992ebcf14f ("[SCSI] hpsa: Add
hpsa.txt to Documentation/scsi")
- scsi/link_power_management_policy.txt was added by commit
ca77329fb7 ("[libata] Link power management infrastructure")
- scsi/osd.txt was added by commit 78e0c621de ("[SCSI] osd:
Documentation for OSD library")
- scsi/scsi-parameter.txt was created/moved by commit 163475fb11
("Documentation: move SCSI parameters to their own text file")
- serial/driver was part of the initial repo
- serial/n_gsm.txt was added by commit 323e84122e ("n_gsm: add a
documentation")
- timers/Makefile was added by commit 3794f3e812 ("docsrc: build
Documentation/ sources")
- virt/kvm/s390.txt was added by commit d9101fca3d ("KVM: s390:
diagnose call documentation")
- vm/split_page_table_lock was added by commit 49076ec2cc ("mm:
dynamically allocate page->ptl if it cannot be embedded to struct
page")
- w1/slaves/w1_ds28e04 was added by commit fbf7f7b4e2 ("w1: Add
1-wire slave device driver for DS28E04-100")
- w1/masters/omap-hdq was added by commit e0a29382c6 ("hdq:
documentation for OMAP HDQ")
- x86/early-microcode.txt was added by commit 0d91ea86a8 ("x86, doc:
Documentation for early microcode loading")
- x86/earlyprintk.txt was added by commit a1aade4788 ("x86/doc:
mini-howto for using earlyprintk=dbgp")
- x86/entry_64.txt was added by commit 8b4777a4b5 ("x86-64: Document
some of entry_64.S")
- x86/pat.txt was added by commit d27554d874 ("x86: PAT
documentation")
Moved files
- arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by
commit 37b8304642 ("ARM: kuser: move interface documentation out of
the source code")
- efi-stub.txt was moved out of x86/ and down into Documentation/ in
commit 4172fe2f8a ("EFI stub documentation updates")
- laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit
efcfed9bad ("Move hp_accel to drivers/platform/x86")
- commit 5616c23ad9 ("x86: doc: move x86-generic documentation from
Doc/x86/i386"):
* x86/usb-legacy-support.txt
* x86/boot.txt
* x86/zero_page.txt
- power/video_extension.txt was moved to acpi in commit 70e66e4df1
("ACPI / video: move video_extension.txt to Documentation/acpi")
Removed files (left in 00-INDEX)
- memory.txt was removed by commit 00ea8990aa ("memory.txt: remove
stray information")
- gpio.txt was moved to gpio/ in commit fd8e198cfc ("Documentation:
gpiolib: document new interface")
- networking/DLINK.txt was removed by commit 168e06ae26
("drivers/net: delete old parallel port de600/de620 drivers")
- serial/hayes-esp.txt was removed by commit f53a2ade0b ("tty: esp:
remove broken driver")
- s390/TAPE was removed by commit 9e280f6693 ("[S390] remove tape
block docu")
- vm/locking was removed by commit 57ea8171d2 ("mm: documentation:
remove hopelessly out-of-date locking doc")
- laptops/acer-wmi.txt was remvoed by commit 020036678e ("acer-wmi:
Delete out-of-date documentation")
Typos/misc issues
- rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit
030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication.")
- commit b88cf73d92 ("net: add missing entries to
Documentation/networking/00-INDEX")
* generic-hdlc.txt was added as generic_hdlc.txt
* spider_net.txt was added as spider-net.txt
- w1/master/mxc-w1 was added as mxc_w1 by commit a5fd9139f7 ("w1: add
1-wire master driver for i.MX27 / i.MX31")
- s390/zfcpdump.txt was added as zfcpdump by commit 6920c12a40
("[S390] Add Documentation/s390/00-INDEX.")
Signed-off-by: Henrik Austad <henrik@austad.us>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [rcu bits]
Acked-by: Rob Landley <rob@landley.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
doc.2013.12.03a: Topic branch for documentation changes.
fixes.2013.12.12a: Topic branch for miscellaneous fixes.
rcutorture.2013.12.03a: Topic branch for new rcutorture/KVM scripting.
sparse.2013.12.12a: Topic branch for sparse-RCU changes.
Dave Jones got the following lockdep splat:
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.12.0-rc3+ #92 Not tainted
> -------------------------------------------------------
> trinity-child2/15191 is trying to acquire lock:
> (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
>
> but task is already holding lock:
> (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&ctx->lock){-.-...}:
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
> [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
> [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
> [<ffffffff81732052>] __schedule+0x1d2/0xa20
> [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
> [<ffffffff817352b6>] retint_kernel+0x26/0x30
> [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
> [<ffffffff813f0504>] pty_write+0x54/0x60
> [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
> [<ffffffff813e5838>] tty_write+0x158/0x2d0
> [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
> [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
> [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> -> #2 (&rq->lock){-.-.-.}:
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
> [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
> [<ffffffff81054336>] do_fork+0x126/0x460
> [<ffffffff81054696>] kernel_thread+0x26/0x30
> [<ffffffff8171ff93>] rest_init+0x23/0x140
> [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
> [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
>
> -> #1 (&p->pi_lock){-.-.-.}:
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
> [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
> [<ffffffff81097d62>] default_wake_function+0x12/0x20
> [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
> [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
> [<ffffffff8108ff59>] __wake_up+0x39/0x50
> [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
> [<ffffffff81111450>] __call_rcu+0x140/0x820
> [<ffffffff81111b8d>] call_rcu+0x1d/0x20
> [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
> [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
> [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
> [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
> [<ffffffff817200be>] kernel_init+0xe/0x190
> [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
>
> -> #0 (&rdp->nocb_wq){......}:
> [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
> [<ffffffff8108ff43>] __wake_up+0x23/0x50
> [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
> [<ffffffff81111450>] __call_rcu+0x140/0x820
> [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
> [<ffffffff81149abf>] put_ctx+0x4f/0x70
> [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
> [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
> [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
> [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
> [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> other info that might help us debug this:
>
> Chain exists of:
> &rdp->nocb_wq --> &rq->lock --> &ctx->lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&ctx->lock);
> lock(&rq->lock);
> lock(&ctx->lock);
> lock(&rdp->nocb_wq);
>
> *** DEADLOCK ***
>
> 1 lock held by trinity-child2/15191:
> #0: (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> stack backtrace:
> CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
> ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
> ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
> ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
> Call Trace:
> [<ffffffff8172a363>] dump_stack+0x4e/0x82
> [<ffffffff81726741>] print_circular_bug+0x200/0x20f
> [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
> [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
> [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
> [<ffffffff810cc243>] lock_acquire+0x93/0x200
> [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
> [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
> [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
> [<ffffffff8108ff43>] __wake_up+0x23/0x50
> [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
> [<ffffffff81111450>] __call_rcu+0x140/0x820
> [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
> [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
> [<ffffffff81149abf>] put_ctx+0x4f/0x70
> [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
> [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
> [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
> [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
> [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
> [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
The underlying problem is that perf is invoking call_rcu() with the
scheduler locks held, but in NOCB mode, call_rcu() will with high
probability invoke the scheduler -- which just might want to use its
locks. The reason that call_rcu() needs to invoke the scheduler is
to wake up the corresponding rcuo callback-offload kthread, which
does the job of starting up a grace period and invoking the callbacks
afterwards.
One solution (championed on a related problem by Lai Jiangshan) is to
simply defer the wakeup to some point where scheduler locks are no longer
held. Since we don't want to unnecessarily incur the cost of such
deferral, the task before us is threefold:
1. Determine when it is likely that a relevant scheduler lock is held.
2. Defer the wakeup in such cases.
3. Ensure that all deferred wakeups eventually happen, preferably
sooner rather than later.
We use irqs_disabled_flags() as a proxy for relevant scheduler locks
being held. This works because the relevant locks are always acquired
with interrupts disabled. We may defer more often than needed, but that
is at least safe.
The wakeup deferral is tracked via a new field in the per-CPU and
per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
This flag is checked by the RCU core processing. The __rcu_pending()
function now checks this flag, which causes rcu_check_callbacks()
to initiate RCU core processing at each scheduling-clock interrupt
where this flag is set. Of course this is not sufficient because
scheduling-clock interrupts are often turned off (the things we used to
be able to count on!). So the flags are also checked on entry to any
state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
state and NO_HZ_FULL user-mode-execution state.
This approach should allow call_rcu() to be invoked regardless of what
locks you might be holding, the key word being "should".
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>