Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Make kfree_rcu() use kfree_bulk() for added performance
   - RCU updates
   - Callback-overload handling updates
   - Tasks-RCU KCSAN and sparse updates
   - Locking torture test and RCU torture test updates
   - Documentation updates
   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Make rcu_barrier() account for offline no-CBs CPUs
  rcu: Mark rcu_state.gp_seq to detect concurrent writes
  Documentation/memory-barriers: Fix typos
  doc: Add rcutorture scripting to torture.txt
  doc/RCU/rcu: Use https instead of http if possible
  doc/RCU/rcu: Use absolute paths for non-rst files
  doc/RCU/rcu: Use ':ref:' for links to other docs
  doc/RCU/listRCU: Update example function name
  doc/RCU/listRCU: Fix typos in a example code snippets
  doc/RCU/Design: Remove remaining HTML tags in ReST files
  doc: Add some more RCU list patterns in the kernel
  rcutorture: Set KCSAN Kconfig options to detect more data races
  rcutorture: Manually clean up after rcu_barrier() failure
  rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU
  rcuperf: Measure memory footprint during kfree_rcu() test
  rcutorture: Annotation lockless accesses to rcu_torture_current
  rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch
  rcutorture: Fix stray access to rcu_fwd_cb_nodelay
  rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race
  rcutorture: Make kvm-find-errors.sh abort on bad directory
  ...
commit
7c4fa15071
|
@ -4,7 +4,7 @@ A Tour Through TREE_RCU's Grace-Period Memory Ordering
|
||||||
|
|
||||||
August 8, 2017
|
August 8, 2017
|
||||||
|
|
||||||
This article was contributed by Paul E. McKenney
|
This article was contributed by Paul E. McKenney
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
@ -48,7 +48,7 @@ Tree RCU Grace Period Memory Ordering Building Blocks
|
||||||
|
|
||||||
The workhorse for RCU's grace-period memory ordering is the
|
The workhorse for RCU's grace-period memory ordering is the
|
||||||
critical section for the ``rcu_node`` structure's
|
critical section for the ``rcu_node`` structure's
|
||||||
``->lock``. These critical sections use helper functions for lock
|
``->lock``. These critical sections use helper functions for lock
|
||||||
acquisition, including ``raw_spin_lock_rcu_node()``,
|
acquisition, including ``raw_spin_lock_rcu_node()``,
|
||||||
``raw_spin_lock_irq_rcu_node()``, and ``raw_spin_lock_irqsave_rcu_node()``.
|
``raw_spin_lock_irq_rcu_node()``, and ``raw_spin_lock_irqsave_rcu_node()``.
|
||||||
Their lock-release counterparts are ``raw_spin_unlock_rcu_node()``,
|
Their lock-release counterparts are ``raw_spin_unlock_rcu_node()``,
|
||||||
|
@ -102,9 +102,9 @@ lock-acquisition and lock-release functions::
|
||||||
23 r3 = READ_ONCE(x);
|
23 r3 = READ_ONCE(x);
|
||||||
24 }
|
24 }
|
||||||
25
|
25
|
||||||
26 WARN_ON(r1 == 0 && r2 == 0 && r3 == 0);
|
26 WARN_ON(r1 == 0 && r2 == 0 && r3 == 0);
|
||||||
|
|
||||||
The ``WARN_ON()`` is evaluated at “the end of time”,
|
The ``WARN_ON()`` is evaluated at "the end of time",
|
||||||
after all changes have propagated throughout the system.
|
after all changes have propagated throughout the system.
|
||||||
Without the ``smp_mb__after_unlock_lock()`` provided by the
|
Without the ``smp_mb__after_unlock_lock()`` provided by the
|
||||||
acquisition functions, this ``WARN_ON()`` could trigger, for example
|
acquisition functions, this ``WARN_ON()`` could trigger, for example
|
||||||
|
|
|
@ -4,12 +4,61 @@ Using RCU to Protect Read-Mostly Linked Lists
|
||||||
=============================================
|
=============================================
|
||||||
|
|
||||||
One of the best applications of RCU is to protect read-mostly linked lists
|
One of the best applications of RCU is to protect read-mostly linked lists
|
||||||
("struct list_head" in list.h). One big advantage of this approach
|
(``struct list_head`` in list.h). One big advantage of this approach
|
||||||
is that all of the required memory barriers are included for you in
|
is that all of the required memory barriers are included for you in
|
||||||
the list macros. This document describes several applications of RCU,
|
the list macros. This document describes several applications of RCU,
|
||||||
with the best fits first.
|
with the best fits first.
|
||||||
|
|
||||||
Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
|
|
||||||
|
Example 1: Read-mostly list: Deferred Destruction
|
||||||
|
-------------------------------------------------
|
||||||
|
|
||||||
|
A widely used use case for RCU lists in the kernel is lockless iteration over
|
||||||
|
all processes in the system. ``task_struct::tasks`` represents the list node that
|
||||||
|
links all the processes. The list can be traversed in parallel to any list
|
||||||
|
additions or removals.
|
||||||
|
|
||||||
|
The traversal of the list is done using ``for_each_process()`` which is defined
|
||||||
|
by the following two macros::
|
||||||
|
|
||||||
|
#define next_task(p) \
|
||||||
|
list_entry_rcu((p)->tasks.next, struct task_struct, tasks)
|
||||||
|
|
||||||
|
#define for_each_process(p) \
|
||||||
|
for (p = &init_task ; (p = next_task(p)) != &init_task ; )
|
||||||
|
|
||||||
|
The code traversing the list of all processes typically looks like::
|
||||||
|
|
||||||
|
rcu_read_lock();
|
||||||
|
for_each_process(p) {
|
||||||
|
/* Do something with p */
|
||||||
|
}
|
||||||
|
rcu_read_unlock();
|
||||||
|
|
||||||
|
The simplified code for removing a process from a task list is::
|
||||||
|
|
||||||
|
void release_task(struct task_struct *p)
|
||||||
|
{
|
||||||
|
write_lock(&tasklist_lock);
|
||||||
|
list_del_rcu(&p->tasks);
|
||||||
|
write_unlock(&tasklist_lock);
|
||||||
|
call_rcu(&p->rcu, delayed_put_task_struct);
|
||||||
|
}
|
||||||
|
|
||||||
|
When a process exits, ``release_task()`` calls ``list_del_rcu(&p->tasks)`` under
|
||||||
|
``tasklist_lock`` writer lock protection, to remove the task from the list of
|
||||||
|
all tasks. The ``tasklist_lock`` prevents concurrent list additions/removals
|
||||||
|
from corrupting the list. Readers using ``for_each_process()`` are not protected
|
||||||
|
with the ``tasklist_lock``. To prevent readers from noticing changes in the list
|
||||||
|
pointers, the ``task_struct`` object is freed only after one or more grace
|
||||||
|
periods elapse (with the help of call_rcu()). This deferring of destruction
|
||||||
|
ensures that any readers traversing the list will see valid ``p->tasks.next``
|
||||||
|
pointers and deletion/freeing can happen in parallel with traversal of the list.
|
||||||
|
This pattern is also called an **existence lock**, since RCU pins the object in
|
||||||
|
memory until all existing readers finish.
|
||||||
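The deferred-destruction idea described above can be sketched in user space. The following is a minimal, single-threaded toy model and not real RCU: a "grace period" is modeled simply as "no reader is currently inside a read-side section", and every ``toy_*`` name is hypothetical, introduced only for illustration. It shows that a reader already standing on a removed node can still follow its ``next`` pointer, because the node is freed only after the reader has left.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy, single-threaded model of deferred destruction. This is NOT real
 * RCU: a "grace period" is modeled as "no reader is inside a read-side
 * section". All toy_* names are hypothetical. */

struct toy_node {
	int val;
	struct toy_node *next;
	struct toy_node *defer_next;	/* links nodes awaiting destruction */
};

int toy_readers;			/* readers inside a read-side section */
struct toy_node *toy_deferred;		/* unlinked but not yet freed */
int toy_freed;				/* number of nodes actually freed */

void toy_read_lock(void)
{
	toy_readers++;
}

void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	toy_readers--;
}

/* Allocate a node holding val and pointing at next. */
struct toy_node *toy_new(int val, struct toy_node *next)
{
	struct toy_node *n = malloc(sizeof(*n));

	assert(n);
	n->val = val;
	n->next = next;
	n->defer_next = NULL;
	return n;
}

/* Unlink the successor of prev, but leave its ->next intact so that a
 * reader already standing on it can still continue the traversal. */
struct toy_node *toy_del_after(struct toy_node *prev)
{
	struct toy_node *victim = prev->next;

	if (victim)
		prev->next = victim->next;
	return victim;
}

/* Queue a node for destruction after the (toy) grace period. */
void toy_call_rcu(struct toy_node *n)
{
	n->defer_next = toy_deferred;
	toy_deferred = n;
}

/* Free the queued nodes, but only once no reader is active. */
void toy_grace_period_end(void)
{
	if (toy_readers)
		return;
	while (toy_deferred) {
		struct toy_node *n = toy_deferred;

		toy_deferred = n->defer_next;
		free(n);
		toy_freed++;
	}
}
```

In the kernel, the grace period is tracked by RCU itself rather than by a reader counter; the counter here only stands in for that machinery so that the ordering of unlink, reader exit, and free is visible.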
|
|
||||||
|
|
||||||
|
Example 2: Read-Side Action Taken Outside of Lock: No In-Place Updates
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
The best applications are cases where, if reader-writer locking were
|
The best applications are cases where, if reader-writer locking were
|
||||||
|
@ -26,7 +75,7 @@ added or deleted, rather than being modified in place.
|
||||||
|
|
||||||
A straightforward example of this use of RCU may be found in the
|
A straightforward example of this use of RCU may be found in the
|
||||||
system-call auditing support. For example, a reader-writer locked
|
system-call auditing support. For example, a reader-writer locked
|
||||||
implementation of audit_filter_task() might be as follows::
|
implementation of ``audit_filter_task()`` might be as follows::
|
||||||
|
|
||||||
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
||||||
{
|
{
|
||||||
|
@ -34,7 +83,7 @@ implementation of audit_filter_task() might be as follows::
|
||||||
enum audit_state state;
|
enum audit_state state;
|
||||||
|
|
||||||
read_lock(&auditsc_lock);
|
read_lock(&auditsc_lock);
|
||||||
/* Note: audit_netlink_sem held by caller. */
|
/* Note: audit_filter_mutex held by caller. */
|
||||||
list_for_each_entry(e, &audit_tsklist, list) {
|
list_for_each_entry(e, &audit_tsklist, list) {
|
||||||
if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
|
if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
|
||||||
read_unlock(&auditsc_lock);
|
read_unlock(&auditsc_lock);
|
||||||
|
@ -58,7 +107,7 @@ This means that RCU can be easily applied to the read side, as follows::
|
||||||
enum audit_state state;
|
enum audit_state state;
|
||||||
|
|
||||||
rcu_read_lock();
|
rcu_read_lock();
|
||||||
/* Note: audit_netlink_sem held by caller. */
|
/* Note: audit_filter_mutex held by caller. */
|
||||||
list_for_each_entry_rcu(e, &audit_tsklist, list) {
|
list_for_each_entry_rcu(e, &audit_tsklist, list) {
|
||||||
if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
|
if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
|
||||||
rcu_read_unlock();
|
rcu_read_unlock();
|
||||||
|
@ -69,13 +118,13 @@ This means that RCU can be easily applied to the read side, as follows::
|
||||||
return AUDIT_BUILD_CONTEXT;
|
return AUDIT_BUILD_CONTEXT;
|
||||||
}
|
}
|
||||||
|
|
||||||
The read_lock() and read_unlock() calls have become rcu_read_lock()
|
The ``read_lock()`` and ``read_unlock()`` calls have become rcu_read_lock()
|
||||||
and rcu_read_unlock(), respectively, and the list_for_each_entry() has
|
and rcu_read_unlock(), respectively, and the list_for_each_entry() has
|
||||||
become list_for_each_entry_rcu(). The _rcu() list-traversal primitives
|
become list_for_each_entry_rcu(). The **_rcu()** list-traversal primitives
|
||||||
insert the read-side memory barriers that are required on DEC Alpha CPUs.
|
insert the read-side memory barriers that are required on DEC Alpha CPUs.
|
||||||
|
|
||||||
The changes to the update side are also straightforward. A reader-writer
|
The changes to the update side are also straightforward. A reader-writer lock
|
||||||
lock might be used as follows for deletion and insertion::
|
might be used as follows for deletion and insertion::
|
||||||
|
|
||||||
static inline int audit_del_rule(struct audit_rule *rule,
|
static inline int audit_del_rule(struct audit_rule *rule,
|
||||||
struct list_head *list)
|
struct list_head *list)
|
||||||
|
@ -115,7 +164,7 @@ Following are the RCU equivalents for these two functions::
|
||||||
{
|
{
|
||||||
struct audit_entry *e;
|
struct audit_entry *e;
|
||||||
|
|
||||||
/* Do not use the _rcu iterator here, since this is the only
|
/* No need to use the _rcu iterator here, since this is the only
|
||||||
* deletion routine. */
|
* deletion routine. */
|
||||||
list_for_each_entry(e, list, list) {
|
list_for_each_entry(e, list, list) {
|
||||||
if (!audit_compare_rule(rule, &e->rule)) {
|
if (!audit_compare_rule(rule, &e->rule)) {
|
||||||
|
@ -139,30 +188,30 @@ Following are the RCU equivalents for these two functions::
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
Normally, the write_lock() and write_unlock() would be replaced by
|
Normally, the ``write_lock()`` and ``write_unlock()`` would be replaced by a
|
||||||
a spin_lock() and a spin_unlock(), but in this case, all callers hold
|
spin_lock() and a spin_unlock(). But in this case, all callers hold
|
||||||
audit_netlink_sem, so no additional locking is required. The auditsc_lock
|
``audit_filter_mutex``, so no additional locking is required. The
|
||||||
can therefore be eliminated, since use of RCU eliminates the need for
|
``auditsc_lock`` can therefore be eliminated, since use of RCU eliminates the
|
||||||
writers to exclude readers. Normally, the write_lock() calls would
|
need for writers to exclude readers.
|
||||||
be converted into spin_lock() calls.
|
|
||||||
|
|
||||||
The list_del(), list_add(), and list_add_tail() primitives have been
|
The list_del(), list_add(), and list_add_tail() primitives have been
|
||||||
replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
|
replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
|
||||||
The _rcu() list-manipulation primitives add memory barriers that are
|
The **_rcu()** list-manipulation primitives add memory barriers that are needed on
|
||||||
needed on weakly ordered CPUs (most of them!). The list_del_rcu()
|
weakly ordered CPUs (most of them!). The list_del_rcu() primitive omits the
|
||||||
primitive omits the pointer poisoning debug-assist code that would
|
pointer poisoning debug-assist code that would otherwise cause concurrent
|
||||||
otherwise cause concurrent readers to fail spectacularly.
|
readers to fail spectacularly.
|
||||||
|
|
||||||
So, when readers can tolerate stale data and when entries are either added
|
So, when readers can tolerate stale data and when entries are either added or
|
||||||
or deleted, without in-place modification, it is very easy to use RCU!
|
deleted, without in-place modification, it is very easy to use RCU!
|
||||||
|
|
||||||
Example 2: Handling In-Place Updates
|
|
||||||
|
Example 3: Handling In-Place Updates
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
|
||||||
The system-call auditing code does not update auditing rules in place.
|
The system-call auditing code does not update auditing rules in place. However,
|
||||||
However, if it did, reader-writer-locked code to do so might look as
|
if it did, the reader-writer-locked code to do so might look as follows
|
||||||
follows (presumably, the field_count is only permitted to decrease,
|
(assuming only ``field_count`` is updated; otherwise, the added fields would
|
||||||
otherwise, the added fields would need to be filled in)::
|
need to be filled in)::
|
||||||
|
|
||||||
static inline int audit_upd_rule(struct audit_rule *rule,
|
static inline int audit_upd_rule(struct audit_rule *rule,
|
||||||
struct list_head *list,
|
struct list_head *list,
|
||||||
|
@ -170,14 +219,14 @@ otherwise, the added fields would need to be filled in)::
|
||||||
__u32 newfield_count)
|
__u32 newfield_count)
|
||||||
{
|
{
|
||||||
struct audit_entry *e;
|
struct audit_entry *e;
|
||||||
struct audit_newentry *ne;
|
struct audit_entry *ne;
|
||||||
|
|
||||||
write_lock(&auditsc_lock);
|
write_lock(&auditsc_lock);
|
||||||
/* Note: audit_netlink_sem held by caller. */
|
/* Note: audit_filter_mutex held by caller. */
|
||||||
list_for_each_entry(e, list, list) {
|
list_for_each_entry(e, list, list) {
|
||||||
if (!audit_compare_rule(rule, &e->rule)) {
|
if (!audit_compare_rule(rule, &e->rule)) {
|
||||||
e->rule.action = newaction;
|
e->rule.action = newaction;
|
||||||
e->rule.file_count = newfield_count;
|
e->rule.field_count = newfield_count;
|
||||||
write_unlock(&auditsc_lock);
|
write_unlock(&auditsc_lock);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
@ -188,8 +237,8 @@ otherwise, the added fields would need to be filled in)::
|
||||||
|
|
||||||
The RCU version creates a copy, updates the copy, then replaces the old
|
The RCU version creates a copy, updates the copy, then replaces the old
|
||||||
entry with the newly updated entry. This sequence of actions, allowing
|
entry with the newly updated entry. This sequence of actions, allowing
|
||||||
concurrent reads while doing a copy to perform an update, is what gives
|
concurrent reads while making a copy to perform an update, is what gives
|
||||||
RCU ("read-copy update") its name. The RCU code is as follows::
|
RCU (*read-copy update*) its name. The RCU code is as follows::
|
||||||
|
|
||||||
static inline int audit_upd_rule(struct audit_rule *rule,
|
static inline int audit_upd_rule(struct audit_rule *rule,
|
||||||
struct list_head *list,
|
struct list_head *list,
|
||||||
|
@ -197,7 +246,7 @@ RCU ("read-copy update") its name. The RCU code is as follows::
|
||||||
__u32 newfield_count)
|
__u32 newfield_count)
|
||||||
{
|
{
|
||||||
struct audit_entry *e;
|
struct audit_entry *e;
|
||||||
struct audit_newentry *ne;
|
struct audit_entry *ne;
|
||||||
|
|
||||||
list_for_each_entry(e, list, list) {
|
list_for_each_entry(e, list, list) {
|
||||||
if (!audit_compare_rule(rule, &e->rule)) {
|
if (!audit_compare_rule(rule, &e->rule)) {
|
||||||
|
@ -206,7 +255,7 @@ RCU ("read-copy update") its name. The RCU code is as follows::
|
||||||
return -ENOMEM;
|
return -ENOMEM;
|
||||||
audit_copy_rule(&ne->rule, &e->rule);
|
audit_copy_rule(&ne->rule, &e->rule);
|
||||||
ne->rule.action = newaction;
|
ne->rule.action = newaction;
|
||||||
ne->rule.file_count = newfield_count;
|
ne->rule.field_count = newfield_count;
|
||||||
list_replace_rcu(&e->list, &ne->list);
|
list_replace_rcu(&e->list, &ne->list);
|
||||||
call_rcu(&e->rcu, audit_free_rule);
|
call_rcu(&e->rcu, audit_free_rule);
|
||||||
return 0;
|
return 0;
|
||||||
|
@ -215,34 +264,45 @@ RCU ("read-copy update") its name. The RCU code is as follows::
|
||||||
return -EFAULT; /* No matching rule */
|
return -EFAULT; /* No matching rule */
|
||||||
}
|
}
|
||||||
|
|
||||||
Again, this assumes that the caller holds audit_netlink_sem. Normally,
|
Again, this assumes that the caller holds ``audit_filter_mutex``. Normally, the
|
||||||
the reader-writer lock would become a spinlock in this sort of code.
|
writer lock would become a spinlock in this sort of code.
|
||||||
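The copy, update, and replace sequence can also be illustrated outside the kernel. The sketch below is a user-space approximation using C11 atomics, assuming a single published pointer instead of a list and assuming that updaters are serialized by the caller (as ``audit_filter_mutex`` does above). The ``toy_rule`` structure and the ``toy_*`` functions are hypothetical, and the grace period before freeing the old version is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an audit rule; not the kernel's structure. */
struct toy_rule {
	int action;
	unsigned int field_count;
};

/* The currently published rule. Readers load it, updaters replace it. */
_Atomic(struct toy_rule *) toy_current;

/* Reader: one acquire load yields a consistent snapshot, old or new. */
const struct toy_rule *toy_rule_get(void)
{
	return atomic_load_explicit(&toy_current, memory_order_acquire);
}

/* Updater: copy the old version, modify the copy, then publish the copy
 * with a single release store. On success, returns 0 and hands the old
 * version back through oldp; the caller may free it only after every
 * reader that might still hold it is done (not modeled here). */
int toy_rule_update(int newaction, unsigned int newfield_count,
		    struct toy_rule **oldp)
{
	struct toy_rule *old, *ne;

	assert(oldp);
	old = atomic_load_explicit(&toy_current, memory_order_relaxed);
	ne = calloc(1, sizeof(*ne));
	if (!ne)
		return -1;
	if (old)
		*ne = *old;			/* read-copy ... */
	ne->action = newaction;			/* ... update ... */
	ne->field_count = newfield_count;
	atomic_store_explicit(&toy_current, ne, memory_order_release);
	*oldp = old;				/* ... reclaim later */
	return 0;
}
```

A reader that fetched the pointer before the update keeps seeing the old, internally consistent values, while readers arriving afterwards see the new ones. No reader ever observes a half-updated rule.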
|
|
||||||
Example 3: Eliminating Stale Data
|
Another use of this pattern can be found in the openvswitch driver's *connection
|
||||||
|
tracking table* code in ``ct_limit_set()``. The table holds connection tracking
|
||||||
|
entries and has a limit on the maximum entries. There is one such table
|
||||||
|
per-zone and hence one *limit* per zone. The zones are mapped to their limits
|
||||||
|
through a hashtable using an RCU-managed hlist for the hash chains. When a new
|
||||||
|
limit is set, a new limit object is allocated and ``ct_limit_set()`` is called
|
||||||
|
to replace the old limit object with the new one using hlist_replace_rcu().
|
||||||
|
The old limit object is then freed after a grace period using kfree_rcu().
|
||||||
|
|
||||||
|
|
||||||
|
Example 4: Eliminating Stale Data
|
||||||
---------------------------------
|
---------------------------------
|
||||||
|
|
||||||
The auditing examples above tolerate stale data, as do most algorithms
|
The auditing example above tolerates stale data, as do most algorithms
|
||||||
that are tracking external state. Because there is a delay from the
|
that are tracking external state. Because there is a delay from the
|
||||||
time the external state changes before Linux becomes aware of the change,
|
time the external state changes before Linux becomes aware of the change,
|
||||||
additional RCU-induced staleness is normally not a problem.
|
additional RCU-induced staleness is generally not a problem.
|
||||||
|
|
||||||
However, there are many examples where stale data cannot be tolerated.
|
However, there are many examples where stale data cannot be tolerated.
|
||||||
One example in the Linux kernel is the System V IPC (see the ipc_lock()
|
One example in the Linux kernel is the System V IPC (see the shm_lock()
|
||||||
function in ipc/util.c). This code checks a "deleted" flag under a
|
function in ipc/shm.c). This code checks a *deleted* flag under a
|
||||||
per-entry spinlock, and, if the "deleted" flag is set, pretends that the
|
per-entry spinlock, and, if the *deleted* flag is set, pretends that the
|
||||||
entry does not exist. For this to be helpful, the search function must
|
entry does not exist. For this to be helpful, the search function must
|
||||||
return holding the per-entry spinlock, as ipc_lock() does in fact do.
|
return holding the per-entry spinlock, as shm_lock() does in fact do.
|
||||||
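The deleted-flag technique can be sketched as follows. This is a user-space illustration only, assuming a small array in place of an RCU-protected list and a toy spinlock built from a C11 ``atomic_flag``; all ``toy_*`` names are hypothetical and the RCU read-side protection of the lookup itself is omitted. The point it shows is that the search function checks the flag under the per-entry lock and returns with that lock still held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_entry {
	int key;
	bool deleted;		/* protected by ->lock */
	atomic_flag lock;	/* toy per-entry spinlock */
};

void toy_lock(struct toy_entry *e)
{
	while (atomic_flag_test_and_set_explicit(&e->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

void toy_unlock(struct toy_entry *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Mark an entry as deleted under its own lock. */
void toy_delete(struct toy_entry *e)
{
	assert(e);
	toy_lock(e);
	e->deleted = true;
	toy_unlock(e);
}

/* Return the live entry matching key with its lock HELD, or NULL.
 * A deleted entry is treated as if it did not exist. */
struct toy_entry *toy_find_locked(struct toy_entry *tbl, size_t n, int key)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_entry *e = &tbl[i];

		if (e->key != key)
			continue;
		toy_lock(e);
		if (!e->deleted)
			return e;	/* caller must call toy_unlock() */
		toy_unlock(e);
	}
	return NULL;
}
```

Because the caller still holds the lock on return, a concurrent ``toy_delete()`` cannot mark the entry deleted until the caller has finished using it and called ``toy_unlock()``.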
|
|
||||||
|
.. _quick_quiz:
|
||||||
|
|
||||||
Quick Quiz:
|
Quick Quiz:
|
||||||
Why does the search function need to return holding the per-entry lock for
|
For the deleted-flag technique to be helpful, why is it necessary
|
||||||
this deleted-flag technique to be helpful?
|
to hold the per-entry lock while returning from the search function?
|
||||||
|
|
||||||
:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
|
:ref:`Answer to Quick Quiz <quick_quiz_answer>`
|
||||||
|
|
||||||
If the system-call audit module were to ever need to reject stale data,
|
If the system-call audit module were to ever need to reject stale data, one way
|
||||||
one way to accomplish this would be to add a "deleted" flag and a "lock"
|
to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to the
|
||||||
spinlock to the audit_entry structure, and modify audit_filter_task()
|
audit_entry structure, and modify ``audit_filter_task()`` as follows::
|
||||||
as follows::
|
|
||||||
|
|
||||||
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
static enum audit_state audit_filter_task(struct task_struct *tsk)
|
||||||
{
|
{
|
||||||
|
@ -267,20 +327,20 @@ as follows::
|
||||||
}
|
}
|
||||||
|
|
||||||
Note that this example assumes that entries are only added and deleted.
|
Note that this example assumes that entries are only added and deleted.
|
||||||
Additional mechanism is required to deal correctly with the
|
Additional mechanism is required to deal correctly with the update-in-place
|
||||||
update-in-place performed by audit_upd_rule(). For one thing,
|
performed by ``audit_upd_rule()``. For one thing, ``audit_upd_rule()`` would
|
||||||
audit_upd_rule() would need additional memory barriers to ensure
|
need additional memory barriers to ensure that the list_add_rcu() was really
|
||||||
that the list_add_rcu() was really executed before the list_del_rcu().
|
executed before the list_del_rcu().
|
||||||
|
|
||||||
The audit_del_rule() function would need to set the "deleted"
|
The ``audit_del_rule()`` function would need to set the ``deleted`` flag under the
|
||||||
flag under the spinlock as follows::
|
spinlock as follows::
|
||||||
|
|
||||||
static inline int audit_del_rule(struct audit_rule *rule,
|
static inline int audit_del_rule(struct audit_rule *rule,
|
||||||
struct list_head *list)
|
struct list_head *list)
|
||||||
{
|
{
|
||||||
struct audit_entry *e;
|
struct audit_entry *e;
|
||||||
|
|
||||||
/* Do not need to use the _rcu iterator here, since this
|
/* No need to use the _rcu iterator here, since this
|
||||||
* is the only deletion routine. */
|
* is the only deletion routine. */
|
||||||
list_for_each_entry(e, list, list) {
|
list_for_each_entry(e, list, list) {
|
||||||
if (!audit_compare_rule(rule, &e->rule)) {
|
if (!audit_compare_rule(rule, &e->rule)) {
|
||||||
|
@ -295,6 +355,91 @@ flag under the spinlock as follows::
|
||||||
return -EFAULT; /* No matching rule */
|
return -EFAULT; /* No matching rule */
|
||||||
}
|
}
|
||||||
|
|
||||||
|
This too assumes that the caller holds ``audit_filter_mutex``.
|
||||||
|
|
||||||
|
|
||||||
|
Example 5: Skipping Stale Objects
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
For some use cases, reader performance can be improved by skipping stale objects
|
||||||
|
during read-side list traversal if the object in question is pending destruction
|
||||||
|
after one or more grace periods. One such example can be found in the timerfd
|
||||||
|
subsystem. When a ``CLOCK_REALTIME`` clock is reprogrammed, for example due to
|
||||||
|
setting of the system time, all programmed timerfds that depend on this
|
||||||
|
clock get triggered and processes waiting on them to expire are woken up in
|
||||||
|
advance of their scheduled expiry. To facilitate this, all such timers are added
|
||||||
|
to an RCU-managed ``cancel_list`` when they are set up in
|
||||||
|
``timerfd_setup_cancel()``::
|
||||||
|
|
||||||
|
static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
|
||||||
|
{
|
||||||
|
spin_lock(&ctx->cancel_lock);
|
||||||
|
if (ctx->clockid == CLOCK_REALTIME &&
|
||||||
|
(flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
|
||||||
|
if (!ctx->might_cancel) {
|
||||||
|
ctx->might_cancel = true;
|
||||||
|
spin_lock(&cancel_lock);
|
||||||
|
list_add_rcu(&ctx->clist, &cancel_list);
|
||||||
|
spin_unlock(&cancel_lock);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
spin_unlock(&ctx->cancel_lock);
|
||||||
|
}
|
||||||
|
|
||||||
|
When a timerfd is freed (fd is closed), the ``might_cancel`` flag of the
|
||||||
|
timerfd object is cleared, and the object is removed from the ``cancel_list`` and
|
||||||
|
destroyed::
|
||||||
|
|
||||||
|
int timerfd_release(struct inode *inode, struct file *file)
|
||||||
|
{
|
||||||
|
struct timerfd_ctx *ctx = file->private_data;
|
||||||
|
|
||||||
|
spin_lock(&ctx->cancel_lock);
|
||||||
|
if (ctx->might_cancel) {
|
||||||
|
ctx->might_cancel = false;
|
||||||
|
spin_lock(&cancel_lock);
|
||||||
|
list_del_rcu(&ctx->clist);
|
||||||
|
spin_unlock(&cancel_lock);
|
||||||
|
}
|
||||||
|
spin_unlock(&ctx->cancel_lock);
|
||||||
|
|
||||||
|
hrtimer_cancel(&ctx->t.tmr);
|
||||||
|
kfree_rcu(ctx, rcu);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
If the ``CLOCK_REALTIME`` clock is set, for example by a time server, the
|
||||||
|
hrtimer framework calls ``timerfd_clock_was_set()`` which walks the
|
||||||
|
``cancel_list`` and wakes up processes waiting on the timerfd. While iterating
|
||||||
|
the ``cancel_list``, the ``might_cancel`` flag is consulted to skip stale
|
||||||
|
objects::
|
||||||
|
|
||||||
|
void timerfd_clock_was_set(void)
|
||||||
|
{
|
||||||
|
struct timerfd_ctx *ctx;
|
||||||
|
unsigned long flags;
|
||||||
|
|
||||||
|
rcu_read_lock();
|
||||||
|
list_for_each_entry_rcu(ctx, &cancel_list, clist) {
|
||||||
|
if (!ctx->might_cancel)
|
||||||
|
continue;
|
||||||
|
spin_lock_irqsave(&ctx->wqh.lock, flags);
|
||||||
|
if (ctx->moffs != ktime_mono_to_real(0)) {
|
||||||
|
ctx->moffs = KTIME_MAX;
|
||||||
|
ctx->ticks++;
|
||||||
|
wake_up_locked_poll(&ctx->wqh, EPOLLIN);
|
||||||
|
}
|
||||||
|
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
|
||||||
|
}
|
||||||
|
rcu_read_unlock();
|
||||||
|
}
|
||||||
|
|
||||||
|
The key point here is that RCU traversal of the ``cancel_list`` happens
|
||||||
|
while objects are being added to and removed from the list, so the traversal
|
||||||
|
can sometimes step on an object that has been removed from the list. This example
|
||||||
|
shows that it is better to skip such objects using a flag.
|
||||||
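The skip-on-flag traversal can be reduced to a few lines of plain C. The sketch below is a single-threaded illustration, assuming a simple singly linked list in place of the RCU-managed ``cancel_list``; ``toy_ctx``, ``toy_release()``, and ``toy_clock_was_set()`` are hypothetical names, and locking and wakeups are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ctx {
	bool might_cancel;	/* cleared when the object is being torn down */
	int ticks;
	struct toy_ctx *next;
};

/* Begin teardown: clear the flag. Real code would then unlink the object
 * and free it only after a grace period, so a concurrent traversal may
 * still reach it for a while. */
void toy_release(struct toy_ctx *ctx)
{
	assert(ctx);
	ctx->might_cancel = false;
}

/* Walk the list and "wake" every live context, skipping stale ones.
 * Returns the number of contexts that were woken. */
int toy_clock_was_set(struct toy_ctx *head)
{
	int woken = 0;

	for (struct toy_ctx *ctx = head; ctx; ctx = ctx->next) {
		if (!ctx->might_cancel)
			continue;	/* stale object, skip it */
		ctx->ticks++;
		woken++;
	}
	return woken;
}
```

The flag check costs one load per element, which is typically cheaper than taking the per-object lock only to discover that the object is on its way out.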
|
|
||||||
|
|
||||||
Summary
|
Summary
|
||||||
-------
|
-------
|
||||||
|
|
||||||
|
@ -303,19 +448,21 @@ the most amenable to use of RCU. The simplest case is where entries are
|
||||||
either added or deleted from the data structure (or atomically modified
|
either added or deleted from the data structure (or atomically modified
|
||||||
in place), but non-atomic in-place modifications can be handled by making
|
in place), but non-atomic in-place modifications can be handled by making
|
||||||
a copy, updating the copy, then replacing the original with the copy.
|
a copy, updating the copy, then replacing the original with the copy.
|
||||||
If stale data cannot be tolerated, then a "deleted" flag may be used
|
If stale data cannot be tolerated, then a *deleted* flag may be used
|
||||||
in conjunction with a per-entry spinlock in order to allow the search
|
in conjunction with a per-entry spinlock in order to allow the search
|
||||||
function to reject newly deleted data.
|
function to reject newly deleted data.
|
||||||
|
|
||||||
.. _answer_quick_quiz_list:
|
.. _quick_quiz_answer:
|
||||||
|
|
||||||
Answer to Quick Quiz:
|
Answer to Quick Quiz:
|
||||||
Why does the search function need to return holding the per-entry
|
For the deleted-flag technique to be helpful, why is it necessary
|
||||||
lock for this deleted-flag technique to be helpful?
|
to hold the per-entry lock while returning from the search function?
|
||||||
|
|
||||||
If the search function drops the per-entry lock before returning,
|
If the search function drops the per-entry lock before returning,
|
||||||
then the caller will be processing stale data in any case. If it
|
then the caller will be processing stale data in any case. If it
|
||||||
is really OK to be processing stale data, then you don't need a
|
is really OK to be processing stale data, then you don't need a
|
||||||
"deleted" flag. If processing stale data really is a problem,
|
*deleted* flag. If processing stale data really is a problem,
|
||||||
then you need to hold the per-entry lock across all of the code
|
then you need to hold the per-entry lock across all of the code
|
||||||
that uses the value that was returned.
|
that uses the value that was returned.
|
||||||
|
|
||||||
|
:ref:`Back to Quick Quiz <quick_quiz>`
|
||||||
|
|
|
@ -11,8 +11,8 @@ must be long enough that any readers accessing the item being deleted have
|
||||||
since dropped their references. For example, an RCU-protected deletion
|
since dropped their references. For example, an RCU-protected deletion
|
||||||
from a linked list would first remove the item from the list, wait for
|
from a linked list would first remove the item from the list, wait for
|
||||||
a grace period to elapse, then free the element. See the
|
a grace period to elapse, then free the element. See the
|
||||||
Documentation/RCU/listRCU.rst file for more information on using RCU with
|
:ref:`Documentation/RCU/listRCU.rst <list_rcu_doc>` for more information on
|
||||||
linked lists.
|
using RCU with linked lists.
|
||||||
|
|
||||||
Frequently Asked Questions
|
Frequently Asked Questions
|
||||||
--------------------------
|
--------------------------
|
||||||
|
@ -50,7 +50,7 @@ Frequently Asked Questions
|
||||||
- If I am running on a uniprocessor kernel, which can only do one
|
- If I am running on a uniprocessor kernel, which can only do one
|
||||||
thing at a time, why should I wait for a grace period?
|
thing at a time, why should I wait for a grace period?
|
||||||
|
|
||||||
See the Documentation/RCU/UP.rst file for more information.
|
See :ref:`Documentation/RCU/UP.rst <up_doc>` for more information.
|
||||||
|
|
||||||
- How can I see where RCU is currently used in the Linux kernel?
|
- How can I see where RCU is currently used in the Linux kernel?
|
||||||
|
|
||||||
|
@ -68,18 +68,18 @@ Frequently Asked Questions
|
||||||
|
|
||||||
- Why the name "RCU"?
|
- Why the name "RCU"?
|
||||||
|
|
||||||
"RCU" stands for "read-copy update". The file Documentation/RCU/listRCU.rst
|
"RCU" stands for "read-copy update".
|
||||||
has more information on where this name came from, search for
|
:ref:`Documentation/RCU/listRCU.rst <list_rcu_doc>` has more information on where
|
||||||
"read-copy update" to find it.
|
this name came from; search for "read-copy update" to find it.
|
||||||
|
|
||||||
- I hear that RCU is patented? What is with that?
|
- I hear that RCU is patented? What is with that?
|
||||||
|
|
||||||
Yes, it is. There are several known patents related to RCU,
|
Yes, it is. There are several known patents related to RCU,
|
||||||
search for the string "Patent" in RTFP.txt to find them.
|
search for the string "Patent" in Documentation/RCU/RTFP.txt to find them.
|
||||||
Of these, one was allowed to lapse by the assignee, and the
|
Of these, one was allowed to lapse by the assignee, and the
|
||||||
others have been contributed to the Linux kernel under GPL.
|
others have been contributed to the Linux kernel under GPL.
|
||||||
There are now also LGPL implementations of user-level RCU
|
There are now also LGPL implementations of user-level RCU
|
||||||
available (http://liburcu.org/).
|
available (https://liburcu.org/).
|
||||||
|
|
||||||
- I hear that RCU needs work in order to support realtime kernels?
|
- I hear that RCU needs work in order to support realtime kernels?
|
||||||
|
|
||||||
|
@ -88,5 +88,5 @@ Frequently Asked Questions
|
||||||
|
|
||||||
- Where can I find more information on RCU?
|
- Where can I find more information on RCU?
|
||||||
|
|
||||||
See the RTFP.txt file in this directory.
|
See the Documentation/RCU/RTFP.txt file.
|
||||||
Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
|
Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
|
||||||
|
|
|
@ -124,9 +124,14 @@ using a dynamically allocated srcu_struct (hence "srcud-" rather than
|
||||||
debugging. The final "T" entry contains the totals of the counters.
|
debugging. The final "T" entry contains the totals of the counters.
|
||||||
|
|
||||||
|
|
||||||
USAGE
|
USAGE ON SPECIFIC KERNEL BUILDS
|
||||||
|
|
||||||
The following script may be used to torture RCU:
|
It is sometimes desirable to torture RCU on a specific kernel build,
|
||||||
|
for example, when preparing to put that kernel build into production.
|
||||||
|
In that case, the kernel should be built with CONFIG_RCU_TORTURE_TEST=m
|
||||||
|
so that the test can be started using modprobe and terminated using rmmod.
|
||||||
|
|
||||||
|
For example, the following script may be used to torture RCU:
|
||||||
|
|
||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
|
|
||||||
|
@ -142,8 +147,136 @@ checked for such errors. The "rmmod" command forces a "SUCCESS",
|
||||||
two are self-explanatory, while the last indicates that while there
|
two are self-explanatory, while the last indicates that while there
|
||||||
were no RCU failures, CPU-hotplug problems were detected.
|
were no RCU failures, CPU-hotplug problems were detected.
|
||||||
|
|
||||||
However, the tools/testing/selftests/rcutorture/bin/kvm.sh script
|
|
||||||
provides better automation, including automatic failure analysis.
|
USAGE ON MAINLINE KERNELS
|
||||||
It assumes a qemu/kvm-enabled platform, and runs guest OSes out of initrd.
|
|
||||||
See tools/testing/selftests/rcutorture/doc/initrd.txt for instructions
|
When using rcutorture to test changes to RCU itself, it is often
|
||||||
on setting up such an initrd.
|
necessary to build a number of kernels in order to test that change
|
||||||
|
across a broad range of combinations of the relevant Kconfig options
|
||||||
|
and of the relevant kernel boot parameters. In this situation, use
|
||||||
|
of modprobe and rmmod can be quite time-consuming and error-prone.
|
||||||
|
|
||||||
|
Therefore, the tools/testing/selftests/rcutorture/bin/kvm.sh
|
||||||
|
script is available for mainline testing for x86, arm64, and
|
||||||
|
powerpc. By default, it will run the series of tests specified by
|
||||||
|
tools/testing/selftests/rcutorture/configs/rcu/CFLIST, with each test
|
||||||
|
running for 30 minutes within a guest OS using a minimal userspace
|
||||||
|
supplied by an automatically generated initrd. After the tests are
|
||||||
|
complete, the resulting build products and console output are analyzed
|
||||||
|
for errors and the results of the runs are summarized.
|
||||||
|
|
||||||
|
On larger systems, rcutorture testing can be accelerated by passing the
|
||||||
|
--cpus argument to kvm.sh. For example, on a 64-CPU system, "--cpus 43"
|
||||||
|
would use up to 43 CPUs to run tests concurrently, which as of v5.4 would
|
||||||
|
complete all the scenarios in two batches, reducing the time to complete
|
||||||
|
from about eight hours to about one hour (not counting the time to build
|
||||||
|
the sixteen kernels). The "--dryrun sched" argument will not run tests,
|
||||||
|
but rather tell you how the tests would be scheduled into batches. This
|
||||||
|
can be useful when working out how many CPUs to specify in the --cpus
|
||||||
|
argument.
|
||||||
|
|
||||||
|
Not all changes require that all scenarios be run. For example, a change
|
||||||
|
to Tree SRCU might run only the SRCU-N and SRCU-P scenarios using the
|
||||||
|
--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'".
|
||||||
|
Large systems can run multiple copies of the full set of scenarios,
|
||||||
|
for example, a system with 448 hardware threads can run five instances
|
||||||
|
of the full set concurrently. To make this happen:
|
||||||
|
|
||||||
|
kvm.sh --cpus 448 --configs '5*CFLIST'
|
||||||
|
|
||||||
|
Alternatively, such a system can run 56 concurrent instances of a single
|
||||||
|
eight-CPU scenario:
|
||||||
|
|
||||||
|
kvm.sh --cpus 448 --configs '56*TREE04'
|
||||||
|
|
||||||
|
Or 28 concurrent instances of each of two eight-CPU scenarios:
|
||||||
|
|
||||||
|
kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04'
|
||||||
|
|
||||||
|
Of course, each concurrent instance will use memory, which can be
|
||||||
|
limited using the --memory argument, which defaults to 512M. Small
|
||||||
|
values for memory may require disabling the callback-flooding tests
|
||||||
|
using the --bootargs parameter discussed below.
|
||||||
|
|
||||||
|
Sometimes additional debugging is useful, and in such cases the --kconfig
|
||||||
|
parameter to kvm.sh may be used, for example, "--kconfig 'CONFIG_KASAN=y'".
|
||||||
|
|
||||||
|
Kernel boot arguments can also be supplied, for example, to control
|
||||||
|
rcutorture's module parameters. For example, to test a change to RCU's
|
||||||
|
CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'".
|
||||||
|
This will of course result in the scripting reporting a failure, namely
|
||||||
|
the resulting RCU CPU stall warning. As noted above, reducing memory may
|
||||||
|
require disabling rcutorture's callback-flooding tests:
|
||||||
|
|
||||||
|
kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
|
||||||
|
--bootargs 'rcutorture.fwd_progress=0'
|
||||||
|
|
||||||
|
Sometimes all that is needed is a full set of kernel builds. This is
|
||||||
|
what the --buildonly argument does.
|
||||||
|
|
||||||
|
Finally, the --trust-make argument allows each kernel build to reuse what
|
||||||
|
it can from the previous kernel build.
|
||||||
|
|
||||||
|
There are additional more arcane arguments that are documented in the
|
||||||
|
source code of the kvm.sh script.
|
||||||
|
|
||||||
|
If a run contains failures, the number of buildtime and runtime failures
|
||||||
|
is listed at the end of the kvm.sh output, which you really should redirect
|
||||||
|
to a file. The build products and console output of each run are kept in
|
||||||
|
tools/testing/selftests/rcutorture/res in timestamped directories. A
|
||||||
|
given directory can be supplied to kvm-find-errors.sh in order to have
|
||||||
|
it cycle you through summaries of errors and full error logs. For example:
|
||||||
|
|
||||||
|
tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \
|
||||||
|
tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23
|
||||||
|
|
||||||
|
However, it is often more convenient to access the files directly.
|
||||||
|
Files pertaining to all scenarios in a run reside in the top-level
|
||||||
|
directory (2020.01.20-15.54.23 in the example above), while per-scenario
|
||||||
|
files reside in a subdirectory named after the scenario (for example,
|
||||||
|
"TREE04"). If a given scenario ran more than once (as in "--configs
|
||||||
|
'56*TREE04'" above), the directories corresponding to the second and
|
||||||
|
subsequent runs of that scenario include a sequence number, for example,
|
||||||
|
"TREE04.2", "TREE04.3", and so on.
|
||||||
|
|
||||||
|
The most frequently used file in the top-level directory is testid.txt.
|
||||||
|
If the test ran in a git repository, then this file contains the commit
|
||||||
|
that was tested and any uncommitted changes in diff format.
|
||||||
|
|
||||||
|
The most frequently used files in each per-scenario-run directory are:
|
||||||
|
|
||||||
|
.config: This file contains the Kconfig options.
|
||||||
|
|
||||||
|
Make.out: This contains build output for a specific scenario.
|
||||||
|
|
||||||
|
console.log: This contains the console output for a specific scenario.
|
||||||
|
This file may be examined once the kernel has booted, but
|
||||||
|
it might not exist if the build failed.
|
||||||
|
|
||||||
|
vmlinux: This contains the kernel, which can be useful with tools like
|
||||||
|
objdump and gdb.
|
||||||
|
|
||||||
|
A number of additional files are available, but are less frequently used.
|
||||||
|
Many are intended for debugging of rcutorture itself or of its scripting.
|
||||||
|
|
||||||
|
As of v5.4, a successful run with the default set of scenarios produces
|
||||||
|
the following summary at the end of the run on a 12-CPU system:
|
||||||
|
|
||||||
|
SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
|
||||||
|
SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
|
||||||
|
SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
|
||||||
|
SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
|
||||||
|
TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
|
||||||
|
TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
|
||||||
|
TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
|
||||||
|
TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
|
||||||
|
TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
|
||||||
|
TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
|
||||||
|
TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
|
||||||
|
TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
|
||||||
|
CPU count limited from 16 to 12
|
||||||
|
TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
|
||||||
|
TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
|
||||||
|
TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
|
||||||
|
CPU count limited from 16 to 12
|
||||||
|
TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
|
||||||
|
|
|
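Note on the --cpus batching described in the torture.txt hunk above: the way a CPU budget turns into a number of batches can be illustrated with a small greedy packing sketch. This is not the code that kvm.sh uses; the function name count_batches(), the MAX_SCENARIOS limit, and the per-scenario CPU counts passed in are all hypothetical. Like the "CPU count limited from 16 to 12" lines in the summary above, the sketch clamps a scenario that asks for more CPUs than are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SCENARIOS 64	/* Hypothetical limit for this sketch. */

/* Sort helper: order CPU counts from largest to smallest. */
static int cmp_desc(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (y > x) - (y < x);
}

/*
 * Greedy sketch: fill each batch with the largest not-yet-placed
 * scenarios that still fit into the remaining CPUs, then open a new
 * batch.  Returns the number of batches, or -1 on invalid input.
 */
int count_batches(const int *scenario_cpus, int n, int cpus)
{
	int sorted[MAX_SCENARIOS];
	bool placed[MAX_SCENARIOS] = { false };
	int remaining = n;
	int batches = 0;
	int i;

	if (cpus <= 0 || n <= 0 || n > MAX_SCENARIOS)
		return -1;
	for (i = 0; i < n; i++) {
		if (scenario_cpus[i] <= 0)
			return -1;
		/* Clamp oversized scenarios to the available CPUs. */
		sorted[i] = scenario_cpus[i] > cpus ? cpus : scenario_cpus[i];
	}
	qsort(sorted, n, sizeof(sorted[0]), cmp_desc);

	while (remaining > 0) {
		int free_cpus = cpus;

		batches++;
		for (i = 0; i < n; i++) {
			if (!placed[i] && sorted[i] <= free_cpus) {
				placed[i] = true;
				free_cpus -= sorted[i];
				remaining--;
			}
		}
	}
	return batches;
}
```

For real runs, "--dryrun sched" remains the authoritative way to see how kvm.sh itself would schedule the scenarios.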
@ -4005,6 +4005,15 @@
|
||||||
Set threshold of queued RCU callbacks below which
|
Set threshold of queued RCU callbacks below which
|
||||||
batch limiting is re-enabled.
|
batch limiting is re-enabled.
|
||||||
|
|
||||||
|
rcutree.qovld= [KNL]
|
||||||
|
Set threshold of queued RCU callbacks beyond which
|
||||||
|
RCU's force-quiescent-state scan will aggressively
|
||||||
|
enlist help from cond_resched() and sched IPIs to
|
||||||
|
help CPUs more quickly reach quiescent states.
|
||||||
|
Set to less than zero to make this be set based
|
||||||
|
on rcutree.qhimark at boot time and to zero to
|
||||||
|
disable more aggressive help enlistment.
|
||||||
|
|
||||||
rcutree.rcu_idle_gp_delay= [KNL]
|
rcutree.rcu_idle_gp_delay= [KNL]
|
||||||
Set wakeup interval for idle CPUs that have
|
Set wakeup interval for idle CPUs that have
|
||||||
RCU callbacks (RCU_FAST_NO_HZ=y).
|
RCU callbacks (RCU_FAST_NO_HZ=y).
|
||||||
|
@ -4220,6 +4229,12 @@
|
||||||
rcupdate.rcu_cpu_stall_suppress= [KNL]
|
rcupdate.rcu_cpu_stall_suppress= [KNL]
|
||||||
Suppress RCU CPU stall warning messages.
|
Suppress RCU CPU stall warning messages.
|
||||||
|
|
||||||
|
rcupdate.rcu_cpu_stall_suppress_at_boot= [KNL]
|
||||||
|
Suppress RCU CPU stall warning messages and
|
||||||
|
rcutorture writer stall warnings that occur
|
||||||
|
during early boot, that is, during the time
|
||||||
|
before the init task is spawned.
|
||||||
|
|
||||||
rcupdate.rcu_cpu_stall_timeout= [KNL]
|
rcupdate.rcu_cpu_stall_timeout= [KNL]
|
||||||
Set timeout for RCU CPU stall warning messages.
|
Set timeout for RCU CPU stall warning messages.
|
||||||
|
|
||||||
|
@ -4892,6 +4907,10 @@
|
||||||
topology updates sent by the hypervisor to this
|
topology updates sent by the hypervisor to this
|
||||||
LPAR.
|
LPAR.
|
||||||
|
|
||||||
|
torture.disable_onoff_at_boot= [KNL]
|
||||||
|
Prevent the CPU-hotplug component of torturing
|
||||||
|
until after init has spawned.
|
||||||
|
|
||||||
tp720= [HW,PS2]
|
tp720= [HW,PS2]
|
||||||
|
|
||||||
tpm_suspend_pcr=[HW,TPM]
|
tpm_suspend_pcr=[HW,TPM]
|
||||||
|
|
|
@ -185,7 +185,7 @@ As a further example, consider this sequence of events:
|
||||||
=============== ===============
|
=============== ===============
|
||||||
{ A == 1, B == 2, C == 3, P == &A, Q == &C }
|
{ A == 1, B == 2, C == 3, P == &A, Q == &C }
|
||||||
B = 4; Q = P;
|
B = 4; Q = P;
|
||||||
P = &B D = *Q;
|
P = &B; D = *Q;
|
||||||
|
|
||||||
There is an obvious data dependency here, as the value loaded into D depends on
|
There is an obvious data dependency here, as the value loaded into D depends on
|
||||||
the address retrieved from P by CPU 2. At the end of the sequence, any of the
|
the address retrieved from P by CPU 2. At the end of the sequence, any of the
|
||||||
|
@ -569,7 +569,7 @@ following sequence of events:
|
||||||
{ A == 1, B == 2, C == 3, P == &A, Q == &C }
|
{ A == 1, B == 2, C == 3, P == &A, Q == &C }
|
||||||
B = 4;
|
B = 4;
|
||||||
<write barrier>
|
<write barrier>
|
||||||
WRITE_ONCE(P, &B)
|
WRITE_ONCE(P, &B);
|
||||||
Q = READ_ONCE(P);
|
Q = READ_ONCE(P);
|
||||||
D = *Q;
|
D = *Q;
|
||||||
|
|
||||||
|
@ -1721,7 +1721,7 @@ of optimizations:
|
||||||
and WRITE_ONCE() are more selective: With READ_ONCE() and
|
and WRITE_ONCE() are more selective: With READ_ONCE() and
|
||||||
WRITE_ONCE(), the compiler need only forget the contents of the
|
WRITE_ONCE(), the compiler need only forget the contents of the
|
||||||
indicated memory locations, while with barrier() the compiler must
|
indicated memory locations, while with barrier() the compiler must
|
||||||
discard the value of all memory locations that it has currented
|
discard the value of all memory locations that it has currently
|
||||||
cached in any machine registers. Of course, the compiler must also
|
cached in any machine registers. Of course, the compiler must also
|
||||||
respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
|
respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
|
||||||
though the CPU of course need not do so.
|
though the CPU of course need not do so.
|
||||||
|
@ -1833,7 +1833,7 @@ Aside: In the case of data dependencies, the compiler would be expected
|
||||||
to issue the loads in the correct order (eg. `a[b]` would have to load
|
to issue the loads in the correct order (eg. `a[b]` would have to load
|
||||||
the value of b before loading a[b]), however there is no guarantee in
|
the value of b before loading a[b]), however there is no guarantee in
|
||||||
the C specification that the compiler may not speculate the value of b
|
the C specification that the compiler may not speculate the value of b
|
||||||
(eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
|
(eg. is equal to 1) and load a[b] before b (eg. tmp = a[1]; if (b != 1)
|
||||||
tmp = a[b]; ). There is also the problem of a compiler reloading b after
|
tmp = a[b]; ). There is also the problem of a compiler reloading b after
|
||||||
having loaded a[b], thus having a newer copy of b than a[b]. A consensus
|
having loaded a[b], thus having a newer copy of b than a[b]. A consensus
|
||||||
has not yet been reached about these problems, however the READ_ONCE()
|
has not yet been reached about these problems, however the READ_ONCE()
|
||||||
|
|
|
@ -2489,7 +2489,7 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
|
||||||
rcu_read_lock();
|
rcu_read_lock();
|
||||||
if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
|
if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
|
||||||
goto out;
|
goto out;
|
||||||
lh = rcu_dereference(nfsi->access_cache_entry_lru.prev);
|
lh = rcu_dereference(list_tail_rcu(&nfsi->access_cache_entry_lru));
|
||||||
cache = list_entry(lh, struct nfs_access_entry, lru);
|
cache = list_entry(lh, struct nfs_access_entry, lru);
|
||||||
if (lh == &nfsi->access_cache_entry_lru ||
|
if (lh == &nfsi->access_cache_entry_lru ||
|
||||||
cred_fscmp(cred, cache->cred) != 0)
|
cred_fscmp(cred, cache->cred) != 0)
|
||||||
|
|
|
@ -60,7 +60,7 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
|
||||||
#define __list_check_rcu(dummy, cond, extra...) \
|
#define __list_check_rcu(dummy, cond, extra...) \
|
||||||
({ \
|
({ \
|
||||||
check_arg_count_one(extra); \
|
check_arg_count_one(extra); \
|
||||||
RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
|
RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(), \
|
||||||
"RCU-list traversed in non-reader section!"); \
|
"RCU-list traversed in non-reader section!"); \
|
||||||
})
|
})
|
||||||
#else
|
#else
|
||||||
|
|
|
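Note on the __list_check_rcu() change above: wrapping the macro argument as !(cond) matters whenever a caller passes a compound expression. The userspace sketch below is only an illustration; the macro and function names are hypothetical stand-ins, and the in_reader flag stands in for rcu_read_lock_any_held().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the pre-fix and post-fix condition tests. */
#define WARN_NEEDED_UNPAREN(cond, in_reader)	(!cond && !(in_reader))
#define WARN_NEEDED_PAREN(cond, in_reader)	(!(cond) && !(in_reader))

/*
 * With cond being "a || b", the unparenthesized form expands to
 * !a || (b && !in_reader), which is not the intended test.
 * Compilers may warn about the missing parentheses here; that
 * warning is exactly the problem being illustrated.
 */
bool warn_unparen(bool a, bool b, bool in_reader)
{
	return WARN_NEEDED_UNPAREN(a || b, in_reader);
}

/* The parenthesized form expands to !(a || b) && !in_reader. */
bool warn_paren(bool a, bool b, bool in_reader)
{
	return WARN_NEEDED_PAREN(a || b, in_reader);
}
```

For example, with a false, b true, and no reader lock held, the unparenthesized form asks for a warning even though the caller's condition holds, while the parenthesized form does not.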
@ -83,6 +83,7 @@ void rcu_scheduler_starting(void);
|
||||||
static inline void rcu_scheduler_starting(void) { }
|
static inline void rcu_scheduler_starting(void) { }
|
||||||
#endif /* #else #ifndef CONFIG_SRCU */
|
#endif /* #else #ifndef CONFIG_SRCU */
|
||||||
static inline void rcu_end_inkernel_boot(void) { }
|
static inline void rcu_end_inkernel_boot(void) { }
|
||||||
|
static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
|
||||||
static inline bool rcu_is_watching(void) { return true; }
|
static inline bool rcu_is_watching(void) { return true; }
|
||||||
static inline void rcu_momentary_dyntick_idle(void) { }
|
static inline void rcu_momentary_dyntick_idle(void) { }
|
||||||
static inline void kfree_rcu_scheduler_running(void) { }
|
static inline void kfree_rcu_scheduler_running(void) { }
|
||||||
|
|
|
@ -54,6 +54,7 @@ void exit_rcu(void);
|
||||||
void rcu_scheduler_starting(void);
|
void rcu_scheduler_starting(void);
|
||||||
extern int rcu_scheduler_active __read_mostly;
|
extern int rcu_scheduler_active __read_mostly;
|
||||||
void rcu_end_inkernel_boot(void);
|
void rcu_end_inkernel_boot(void);
|
||||||
|
bool rcu_inkernel_boot_has_ended(void);
|
||||||
bool rcu_is_watching(void);
|
bool rcu_is_watching(void);
|
||||||
#ifndef CONFIG_PREEMPTION
|
#ifndef CONFIG_PREEMPTION
|
||||||
void rcu_all_qs(void);
|
void rcu_all_qs(void);
|
||||||
|
|
|
@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
|
||||||
*/
|
*/
|
||||||
static inline int timer_pending(const struct timer_list * timer)
|
static inline int timer_pending(const struct timer_list * timer)
|
||||||
{
|
{
|
||||||
return timer->entry.pprev != NULL;
|
return !hlist_unhashed_lockless(&timer->entry);
|
||||||
}
|
}
|
||||||
|
|
||||||
extern void add_timer_on(struct timer_list *timer, int cpu);
|
extern void add_timer_on(struct timer_list *timer, int cpu);
|
||||||
|
|
|
@ -623,6 +623,34 @@ TRACE_EVENT_RCU(rcu_invoke_kfree_callback,
|
||||||
__entry->rcuname, __entry->rhp, __entry->offset)
|
__entry->rcuname, __entry->rhp, __entry->offset)
|
||||||
);
|
);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Tracepoint for the invocation of a single RCU callback of the special
|
||||||
|
* kfree_bulk() form. The first argument is the RCU flavor, the second
|
||||||
|
* argument is a number of elements in array to free, the third is an
|
||||||
|
* address of the array holding nr_records entries.
|
||||||
|
*/
|
||||||
|
TRACE_EVENT_RCU(rcu_invoke_kfree_bulk_callback,
|
||||||
|
|
||||||
|
TP_PROTO(const char *rcuname, unsigned long nr_records, void **p),
|
||||||
|
|
||||||
|
TP_ARGS(rcuname, nr_records, p),
|
||||||
|
|
||||||
|
TP_STRUCT__entry(
|
||||||
|
__field(const char *, rcuname)
|
||||||
|
__field(unsigned long, nr_records)
|
||||||
|
__field(void **, p)
|
||||||
|
),
|
||||||
|
|
||||||
|
TP_fast_assign(
|
||||||
|
__entry->rcuname = rcuname;
|
||||||
|
__entry->nr_records = nr_records;
|
||||||
|
__entry->p = p;
|
||||||
|
),
|
||||||
|
|
||||||
|
TP_printk("%s bulk=0x%p nr_records=%lu",
|
||||||
|
__entry->rcuname, __entry->p, __entry->nr_records)
|
||||||
|
);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Tracepoint for exiting rcu_do_batch after RCU callbacks have been
|
* Tracepoint for exiting rcu_do_batch after RCU callbacks have been
|
||||||
* invoked. The first argument is the name of the RCU flavor,
|
* invoked. The first argument is the name of the RCU flavor,
|
||||||
|
@ -712,6 +740,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
|
||||||
* "Begin": rcu_barrier() started.
|
* "Begin": rcu_barrier() started.
|
||||||
* "EarlyExit": rcu_barrier() piggybacked, thus early exit.
|
* "EarlyExit": rcu_barrier() piggybacked, thus early exit.
|
||||||
* "Inc1": rcu_barrier() piggyback check counter incremented.
|
* "Inc1": rcu_barrier() piggyback check counter incremented.
|
||||||
|
* "OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
|
||||||
* "OnlineQ": rcu_barrier() found online CPU with callbacks.
|
* "OnlineQ": rcu_barrier() found online CPU with callbacks.
|
||||||
* "OnlineNQ": rcu_barrier() found online CPU, no callbacks.
|
* "OnlineNQ": rcu_barrier() found online CPU, no callbacks.
|
||||||
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
|
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
|
||||||
|
|
|
@ -618,7 +618,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
|
||||||
static int lock_torture_writer(void *arg)
|
static int lock_torture_writer(void *arg)
|
||||||
{
|
{
|
||||||
struct lock_stress_stats *lwsp = arg;
|
struct lock_stress_stats *lwsp = arg;
|
||||||
static DEFINE_TORTURE_RANDOM(rand);
|
DEFINE_TORTURE_RANDOM(rand);
|
||||||
|
|
||||||
VERBOSE_TOROUT_STRING("lock_torture_writer task started");
|
VERBOSE_TOROUT_STRING("lock_torture_writer task started");
|
||||||
set_user_nice(current, MAX_NICE);
|
set_user_nice(current, MAX_NICE);
|
||||||
|
@ -655,7 +655,7 @@ static int lock_torture_writer(void *arg)
|
||||||
static int lock_torture_reader(void *arg)
|
static int lock_torture_reader(void *arg)
|
||||||
{
|
{
|
||||||
struct lock_stress_stats *lrsp = arg;
|
struct lock_stress_stats *lrsp = arg;
|
||||||
static DEFINE_TORTURE_RANDOM(rand);
|
DEFINE_TORTURE_RANDOM(rand);
|
||||||
|
|
||||||
VERBOSE_TOROUT_STRING("lock_torture_reader task started");
|
VERBOSE_TOROUT_STRING("lock_torture_reader task started");
|
||||||
set_user_nice(current, MAX_NICE);
|
set_user_nice(current, MAX_NICE);
|
||||||
|
@ -696,15 +696,16 @@ static void __torture_print_stats(char *page,
|
||||||
if (statp[i].n_lock_fail)
|
if (statp[i].n_lock_fail)
|
||||||
fail = true;
|
fail = true;
|
||||||
sum += statp[i].n_lock_acquired;
|
sum += statp[i].n_lock_acquired;
|
||||||
if (max < statp[i].n_lock_fail)
|
if (max < statp[i].n_lock_acquired)
|
||||||
max = statp[i].n_lock_fail;
|
max = statp[i].n_lock_acquired;
|
||||||
if (min > statp[i].n_lock_fail)
|
if (min > statp[i].n_lock_acquired)
|
||||||
min = statp[i].n_lock_fail;
|
min = statp[i].n_lock_acquired;
|
||||||
}
|
}
|
||||||
page += sprintf(page,
|
page += sprintf(page,
|
||||||
"%s: Total: %lld Max/Min: %ld/%ld %s Fail: %d %s\n",
|
"%s: Total: %lld Max/Min: %ld/%ld %s Fail: %d %s\n",
|
||||||
write ? "Writes" : "Reads ",
|
write ? "Writes" : "Reads ",
|
||||||
sum, max, min, max / 2 > min ? "???" : "",
|
sum, max, min,
|
||||||
|
!onoff_interval && max / 2 > min ? "???" : "",
|
||||||
fail, fail ? "!!!" : "");
|
fail, fail ? "!!!" : "");
|
||||||
if (fail)
|
if (fail)
|
||||||
atomic_inc(&cxt.n_lock_torture_errors);
|
atomic_inc(&cxt.n_lock_torture_errors);
|
||||||
|
|
|
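Note on the __torture_print_stats() change above: the max/min values now track acquisitions rather than failures, and the "???" marker is suppressed when CPU hotplug is enabled (nonzero onoff_interval). A minimal userspace sketch of that bookkeeping follows. The struct lock_summary type and the summarize_lock_stats() name are hypothetical, and seeding min from the first entry's acquisition count is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for the per-task statistics used by locktorture. */
struct lock_stress_stats {
	long n_lock_fail;
	long n_lock_acquired;
};

/* Hypothetical result type for this sketch. */
struct lock_summary {
	long long sum;
	long max;
	long min;
	bool fail;		/* Any task saw a lock failure. */
	bool suspicious;	/* Would print "???" in the stats line. */
};

struct lock_summary summarize_lock_stats(const struct lock_stress_stats *statp,
					 int n, int onoff_interval)
{
	struct lock_summary s = { 0, 0, 0, false, false };
	int i;

	assert(statp && n > 0);
	s.min = statp[0].n_lock_acquired;
	for (i = 0; i < n; i++) {
		if (statp[i].n_lock_fail)
			s.fail = true;
		s.sum += statp[i].n_lock_acquired;
		if (s.max < statp[i].n_lock_acquired)
			s.max = statp[i].n_lock_acquired;
		if (s.min > statp[i].n_lock_acquired)
			s.min = statp[i].n_lock_acquired;
	}
	/* Flag imbalance only when CPU hotplug cannot explain it. */
	s.suspicious = !onoff_interval && s.max / 2 > s.min;
	return s;
}
```

With acquisition counts of 100, 40, and 90, the busiest task acquired the lock more than twice as often as the least busy one, so the sketch flags the run as suspicious unless onoff_interval is nonzero.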
@ -57,7 +57,7 @@ rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
|
||||||
if (rt_mutex_has_waiters(lock))
|
if (rt_mutex_has_waiters(lock))
|
||||||
val |= RT_MUTEX_HAS_WAITERS;
|
val |= RT_MUTEX_HAS_WAITERS;
|
||||||
|
|
||||||
lock->owner = (struct task_struct *)val;
|
WRITE_ONCE(lock->owner, (struct task_struct *)val);
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
|
static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
|
||||||
|
|
|
@ -3,6 +3,10 @@
|
||||||
# and is generally not a function of system call inputs.
|
# and is generally not a function of system call inputs.
|
||||||
KCOV_INSTRUMENT := n
|
KCOV_INSTRUMENT := n
|
||||||
|
|
||||||
|
ifeq ($(CONFIG_KCSAN),y)
|
||||||
|
KBUILD_CFLAGS += -g -fno-omit-frame-pointer
|
||||||
|
endif
|
||||||
|
|
||||||
obj-y += update.o sync.o
|
obj-y += update.o sync.o
|
||||||
obj-$(CONFIG_TREE_SRCU) += srcutree.o
|
obj-$(CONFIG_TREE_SRCU) += srcutree.o
|
||||||
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
|
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
|
||||||
|
|
|
@ -198,6 +198,13 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
|
||||||
}
|
}
|
||||||
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
|
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
|
||||||
|
|
||||||
|
extern int rcu_cpu_stall_suppress_at_boot;
|
||||||
|
|
||||||
|
static inline bool rcu_stall_is_suppressed_at_boot(void)
|
||||||
|
{
|
||||||
|
return rcu_cpu_stall_suppress_at_boot && !rcu_inkernel_boot_has_ended();
|
||||||
|
}
|
||||||
|
|
||||||
#ifdef CONFIG_RCU_STALL_COMMON
|
#ifdef CONFIG_RCU_STALL_COMMON
|
||||||
|
|
||||||
extern int rcu_cpu_stall_ftrace_dump;
|
extern int rcu_cpu_stall_ftrace_dump;
|
||||||
|
@ -205,6 +212,11 @@ extern int rcu_cpu_stall_suppress;
|
||||||
extern int rcu_cpu_stall_timeout;
|
extern int rcu_cpu_stall_timeout;
|
||||||
int rcu_jiffies_till_stall_check(void);
|
int rcu_jiffies_till_stall_check(void);
|
||||||
|
|
||||||
|
static inline bool rcu_stall_is_suppressed(void)
|
||||||
|
{
|
||||||
|
return rcu_stall_is_suppressed_at_boot() || rcu_cpu_stall_suppress;
|
||||||
|
}
|
||||||
|
|
||||||
#define rcu_ftrace_dump_stall_suppress() \
|
#define rcu_ftrace_dump_stall_suppress() \
|
||||||
do { \
|
do { \
|
||||||
if (!rcu_cpu_stall_suppress) \
|
if (!rcu_cpu_stall_suppress) \
|
||||||
|
@ -218,6 +230,11 @@ do { \
|
||||||
} while (0)
|
} while (0)
|
||||||
|
|
||||||
#else /* #endif #ifdef CONFIG_RCU_STALL_COMMON */
|
#else /* #endif #ifdef CONFIG_RCU_STALL_COMMON */
|
||||||
|
|
||||||
|
static inline bool rcu_stall_is_suppressed(void)
|
||||||
|
{
|
||||||
|
return rcu_stall_is_suppressed_at_boot();
|
||||||
|
}
|
||||||
#define rcu_ftrace_dump_stall_suppress()
|
#define rcu_ftrace_dump_stall_suppress()
|
||||||
#define rcu_ftrace_dump_stall_unsuppress()
|
#define rcu_ftrace_dump_stall_unsuppress()
|
||||||
#endif /* #ifdef CONFIG_RCU_STALL_COMMON */
|
#endif /* #ifdef CONFIG_RCU_STALL_COMMON */
|
||||||
|
@ -325,7 +342,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
|
||||||
* Iterate over all possible CPUs in a leaf RCU node.
|
* Iterate over all possible CPUs in a leaf RCU node.
|
||||||
*/
|
*/
|
||||||
#define for_each_leaf_node_possible_cpu(rnp, cpu) \
|
#define for_each_leaf_node_possible_cpu(rnp, cpu) \
|
||||||
for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
|
for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
|
||||||
|
(cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
|
||||||
(cpu) <= rnp->grphi; \
|
(cpu) <= rnp->grphi; \
|
||||||
(cpu) = cpumask_next((cpu), cpu_possible_mask))
|
(cpu) = cpumask_next((cpu), cpu_possible_mask))
|
||||||
|
|
||||||
|
@ -335,7 +353,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
|
||||||
#define rcu_find_next_bit(rnp, cpu, mask) \
|
#define rcu_find_next_bit(rnp, cpu, mask) \
|
||||||
((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
|
((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
|
||||||
#define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
|
#define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
|
||||||
for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
|
for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
|
||||||
|
(cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
|
||||||
(cpu) <= rnp->grphi; \
|
(cpu) <= rnp->grphi; \
|
||||||
(cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
|
(cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
|
||||||
|
|
||||||
|
|
|
@ -182,7 +182,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
|
||||||
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp)
|
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp)
|
||||||
{
|
{
|
||||||
return rcu_segcblist_is_enabled(rsclp) &&
|
return rcu_segcblist_is_enabled(rsclp) &&
|
||||||
&rsclp->head != rsclp->tails[RCU_DONE_TAIL];
|
&rsclp->head != READ_ONCE(rsclp->tails[RCU_DONE_TAIL]);
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -381,8 +381,6 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
|
||||||
return; /* Nothing to do. */
|
return; /* Nothing to do. */
|
||||||
WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
|
WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
|
||||||
WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
|
WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
|
||||||
rclp->head = NULL;
|
|
||||||
rclp->tail = &rclp->head;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
|
|
@ -12,6 +12,7 @@
|
||||||
#include <linux/types.h>
|
#include <linux/types.h>
|
||||||
#include <linux/kernel.h>
|
#include <linux/kernel.h>
|
||||||
#include <linux/init.h>
|
#include <linux/init.h>
|
||||||
|
#include <linux/mm.h>
|
||||||
#include <linux/module.h>
|
#include <linux/module.h>
|
||||||
#include <linux/kthread.h>
|
#include <linux/kthread.h>
|
||||||
#include <linux/err.h>
|
#include <linux/err.h>
|
||||||
|
@ -611,6 +612,7 @@ kfree_perf_thread(void *arg)
|
||||||
long me = (long)arg;
|
long me = (long)arg;
|
||||||
struct kfree_obj *alloc_ptr;
|
struct kfree_obj *alloc_ptr;
|
||||||
u64 start_time, end_time;
|
u64 start_time, end_time;
|
||||||
|
long long mem_begin, mem_during = 0;
|
||||||
|
|
||||||
VERBOSE_PERFOUT_STRING("kfree_perf_thread task started");
|
VERBOSE_PERFOUT_STRING("kfree_perf_thread task started");
|
||||||
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
|
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
|
||||||
|
@ -626,6 +628,12 @@ kfree_perf_thread(void *arg)
|
||||||
}
|
}
|
||||||
|
|
||||||
do {
|
do {
|
||||||
|
if (!mem_during) {
|
||||||
|
mem_during = mem_begin = si_mem_available();
|
||||||
|
} else if (loop % (kfree_loops / 4) == 0) {
|
||||||
|
mem_during = (mem_during + si_mem_available()) / 2;
|
||||||
|
}
|
||||||
|
|
||||||
for (i = 0; i < kfree_alloc_num; i++) {
|
for (i = 0; i < kfree_alloc_num; i++) {
|
||||||
alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
|
alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
|
||||||
if (!alloc_ptr)
|
if (!alloc_ptr)
|
||||||
|
@ -645,9 +653,11 @@ kfree_perf_thread(void *arg)
|
||||||
else
|
else
|
||||||
b_rcu_gp_test_finished = cur_ops->get_gp_seq();
|
b_rcu_gp_test_finished = cur_ops->get_gp_seq();
|
||||||
|
|
||||||
pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld\n",
|
pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n",
|
||||||
(unsigned long long)(end_time - start_time), kfree_loops,
|
(unsigned long long)(end_time - start_time), kfree_loops,
|
||||||
rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started));
|
rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started),
|
||||||
|
(mem_begin - mem_during) >> (20 - PAGE_SHIFT));
|
||||||
|
|
||||||
if (shutdown) {
|
if (shutdown) {
|
||||||
smp_mb(); /* Assign before wake. */
|
smp_mb(); /* Assign before wake. */
|
||||||
wake_up(&shutdown_wq);
|
wake_up(&shutdown_wq);
|
||||||
|
|
|
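Note on the memory-footprint arithmetic in the rcuperf hunk above: si_mem_available() reports pages, so shifting right by (20 - PAGE_SHIFT) converts a page count to megabytes, and mem_during is a running average of the periodic samples. The sketch below mirrors that arithmetic in userspace. The names SKETCH_PAGE_SHIFT, pages_to_mb(), and footprint_mb() are hypothetical, a 4 KiB page size is assumed, and the samples are passed in as an array rather than taken inside the test loop.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* Assumes 4 KiB pages. */

/* Convert a non-negative page count to whole megabytes. */
long long pages_to_mb(long long pages)
{
	return pages >> (20 - SKETCH_PAGE_SHIFT);
}

/*
 * samples[0] is the available-page count at the start of the test;
 * later entries are the periodic samples taken while it runs.  Each
 * new sample is averaged with the running value, so recent samples
 * carry more weight than early ones.
 */
long long footprint_mb(const long long *samples, int n)
{
	long long mem_begin, mem_during;
	int i;

	assert(samples && n > 0);
	mem_begin = mem_during = samples[0];
	for (i = 1; i < n; i++)
		mem_during = (mem_during + samples[i]) / 2;
	return pages_to_mb(mem_begin - mem_during);
}
```

With 4 KiB pages, 256 pages make up one megabyte, so a drop of 125000 available pages is reported as 488MB.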
@ -339,7 +339,7 @@ rcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
|
||||||
* period, and we want a long delay occasionally to trigger
|
* period, and we want a long delay occasionally to trigger
|
||||||
* force_quiescent_state. */
|
* force_quiescent_state. */
|
||||||
|
|
||||||
if (!rcu_fwd_cb_nodelay &&
|
if (!READ_ONCE(rcu_fwd_cb_nodelay) &&
|
||||||
!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
|
!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
|
||||||
started = cur_ops->get_gp_seq();
|
started = cur_ops->get_gp_seq();
|
||||||
ts = rcu_trace_clock_local();
|
ts = rcu_trace_clock_local();
|
||||||
|
@ -375,11 +375,12 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
|
||||||
{
|
{
|
||||||
int i;
|
int i;
|
||||||
|
|
||||||
i = rp->rtort_pipe_count;
|
i = READ_ONCE(rp->rtort_pipe_count);
|
||||||
if (i > RCU_TORTURE_PIPE_LEN)
|
if (i > RCU_TORTURE_PIPE_LEN)
|
||||||
i = RCU_TORTURE_PIPE_LEN;
|
i = RCU_TORTURE_PIPE_LEN;
|
||||||
atomic_inc(&rcu_torture_wcount[i]);
|
atomic_inc(&rcu_torture_wcount[i]);
|
||||||
if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
|
WRITE_ONCE(rp->rtort_pipe_count, i + 1);
|
||||||
|
if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
|
||||||
rp->rtort_mbtest = 0;
|
rp->rtort_mbtest = 0;
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
@ -1015,7 +1016,8 @@ rcu_torture_writer(void *arg)
|
||||||
if (i > RCU_TORTURE_PIPE_LEN)
|
if (i > RCU_TORTURE_PIPE_LEN)
|
||||||
i = RCU_TORTURE_PIPE_LEN;
|
i = RCU_TORTURE_PIPE_LEN;
|
||||||
atomic_inc(&rcu_torture_wcount[i]);
|
atomic_inc(&rcu_torture_wcount[i]);
|
||||||
old_rp->rtort_pipe_count++;
|
WRITE_ONCE(old_rp->rtort_pipe_count,
|
||||||
|
old_rp->rtort_pipe_count + 1);
|
||||||
switch (synctype[torture_random(&rand) % nsynctypes]) {
|
switch (synctype[torture_random(&rand) % nsynctypes]) {
|
||||||
case RTWS_DEF_FREE:
|
case RTWS_DEF_FREE:
|
||||||
rcu_torture_writer_state = RTWS_DEF_FREE;
|
rcu_torture_writer_state = RTWS_DEF_FREE;
|
||||||
|
@ -1067,7 +1069,8 @@ rcu_torture_writer(void *arg)
|
||||||
if (stutter_wait("rcu_torture_writer") &&
|
if (stutter_wait("rcu_torture_writer") &&
|
||||||
!READ_ONCE(rcu_fwd_cb_nodelay) &&
|
!READ_ONCE(rcu_fwd_cb_nodelay) &&
|
||||||
!cur_ops->slow_gps &&
|
!cur_ops->slow_gps &&
|
||||||
!torture_must_stop())
|
!torture_must_stop() &&
|
||||||
|
rcu_inkernel_boot_has_ended())
|
||||||
for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
|
for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
|
||||||
if (list_empty(&rcu_tortures[i].rtort_free) &&
|
if (list_empty(&rcu_tortures[i].rtort_free) &&
|
||||||
rcu_access_pointer(rcu_torture_current) !=
|
rcu_access_pointer(rcu_torture_current) !=
|
||||||
|
@ -1290,7 +1293,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
|
||||||
atomic_inc(&n_rcu_torture_mberror);
|
atomic_inc(&n_rcu_torture_mberror);
|
||||||
rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
|
rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
|
||||||
preempt_disable();
|
preempt_disable();
|
||||||
pipe_count = p->rtort_pipe_count;
|
pipe_count = READ_ONCE(p->rtort_pipe_count);
|
||||||
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
|
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
|
||||||
/* Should not happen, but... */
|
/* Should not happen, but... */
|
||||||
pipe_count = RCU_TORTURE_PIPE_LEN;
|
pipe_count = RCU_TORTURE_PIPE_LEN;
|
||||||
|
@ -1404,14 +1407,15 @@ rcu_torture_stats_print(void)
|
||||||
int i;
|
int i;
|
||||||
long pipesummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
|
long pipesummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
|
||||||
long batchsummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
|
long batchsummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
|
||||||
|
struct rcu_torture *rtcp;
|
||||||
static unsigned long rtcv_snap = ULONG_MAX;
|
static unsigned long rtcv_snap = ULONG_MAX;
|
||||||
static bool splatted;
|
static bool splatted;
|
||||||
struct task_struct *wtp;
|
struct task_struct *wtp;
|
||||||
|
|
||||||
for_each_possible_cpu(cpu) {
|
for_each_possible_cpu(cpu) {
|
||||||
for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) {
|
for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) {
|
||||||
pipesummary[i] += per_cpu(rcu_torture_count, cpu)[i];
|
pipesummary[i] += READ_ONCE(per_cpu(rcu_torture_count, cpu)[i]);
|
||||||
batchsummary[i] += per_cpu(rcu_torture_batch, cpu)[i];
|
batchsummary[i] += READ_ONCE(per_cpu(rcu_torture_batch, cpu)[i]);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
for (i = RCU_TORTURE_PIPE_LEN - 1; i >= 0; i--) {
|
for (i = RCU_TORTURE_PIPE_LEN - 1; i >= 0; i--) {
|
||||||
|
@ -1420,9 +1424,10 @@ rcu_torture_stats_print(void)
|
||||||
}
|
}
|
||||||
|
|
||||||
pr_alert("%s%s ", torture_type, TORTURE_FLAG);
|
pr_alert("%s%s ", torture_type, TORTURE_FLAG);
|
||||||
|
rtcp = rcu_access_pointer(rcu_torture_current);
|
||||||
pr_cont("rtc: %p %s: %lu tfle: %d rta: %d rtaf: %d rtf: %d ",
|
pr_cont("rtc: %p %s: %lu tfle: %d rta: %d rtaf: %d rtf: %d ",
|
||||||
rcu_torture_current,
|
rtcp,
|
||||||
rcu_torture_current ? "ver" : "VER",
|
rtcp && !rcu_stall_is_suppressed_at_boot() ? "ver" : "VER",
|
||||||
rcu_torture_current_version,
|
rcu_torture_current_version,
|
||||||
list_empty(&rcu_torture_freelist),
|
list_empty(&rcu_torture_freelist),
|
||||||
atomic_read(&n_rcu_torture_alloc),
|
atomic_read(&n_rcu_torture_alloc),
|
||||||
|
@ -1478,7 +1483,8 @@ rcu_torture_stats_print(void)
|
||||||
if (cur_ops->stats)
|
if (cur_ops->stats)
|
||||||
cur_ops->stats();
|
cur_ops->stats();
|
||||||
if (rtcv_snap == rcu_torture_current_version &&
|
if (rtcv_snap == rcu_torture_current_version &&
|
||||||
rcu_torture_current != NULL) {
|
rcu_access_pointer(rcu_torture_current) &&
|
||||||
|
!rcu_stall_is_suppressed()) {
|
||||||
int __maybe_unused flags = 0;
|
int __maybe_unused flags = 0;
|
||||||
unsigned long __maybe_unused gp_seq = 0;
|
unsigned long __maybe_unused gp_seq = 0;
|
||||||
|
|
||||||
|
@ -1993,7 +1999,10 @@ static int rcu_torture_fwd_prog(void *args)
|
||||||
schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
|
schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
|
||||||
WRITE_ONCE(rcu_fwd_emergency_stop, false);
|
WRITE_ONCE(rcu_fwd_emergency_stop, false);
|
||||||
register_oom_notifier(&rcutorture_oom_nb);
|
register_oom_notifier(&rcutorture_oom_nb);
|
||||||
|
if (!IS_ENABLED(CONFIG_TINY_RCU) ||
|
||||||
|
rcu_inkernel_boot_has_ended())
|
||||||
rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
|
rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
|
||||||
|
if (rcu_inkernel_boot_has_ended())
|
||||||
rcu_torture_fwd_prog_cr(rfp);
|
rcu_torture_fwd_prog_cr(rfp);
|
||||||
unregister_oom_notifier(&rcutorture_oom_nb);
|
unregister_oom_notifier(&rcutorture_oom_nb);
|
||||||
|
|
||||||
|
@ -2044,6 +2053,14 @@ static void rcu_torture_barrier_cbf(struct rcu_head *rcu)
|
||||||
atomic_inc(&barrier_cbs_invoked);
|
atomic_inc(&barrier_cbs_invoked);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* IPI handler to get callback posted on desired CPU, if online. */
|
||||||
|
static void rcu_torture_barrier1cb(void *rcu_void)
|
||||||
|
{
|
||||||
|
struct rcu_head *rhp = rcu_void;
|
||||||
|
|
||||||
|
cur_ops->call(rhp, rcu_torture_barrier_cbf);
|
||||||
|
}
|
||||||
|
|
||||||
/* kthread function to register callbacks used to test RCU barriers. */
|
/* kthread function to register callbacks used to test RCU barriers. */
|
||||||
static int rcu_torture_barrier_cbs(void *arg)
|
static int rcu_torture_barrier_cbs(void *arg)
|
||||||
{
|
{
|
||||||
|
@ -2067,9 +2084,11 @@ static int rcu_torture_barrier_cbs(void *arg)
|
||||||
* The above smp_load_acquire() ensures barrier_phase load
|
* The above smp_load_acquire() ensures barrier_phase load
|
||||||
* is ordered before the following ->call().
|
* is ordered before the following ->call().
|
||||||
*/
|
*/
|
||||||
local_irq_disable(); /* Just to test no-irq call_rcu(). */
|
if (smp_call_function_single(myid, rcu_torture_barrier1cb,
|
||||||
|
&rcu, 1)) {
|
||||||
|
// IPI failed, so use direct call from current CPU.
|
||||||
cur_ops->call(&rcu, rcu_torture_barrier_cbf);
|
cur_ops->call(&rcu, rcu_torture_barrier_cbf);
|
||||||
local_irq_enable();
|
}
|
||||||
if (atomic_dec_and_test(&barrier_cbs_count))
|
if (atomic_dec_and_test(&barrier_cbs_count))
|
||||||
wake_up(&barrier_wq);
|
wake_up(&barrier_wq);
|
||||||
} while (!torture_must_stop());
|
} while (!torture_must_stop());
|
||||||
|
@ -2105,7 +2124,21 @@ static int rcu_torture_barrier(void *arg)
|
||||||
pr_err("barrier_cbs_invoked = %d, n_barrier_cbs = %d\n",
|
pr_err("barrier_cbs_invoked = %d, n_barrier_cbs = %d\n",
|
||||||
atomic_read(&barrier_cbs_invoked),
|
atomic_read(&barrier_cbs_invoked),
|
||||||
n_barrier_cbs);
|
n_barrier_cbs);
|
||||||
WARN_ON_ONCE(1);
|
WARN_ON(1);
|
||||||
|
// Wait manually for the remaining callbacks
|
||||||
|
i = 0;
|
||||||
|
do {
|
||||||
|
if (WARN_ON(i++ > HZ))
|
||||||
|
i = INT_MIN;
|
||||||
|
schedule_timeout_interruptible(1);
|
||||||
|
cur_ops->cb_barrier();
|
||||||
|
} while (atomic_read(&barrier_cbs_invoked) !=
|
||||||
|
n_barrier_cbs &&
|
||||||
|
!torture_must_stop());
|
||||||
|
smp_mb(); // Can't trust ordering if broken.
|
||||||
|
if (!torture_must_stop())
|
||||||
|
pr_err("Recovered: barrier_cbs_invoked = %d\n",
|
||||||
|
atomic_read(&barrier_cbs_invoked));
|
||||||
} else {
|
} else {
|
||||||
n_barrier_successes++;
|
n_barrier_successes++;
|
||||||
}
|
}
|
||||||
|
|
|
@ -5,7 +5,7 @@
|
||||||
* Copyright (C) IBM Corporation, 2006
|
* Copyright (C) IBM Corporation, 2006
|
||||||
* Copyright (C) Fujitsu, 2012
|
* Copyright (C) Fujitsu, 2012
|
||||||
*
|
*
|
||||||
* Author: Paul McKenney <paulmck@linux.ibm.com>
|
* Authors: Paul McKenney <paulmck@linux.ibm.com>
|
||||||
* Lai Jiangshan <laijs@cn.fujitsu.com>
|
* Lai Jiangshan <laijs@cn.fujitsu.com>
|
||||||
*
|
*
|
||||||
* For detailed explanation of Read-Copy Update mechanism see -
|
* For detailed explanation of Read-Copy Update mechanism see -
|
||||||
|
@ -450,7 +450,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
|
||||||
spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
|
spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
|
||||||
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
|
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
|
||||||
rcu_seq_start(&ssp->srcu_gp_seq);
|
rcu_seq_start(&ssp->srcu_gp_seq);
|
||||||
state = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
|
state = rcu_seq_state(ssp->srcu_gp_seq);
|
||||||
WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
|
WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -534,7 +534,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
|
||||||
rcu_seq_end(&ssp->srcu_gp_seq);
|
rcu_seq_end(&ssp->srcu_gp_seq);
|
||||||
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
|
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
|
||||||
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
|
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
|
||||||
ssp->srcu_gp_seq_needed_exp = gpseq;
|
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, gpseq);
|
||||||
spin_unlock_irq_rcu_node(ssp);
|
spin_unlock_irq_rcu_node(ssp);
|
||||||
mutex_unlock(&ssp->srcu_gp_mutex);
|
mutex_unlock(&ssp->srcu_gp_mutex);
|
||||||
/* A new grace period can start at this point. But only one. */
|
/* A new grace period can start at this point. But only one. */
|
||||||
|
@ -550,7 +550,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
|
||||||
snp->srcu_have_cbs[idx] = gpseq;
|
snp->srcu_have_cbs[idx] = gpseq;
|
||||||
rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
|
rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
|
||||||
if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
|
if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
|
||||||
snp->srcu_gp_seq_needed_exp = gpseq;
|
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
|
||||||
mask = snp->srcu_data_have_cbs[idx];
|
mask = snp->srcu_data_have_cbs[idx];
|
||||||
snp->srcu_data_have_cbs[idx] = 0;
|
snp->srcu_data_have_cbs[idx] = 0;
|
||||||
spin_unlock_irq_rcu_node(snp);
|
spin_unlock_irq_rcu_node(snp);
|
||||||
|
@ -614,7 +614,7 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
|
||||||
}
|
}
|
||||||
spin_lock_irqsave_rcu_node(ssp, flags);
|
spin_lock_irqsave_rcu_node(ssp, flags);
|
||||||
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
|
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
|
||||||
ssp->srcu_gp_seq_needed_exp = s;
|
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
|
||||||
spin_unlock_irqrestore_rcu_node(ssp, flags);
|
spin_unlock_irqrestore_rcu_node(ssp, flags);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -660,7 +660,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
|
||||||
if (snp == sdp->mynode)
|
if (snp == sdp->mynode)
|
||||||
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
|
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
|
||||||
if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
|
if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
|
||||||
snp->srcu_gp_seq_needed_exp = s;
|
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
|
||||||
spin_unlock_irqrestore_rcu_node(snp, flags);
|
spin_unlock_irqrestore_rcu_node(snp, flags);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -674,7 +674,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
|
||||||
smp_store_release(&ssp->srcu_gp_seq_needed, s); /*^^^*/
|
smp_store_release(&ssp->srcu_gp_seq_needed, s); /*^^^*/
|
||||||
}
|
}
|
||||||
if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
|
if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
|
||||||
ssp->srcu_gp_seq_needed_exp = s;
|
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
|
||||||
|
|
||||||
/* If grace period not already done and none in progress, start it. */
|
/* If grace period not already done and none in progress, start it. */
|
||||||
if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
|
if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
|
||||||
|
@ -1079,7 +1079,7 @@ EXPORT_SYMBOL_GPL(srcu_barrier);
|
||||||
*/
|
*/
|
||||||
unsigned long srcu_batches_completed(struct srcu_struct *ssp)
|
unsigned long srcu_batches_completed(struct srcu_struct *ssp)
|
||||||
{
|
{
|
||||||
return ssp->srcu_idx;
|
return READ_ONCE(ssp->srcu_idx);
|
||||||
}
|
}
|
||||||
EXPORT_SYMBOL_GPL(srcu_batches_completed);
|
EXPORT_SYMBOL_GPL(srcu_batches_completed);
|
||||||
|
|
||||||
|
@ -1130,7 +1130,9 @@ static void srcu_advance_state(struct srcu_struct *ssp)
|
||||||
return; /* readers present, retry later. */
|
return; /* readers present, retry later. */
|
||||||
}
|
}
|
||||||
srcu_flip(ssp);
|
srcu_flip(ssp);
|
||||||
|
spin_lock_irq_rcu_node(ssp);
|
||||||
rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
|
rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
|
||||||
|
spin_unlock_irq_rcu_node(ssp);
|
||||||
}
|
}
|
||||||
|
|
||||||
if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {
|
if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {
|
||||||
|
|
|
@ -1,12 +1,12 @@
|
||||||
// SPDX-License-Identifier: GPL-2.0+
|
// SPDX-License-Identifier: GPL-2.0+
|
||||||
/*
|
/*
|
||||||
* Read-Copy Update mechanism for mutual exclusion
|
* Read-Copy Update mechanism for mutual exclusion (tree-based version)
|
||||||
*
|
*
|
||||||
* Copyright IBM Corporation, 2008
|
* Copyright IBM Corporation, 2008
|
||||||
*
|
*
|
||||||
* Authors: Dipankar Sarma <dipankar@in.ibm.com>
|
* Authors: Dipankar Sarma <dipankar@in.ibm.com>
|
||||||
* Manfred Spraul <manfred@colorfullife.com>
|
* Manfred Spraul <manfred@colorfullife.com>
|
||||||
* Paul E. McKenney <paulmck@linux.ibm.com> Hierarchical version
|
* Paul E. McKenney <paulmck@linux.ibm.com>
|
||||||
*
|
*
|
||||||
* Based on the original work by Paul McKenney <paulmck@linux.ibm.com>
|
* Based on the original work by Paul McKenney <paulmck@linux.ibm.com>
|
||||||
* and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
|
* and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
|
||||||
|
@ -150,6 +150,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
|
||||||
static void invoke_rcu_core(void);
|
static void invoke_rcu_core(void);
|
||||||
static void rcu_report_exp_rdp(struct rcu_data *rdp);
|
static void rcu_report_exp_rdp(struct rcu_data *rdp);
|
||||||
static void sync_sched_exp_online_cleanup(int cpu);
|
static void sync_sched_exp_online_cleanup(int cpu);
|
||||||
|
static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
|
||||||
|
|
||||||
/* rcuc/rcub kthread realtime priority */
|
/* rcuc/rcub kthread realtime priority */
|
||||||
static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
|
static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
|
||||||
|
@ -342,14 +343,17 @@ bool rcu_eqs_special_set(int cpu)
|
||||||
{
|
{
|
||||||
int old;
|
int old;
|
||||||
int new;
|
int new;
|
||||||
|
int new_old;
|
||||||
struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
|
struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
|
||||||
|
|
||||||
|
new_old = atomic_read(&rdp->dynticks);
|
||||||
do {
|
do {
|
||||||
old = atomic_read(&rdp->dynticks);
|
old = new_old;
|
||||||
if (old & RCU_DYNTICK_CTRL_CTR)
|
if (old & RCU_DYNTICK_CTRL_CTR)
|
||||||
return false;
|
return false;
|
||||||
new = old | RCU_DYNTICK_CTRL_MASK;
|
new = old | RCU_DYNTICK_CTRL_MASK;
|
||||||
} while (atomic_cmpxchg(&rdp->dynticks, old, new) != old);
|
new_old = atomic_cmpxchg(&rdp->dynticks, old, new);
|
||||||
|
} while (new_old != old);
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -410,10 +414,15 @@ static long blimit = DEFAULT_RCU_BLIMIT;
|
||||||
static long qhimark = DEFAULT_RCU_QHIMARK;
|
static long qhimark = DEFAULT_RCU_QHIMARK;
|
||||||
#define DEFAULT_RCU_QLOMARK 100 /* Once only this many pending, use blimit. */
|
#define DEFAULT_RCU_QLOMARK 100 /* Once only this many pending, use blimit. */
|
||||||
static long qlowmark = DEFAULT_RCU_QLOMARK;
|
static long qlowmark = DEFAULT_RCU_QLOMARK;
|
||||||
|
#define DEFAULT_RCU_QOVLD_MULT 2
|
||||||
|
#define DEFAULT_RCU_QOVLD (DEFAULT_RCU_QOVLD_MULT * DEFAULT_RCU_QHIMARK)
|
||||||
|
static long qovld = DEFAULT_RCU_QOVLD; /* If this many pending, hammer QS. */
|
||||||
|
static long qovld_calc = -1; /* No pre-initialization lock acquisitions! */
|
||||||
|
|
||||||
module_param(blimit, long, 0444);
|
module_param(blimit, long, 0444);
|
||||||
module_param(qhimark, long, 0444);
|
module_param(qhimark, long, 0444);
|
||||||
module_param(qlowmark, long, 0444);
|
module_param(qlowmark, long, 0444);
|
||||||
|
module_param(qovld, long, 0444);
|
||||||
|
|
||||||
static ulong jiffies_till_first_fqs = ULONG_MAX;
|
static ulong jiffies_till_first_fqs = ULONG_MAX;
|
||||||
static ulong jiffies_till_next_fqs = ULONG_MAX;
|
static ulong jiffies_till_next_fqs = ULONG_MAX;
|
||||||
|
@ -818,11 +827,12 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
|
||||||
incby = 1;
|
incby = 1;
|
||||||
} else if (tick_nohz_full_cpu(rdp->cpu) &&
|
} else if (tick_nohz_full_cpu(rdp->cpu) &&
|
||||||
rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
|
rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
|
||||||
READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
|
READ_ONCE(rdp->rcu_urgent_qs) &&
|
||||||
|
!READ_ONCE(rdp->rcu_forced_tick)) {
|
||||||
raw_spin_lock_rcu_node(rdp->mynode);
|
raw_spin_lock_rcu_node(rdp->mynode);
|
||||||
// Recheck under lock.
|
// Recheck under lock.
|
||||||
if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
|
if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
|
||||||
rdp->rcu_forced_tick = true;
|
WRITE_ONCE(rdp->rcu_forced_tick, true);
|
||||||
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
|
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
|
||||||
}
|
}
|
||||||
raw_spin_unlock_rcu_node(rdp->mynode);
|
raw_spin_unlock_rcu_node(rdp->mynode);
|
||||||
|
@ -899,7 +909,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
|
||||||
WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
|
WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
|
||||||
if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
|
if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
|
||||||
tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
|
tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
|
||||||
rdp->rcu_forced_tick = false;
|
WRITE_ONCE(rdp->rcu_forced_tick, false);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1072,7 +1082,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
|
||||||
rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu);
|
rnhqp = &per_cpu(rcu_data.rcu_need_heavy_qs, rdp->cpu);
|
||||||
if (!READ_ONCE(*rnhqp) &&
|
if (!READ_ONCE(*rnhqp) &&
|
||||||
(time_after(jiffies, rcu_state.gp_start + jtsq * 2) ||
|
(time_after(jiffies, rcu_state.gp_start + jtsq * 2) ||
|
||||||
time_after(jiffies, rcu_state.jiffies_resched))) {
|
time_after(jiffies, rcu_state.jiffies_resched) ||
|
||||||
|
rcu_state.cbovld)) {
|
||||||
WRITE_ONCE(*rnhqp, true);
|
WRITE_ONCE(*rnhqp, true);
|
||||||
/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
|
/* Store rcu_need_heavy_qs before rcu_urgent_qs. */
|
||||||
smp_store_release(ruqp, true);
|
smp_store_release(ruqp, true);
|
||||||
|
@ -1089,8 +1100,8 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
|
||||||
* So hit them over the head with the resched_cpu() hammer!
|
* So hit them over the head with the resched_cpu() hammer!
|
||||||
*/
|
*/
|
||||||
if (tick_nohz_full_cpu(rdp->cpu) &&
|
if (tick_nohz_full_cpu(rdp->cpu) &&
|
||||||
time_after(jiffies,
|
(time_after(jiffies, READ_ONCE(rdp->last_fqs_resched) + jtsq * 3) ||
|
||||||
READ_ONCE(rdp->last_fqs_resched) + jtsq * 3)) {
|
rcu_state.cbovld)) {
|
||||||
WRITE_ONCE(*ruqp, true);
|
WRITE_ONCE(*ruqp, true);
|
||||||
resched_cpu(rdp->cpu);
|
resched_cpu(rdp->cpu);
|
||||||
WRITE_ONCE(rdp->last_fqs_resched, jiffies);
|
WRITE_ONCE(rdp->last_fqs_resched, jiffies);
|
||||||
|
@ -1126,8 +1137,9 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
|
||||||
static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
|
static void trace_rcu_this_gp(struct rcu_node *rnp, struct rcu_data *rdp,
|
||||||
unsigned long gp_seq_req, const char *s)
|
unsigned long gp_seq_req, const char *s)
|
||||||
{
|
{
|
||||||
trace_rcu_future_grace_period(rcu_state.name, rnp->gp_seq, gp_seq_req,
|
trace_rcu_future_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
|
||||||
rnp->level, rnp->grplo, rnp->grphi, s);
|
gp_seq_req, rnp->level,
|
||||||
|
rnp->grplo, rnp->grphi, s);
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -1174,7 +1186,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
|
||||||
TPS("Prestarted"));
|
TPS("Prestarted"));
|
||||||
goto unlock_out;
|
goto unlock_out;
|
||||||
}
|
}
|
||||||
rnp->gp_seq_needed = gp_seq_req;
|
WRITE_ONCE(rnp->gp_seq_needed, gp_seq_req);
|
||||||
if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
|
if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq))) {
|
||||||
/*
|
/*
|
||||||
* We just marked the leaf or internal node, and a
|
* We just marked the leaf or internal node, and a
|
||||||
|
@ -1199,18 +1211,18 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
|
||||||
}
|
}
|
||||||
trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
|
trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
|
||||||
WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags | RCU_GP_FLAG_INIT);
|
WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags | RCU_GP_FLAG_INIT);
|
||||||
rcu_state.gp_req_activity = jiffies;
|
WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
|
||||||
if (!rcu_state.gp_kthread) {
|
if (!READ_ONCE(rcu_state.gp_kthread)) {
|
||||||
trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
|
trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
|
||||||
goto unlock_out;
|
goto unlock_out;
|
||||||
}
|
}
|
||||||
trace_rcu_grace_period(rcu_state.name, READ_ONCE(rcu_state.gp_seq), TPS("newreq"));
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("newreq"));
|
||||||
ret = true; /* Caller must wake GP kthread. */
|
ret = true; /* Caller must wake GP kthread. */
|
||||||
unlock_out:
|
unlock_out:
|
||||||
/* Push furthest requested GP to leaf node and rcu_data structure. */
|
/* Push furthest requested GP to leaf node and rcu_data structure. */
|
||||||
if (ULONG_CMP_LT(gp_seq_req, rnp->gp_seq_needed)) {
|
if (ULONG_CMP_LT(gp_seq_req, rnp->gp_seq_needed)) {
|
||||||
rnp_start->gp_seq_needed = rnp->gp_seq_needed;
|
WRITE_ONCE(rnp_start->gp_seq_needed, rnp->gp_seq_needed);
|
||||||
rdp->gp_seq_needed = rnp->gp_seq_needed;
|
WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
|
||||||
}
|
}
|
||||||
if (rnp != rnp_start)
|
if (rnp != rnp_start)
|
||||||
raw_spin_unlock_rcu_node(rnp);
|
raw_spin_unlock_rcu_node(rnp);
|
||||||
|
@ -1235,12 +1247,13 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Awaken the grace-period kthread. Don't do a self-awaken (unless in
|
* Awaken the grace-period kthread. Don't do a self-awaken (unless in an
|
||||||
* an interrupt or softirq handler), and don't bother awakening when there
|
* interrupt or softirq handler, in which case we just might immediately
|
||||||
* is nothing for the grace-period kthread to do (as in several CPUs raced
|
* sleep upon return, resulting in a grace-period hang), and don't bother
|
||||||
* to awaken, and we lost), and finally don't try to awaken a kthread that
|
* awakening when there is nothing for the grace-period kthread to do
|
||||||
* has not yet been created. If all those checks are passed, track some
|
* (as in several CPUs raced to awaken, we lost), and finally don't try
|
||||||
* debug information and awaken.
|
* to awaken a kthread that has not yet been created. If all those checks
|
||||||
|
* are passed, track some debug information and awaken.
|
||||||
*
|
*
|
||||||
* So why do the self-wakeup when in an interrupt or softirq handler
|
* So why do the self-wakeup when in an interrupt or softirq handler
|
||||||
* in the grace-period kthread's context? Because the kthread might have
|
* in the grace-period kthread's context? Because the kthread might have
|
||||||
|
@ -1250,10 +1263,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
|
||||||
*/
|
*/
|
||||||
static void rcu_gp_kthread_wake(void)
|
static void rcu_gp_kthread_wake(void)
|
||||||
{
|
{
|
||||||
if ((current == rcu_state.gp_kthread &&
|
struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
|
||||||
!in_irq() && !in_serving_softirq()) ||
|
|
||||||
!READ_ONCE(rcu_state.gp_flags) ||
|
if ((current == t && !in_irq() && !in_serving_softirq()) ||
|
||||||
!rcu_state.gp_kthread)
|
!READ_ONCE(rcu_state.gp_flags) || !t)
|
||||||
return;
|
return;
|
||||||
WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
|
WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
|
||||||
WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
|
WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
|
||||||
|
@ -1321,7 +1334,7 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
|
||||||
|
|
||||||
rcu_lockdep_assert_cblist_protected(rdp);
|
rcu_lockdep_assert_cblist_protected(rdp);
|
||||||
c = rcu_seq_snap(&rcu_state.gp_seq);
|
c = rcu_seq_snap(&rcu_state.gp_seq);
|
||||||
if (!rdp->gpwrap && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
|
if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
|
||||||
/* Old request still live, so mark recent callbacks. */
|
/* Old request still live, so mark recent callbacks. */
|
||||||
(void)rcu_segcblist_accelerate(&rdp->cblist, c);
|
(void)rcu_segcblist_accelerate(&rdp->cblist, c);
|
||||||
return;
|
return;
|
||||||
|
@ -1386,7 +1399,7 @@ static void __maybe_unused rcu_advance_cbs_nowake(struct rcu_node *rnp,
|
||||||
static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
|
static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
|
||||||
{
|
{
|
||||||
bool ret = false;
|
bool ret = false;
|
||||||
bool need_gp;
|
bool need_qs;
|
||||||
const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
|
const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
|
||||||
rcu_segcblist_is_offloaded(&rdp->cblist);
|
rcu_segcblist_is_offloaded(&rdp->cblist);
|
||||||
|
|
||||||
|
@ -1400,10 +1413,13 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
|
||||||
unlikely(READ_ONCE(rdp->gpwrap))) {
|
unlikely(READ_ONCE(rdp->gpwrap))) {
|
||||||
if (!offloaded)
|
if (!offloaded)
|
||||||
ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
|
ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
|
||||||
|
rdp->core_needs_qs = false;
|
||||||
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
|
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
|
||||||
} else {
|
} else {
|
||||||
if (!offloaded)
|
if (!offloaded)
|
||||||
ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
|
ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
|
||||||
|
if (rdp->core_needs_qs)
|
||||||
|
rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Now handle the beginnings of any new-to-this-CPU grace periods. */
|
/* Now handle the beginnings of any new-to-this-CPU grace periods. */
|
||||||
|
@ -1415,14 +1431,14 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
|
||||||
* go looking for one.
|
* go looking for one.
|
||||||
*/
|
*/
|
||||||
trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
|
trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
|
||||||
need_gp = !!(rnp->qsmask & rdp->grpmask);
|
need_qs = !!(rnp->qsmask & rdp->grpmask);
|
||||||
rdp->cpu_no_qs.b.norm = need_gp;
|
rdp->cpu_no_qs.b.norm = need_qs;
|
||||||
rdp->core_needs_qs = need_gp;
|
rdp->core_needs_qs = need_qs;
|
||||||
zero_cpu_stall_ticks(rdp);
|
zero_cpu_stall_ticks(rdp);
|
||||||
}
|
}
|
||||||
rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
|
rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
|
||||||
if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
|
if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
|
||||||
rdp->gp_seq_needed = rnp->gp_seq_needed;
|
WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
|
||||||
WRITE_ONCE(rdp->gpwrap, false);
|
WRITE_ONCE(rdp->gpwrap, false);
|
||||||
rcu_gpnum_ovf(rnp, rdp);
|
rcu_gpnum_ovf(rnp, rdp);
|
||||||
return ret;
|
return ret;
|
||||||
|
@ -1651,8 +1667,7 @@ static void rcu_gp_fqs_loop(void)
|
||||||
WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
|
WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
|
||||||
jiffies + (j ? 3 * j : 2));
|
jiffies + (j ? 3 * j : 2));
|
||||||
}
|
}
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
|
||||||
TPS("fqswait"));
|
TPS("fqswait"));
|
||||||
rcu_state.gp_state = RCU_GP_WAIT_FQS;
|
rcu_state.gp_state = RCU_GP_WAIT_FQS;
|
||||||
ret = swait_event_idle_timeout_exclusive(
|
ret = swait_event_idle_timeout_exclusive(
|
||||||
|
@ -1666,13 +1681,11 @@ static void rcu_gp_fqs_loop(void)
|
||||||
/* If time for quiescent-state forcing, do it. */
|
/* If time for quiescent-state forcing, do it. */
|
||||||
if (ULONG_CMP_GE(jiffies, rcu_state.jiffies_force_qs) ||
|
if (ULONG_CMP_GE(jiffies, rcu_state.jiffies_force_qs) ||
|
||||||
(gf & RCU_GP_FLAG_FQS)) {
|
(gf & RCU_GP_FLAG_FQS)) {
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
|
||||||
TPS("fqsstart"));
|
TPS("fqsstart"));
|
||||||
rcu_gp_fqs(first_gp_fqs);
|
rcu_gp_fqs(first_gp_fqs);
|
||||||
first_gp_fqs = false;
|
first_gp_fqs = false;
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
|
||||||
TPS("fqsend"));
|
TPS("fqsend"));
|
||||||
cond_resched_tasks_rcu_qs();
|
cond_resched_tasks_rcu_qs();
|
||||||
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
||||||
|
@ -1683,8 +1696,7 @@ static void rcu_gp_fqs_loop(void)
|
||||||
cond_resched_tasks_rcu_qs();
|
cond_resched_tasks_rcu_qs();
|
||||||
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
||||||
WARN_ON(signal_pending(current));
|
WARN_ON(signal_pending(current));
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
|
||||||
TPS("fqswaitsig"));
|
TPS("fqswaitsig"));
|
||||||
ret = 1; /* Keep old FQS timing. */
|
ret = 1; /* Keep old FQS timing. */
|
||||||
j = jiffies;
|
j = jiffies;
|
||||||
|
@ -1701,8 +1713,9 @@ static void rcu_gp_fqs_loop(void)
|
||||||
*/
|
*/
|
||||||
static void rcu_gp_cleanup(void)
|
static void rcu_gp_cleanup(void)
|
||||||
{
|
{
|
||||||
unsigned long gp_duration;
|
int cpu;
|
||||||
bool needgp = false;
|
bool needgp = false;
|
||||||
|
unsigned long gp_duration;
|
||||||
unsigned long new_gp_seq;
|
unsigned long new_gp_seq;
|
||||||
bool offloaded;
|
bool offloaded;
|
||||||
struct rcu_data *rdp;
|
struct rcu_data *rdp;
|
||||||
|
@ -1748,6 +1761,12 @@ static void rcu_gp_cleanup(void)
|
||||||
needgp = __note_gp_changes(rnp, rdp) || needgp;
|
needgp = __note_gp_changes(rnp, rdp) || needgp;
|
||||||
/* smp_mb() provided by prior unlock-lock pair. */
|
/* smp_mb() provided by prior unlock-lock pair. */
|
||||||
needgp = rcu_future_gp_cleanup(rnp) || needgp;
|
needgp = rcu_future_gp_cleanup(rnp) || needgp;
|
||||||
|
// Reset overload indication for CPUs no longer overloaded
|
||||||
|
if (rcu_is_leaf_node(rnp))
|
||||||
|
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
|
||||||
|
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||||
|
check_cb_ovld_locked(rdp, rnp);
|
||||||
|
}
|
||||||
sq = rcu_nocb_gp_get(rnp);
|
sq = rcu_nocb_gp_get(rnp);
|
||||||
raw_spin_unlock_irq_rcu_node(rnp);
|
raw_spin_unlock_irq_rcu_node(rnp);
|
||||||
rcu_nocb_gp_cleanup(sq);
|
rcu_nocb_gp_cleanup(sq);
|
||||||
|
@ -1774,9 +1793,9 @@ static void rcu_gp_cleanup(void)
|
||||||
rcu_segcblist_is_offloaded(&rdp->cblist);
|
rcu_segcblist_is_offloaded(&rdp->cblist);
|
||||||
if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
|
if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
|
||||||
WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
|
WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
|
||||||
rcu_state.gp_req_activity = jiffies;
|
WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
rcu_state.gp_seq,
|
||||||
TPS("newreq"));
|
TPS("newreq"));
|
||||||
} else {
|
} else {
|
||||||
WRITE_ONCE(rcu_state.gp_flags,
|
WRITE_ONCE(rcu_state.gp_flags,
|
||||||
|
@ -1795,8 +1814,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
|
||||||
|
|
||||||
/* Handle grace-period start. */
|
/* Handle grace-period start. */
|
||||||
for (;;) {
|
for (;;) {
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
|
||||||
TPS("reqwait"));
|
TPS("reqwait"));
|
||||||
rcu_state.gp_state = RCU_GP_WAIT_GPS;
|
rcu_state.gp_state = RCU_GP_WAIT_GPS;
|
||||||
swait_event_idle_exclusive(rcu_state.gp_wq,
|
swait_event_idle_exclusive(rcu_state.gp_wq,
|
||||||
|
@ -1809,8 +1827,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
|
||||||
cond_resched_tasks_rcu_qs();
|
cond_resched_tasks_rcu_qs();
|
||||||
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
||||||
WARN_ON(signal_pending(current));
|
WARN_ON(signal_pending(current));
|
||||||
trace_rcu_grace_period(rcu_state.name,
|
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
|
||||||
READ_ONCE(rcu_state.gp_seq),
|
|
||||||
TPS("reqwaitsig"));
|
TPS("reqwaitsig"));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1881,7 +1898,7 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
|
||||||
WARN_ON_ONCE(oldmask); /* Any child must be all zeroed! */
|
WARN_ON_ONCE(oldmask); /* Any child must be all zeroed! */
|
||||||
WARN_ON_ONCE(!rcu_is_leaf_node(rnp) &&
|
WARN_ON_ONCE(!rcu_is_leaf_node(rnp) &&
|
||||||
rcu_preempt_blocked_readers_cgp(rnp));
|
rcu_preempt_blocked_readers_cgp(rnp));
|
||||||
rnp->qsmask &= ~mask;
|
WRITE_ONCE(rnp->qsmask, rnp->qsmask & ~mask);
|
||||||
trace_rcu_quiescent_state_report(rcu_state.name, rnp->gp_seq,
|
trace_rcu_quiescent_state_report(rcu_state.name, rnp->gp_seq,
|
||||||
mask, rnp->qsmask, rnp->level,
|
mask, rnp->qsmask, rnp->level,
|
||||||
rnp->grplo, rnp->grphi,
|
rnp->grplo, rnp->grphi,
|
||||||
|
@ -1904,7 +1921,7 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
|
||||||
rnp_c = rnp;
|
rnp_c = rnp;
|
||||||
rnp = rnp->parent;
|
rnp = rnp->parent;
|
||||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
oldmask = rnp_c->qsmask;
|
oldmask = READ_ONCE(rnp_c->qsmask);
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -1987,6 +2004,8 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
mask = rdp->grpmask;
|
mask = rdp->grpmask;
|
||||||
|
if (rdp->cpu == smp_processor_id())
|
||||||
|
rdp->core_needs_qs = false;
|
||||||
if ((rnp->qsmask & mask) == 0) {
|
if ((rnp->qsmask & mask) == 0) {
|
||||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
} else {
|
} else {
|
||||||
|
@ -2052,7 +2071,7 @@ int rcutree_dying_cpu(unsigned int cpu)
|
||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
blkd = !!(rnp->qsmask & rdp->grpmask);
|
blkd = !!(rnp->qsmask & rdp->grpmask);
|
||||||
trace_rcu_grace_period(rcu_state.name, rnp->gp_seq,
|
trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
|
||||||
blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
|
blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
@ -2294,10 +2313,13 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
|
||||||
struct rcu_data *rdp;
|
struct rcu_data *rdp;
|
||||||
struct rcu_node *rnp;
|
struct rcu_node *rnp;
|
||||||
|
|
||||||
|
rcu_state.cbovld = rcu_state.cbovldnext;
|
||||||
|
rcu_state.cbovldnext = false;
|
||||||
rcu_for_each_leaf_node(rnp) {
|
rcu_for_each_leaf_node(rnp) {
|
||||||
cond_resched_tasks_rcu_qs();
|
cond_resched_tasks_rcu_qs();
|
||||||
mask = 0;
|
mask = 0;
|
||||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
|
rcu_state.cbovldnext |= !!rnp->cbovldmask;
|
||||||
if (rnp->qsmask == 0) {
|
if (rnp->qsmask == 0) {
|
||||||
if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
|
if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
|
||||||
rcu_preempt_blocked_readers_cgp(rnp)) {
|
rcu_preempt_blocked_readers_cgp(rnp)) {
|
||||||
|
@ -2579,11 +2601,48 @@ static void rcu_leak_callback(struct rcu_head *rhp)
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Helper function for call_rcu() and friends. The cpu argument will
|
* Check and if necessary update the leaf rcu_node structure's
|
||||||
* normally be -1, indicating "currently running CPU". It may specify
|
* ->cbovldmask bit corresponding to the current CPU based on that CPU's
|
||||||
* a CPU only if that CPU is a no-CBs CPU. Currently, only rcu_barrier()
|
* number of queued RCU callbacks. The caller must hold the leaf rcu_node
|
||||||
* is expected to specify a CPU.
|
* structure's ->lock.
|
||||||
*/
|
*/
|
||||||
|
static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp)
|
||||||
|
{
|
||||||
|
raw_lockdep_assert_held_rcu_node(rnp);
|
||||||
|
if (qovld_calc <= 0)
|
||||||
|
return; // Early boot and wildcard value set.
|
||||||
|
if (rcu_segcblist_n_cbs(&rdp->cblist) >= qovld_calc)
|
||||||
|
WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask | rdp->grpmask);
|
||||||
|
else
|
||||||
|
WRITE_ONCE(rnp->cbovldmask, rnp->cbovldmask & ~rdp->grpmask);
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Check and if necessary update the leaf rcu_node structure's
|
||||||
|
* ->cbovldmask bit corresponding to the current CPU based on that CPU's
|
||||||
|
* number of queued RCU callbacks. No locks need be held, but the
|
||||||
|
* caller must have disabled interrupts.
|
||||||
|
*
|
||||||
|
* Note that this function ignores the possibility that there are a lot
|
||||||
|
* of callbacks all of which have already seen the end of their respective
|
||||||
|
* grace periods. This omission is due to the need for no-CBs CPUs to
|
||||||
|
* be holding ->nocb_lock to do this check, which is too heavy for a
|
||||||
|
* common-case operation.
|
||||||
|
*/
|
||||||
|
static void check_cb_ovld(struct rcu_data *rdp)
|
||||||
|
{
|
||||||
|
struct rcu_node *const rnp = rdp->mynode;
|
||||||
|
|
||||||
|
if (qovld_calc <= 0 ||
|
||||||
|
((rcu_segcblist_n_cbs(&rdp->cblist) >= qovld_calc) ==
|
||||||
|
!!(READ_ONCE(rnp->cbovldmask) & rdp->grpmask)))
|
||||||
|
return; // Early boot wildcard value or already set correctly.
|
||||||
|
raw_spin_lock_rcu_node(rnp);
|
||||||
|
check_cb_ovld_locked(rdp, rnp);
|
||||||
|
raw_spin_unlock_rcu_node(rnp);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Helper function for call_rcu() and friends. */
|
||||||
static void
|
static void
|
||||||
__call_rcu(struct rcu_head *head, rcu_callback_t func)
|
__call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||||
{
|
{
|
||||||
|
@ -2621,9 +2680,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||||
rcu_segcblist_init(&rdp->cblist);
|
rcu_segcblist_init(&rdp->cblist);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
check_cb_ovld(rdp);
|
||||||
if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
|
if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
|
||||||
return; // Enqueued onto ->nocb_bypass, so just leave.
|
return; // Enqueued onto ->nocb_bypass, so just leave.
|
||||||
/* If we get here, rcu_nocb_try_bypass() acquired ->nocb_lock. */
|
// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
|
||||||
rcu_segcblist_enqueue(&rdp->cblist, head);
|
rcu_segcblist_enqueue(&rdp->cblist, head);
|
||||||
if (__is_kfree_rcu_offset((unsigned long)func))
|
if (__is_kfree_rcu_offset((unsigned long)func))
|
||||||
trace_rcu_kfree_callback(rcu_state.name, head,
|
trace_rcu_kfree_callback(rcu_state.name, head,
|
||||||
|
@ -2689,22 +2749,47 @@ EXPORT_SYMBOL_GPL(call_rcu);
|
||||||
#define KFREE_DRAIN_JIFFIES (HZ / 50)
|
#define KFREE_DRAIN_JIFFIES (HZ / 50)
|
||||||
#define KFREE_N_BATCHES 2
|
#define KFREE_N_BATCHES 2
|
||||||
|
|
||||||
|
/*
|
||||||
|
* This macro defines how many entries the "records" array
|
||||||
|
* will contain. It is based on the fact that the size of
|
||||||
|
* kfree_rcu_bulk_data structure becomes exactly one page.
|
||||||
|
*/
|
||||||
|
#define KFREE_BULK_MAX_ENTR ((PAGE_SIZE / sizeof(void *)) - 3)
|
||||||
|
|
||||||
|
/**
|
||||||
|
* struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers
|
||||||
|
* @nr_records: Number of active pointers in the array
|
||||||
|
* @records: Array of the kfree_rcu() pointers
|
||||||
|
* @next: Next bulk object in the block chain
|
||||||
|
* @head_free_debug: For debug, when CONFIG_DEBUG_OBJECTS_RCU_HEAD is set
|
||||||
|
*/
|
||||||
|
struct kfree_rcu_bulk_data {
|
||||||
|
unsigned long nr_records;
|
||||||
|
void *records[KFREE_BULK_MAX_ENTR];
|
||||||
|
struct kfree_rcu_bulk_data *next;
|
||||||
|
struct rcu_head *head_free_debug;
|
||||||
|
};
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
|
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
|
||||||
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
|
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
|
||||||
* @head_free: List of kfree_rcu() objects waiting for a grace period
|
* @head_free: List of kfree_rcu() objects waiting for a grace period
|
||||||
|
* @bhead_free: Bulk-List of kfree_rcu() objects waiting for a grace period
|
||||||
* @krcp: Pointer to @kfree_rcu_cpu structure
|
* @krcp: Pointer to @kfree_rcu_cpu structure
|
||||||
*/
|
*/
|
||||||
|
|
||||||
struct kfree_rcu_cpu_work {
|
struct kfree_rcu_cpu_work {
|
||||||
struct rcu_work rcu_work;
|
struct rcu_work rcu_work;
|
||||||
struct rcu_head *head_free;
|
struct rcu_head *head_free;
|
||||||
|
struct kfree_rcu_bulk_data *bhead_free;
|
||||||
struct kfree_rcu_cpu *krcp;
|
struct kfree_rcu_cpu *krcp;
|
||||||
};
|
};
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
|
* struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
|
||||||
* @head: List of kfree_rcu() objects not yet waiting for a grace period
|
* @head: List of kfree_rcu() objects not yet waiting for a grace period
|
||||||
|
* @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
|
||||||
|
* @bcached: Keeps at most one object for later reuse when building chain blocks
|
||||||
* @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
|
* @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
|
||||||
* @lock: Synchronize access to this structure
|
* @lock: Synchronize access to this structure
|
||||||
* @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
|
* @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
|
||||||
|
@ -2718,6 +2803,8 @@ struct kfree_rcu_cpu_work {
|
||||||
*/
|
*/
|
||||||
struct kfree_rcu_cpu {
|
struct kfree_rcu_cpu {
|
||||||
struct rcu_head *head;
|
struct rcu_head *head;
|
||||||
|
struct kfree_rcu_bulk_data *bhead;
|
||||||
|
struct kfree_rcu_bulk_data *bcached;
|
||||||
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
|
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
|
||||||
spinlock_t lock;
|
spinlock_t lock;
|
||||||
struct delayed_work monitor_work;
|
struct delayed_work monitor_work;
|
||||||
|
@ -2727,14 +2814,24 @@ struct kfree_rcu_cpu {
|
||||||
|
|
||||||
static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
|
static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
|
||||||
|
|
||||||
|
static __always_inline void
|
||||||
|
debug_rcu_head_unqueue_bulk(struct rcu_head *head)
|
||||||
|
{
|
||||||
|
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
|
||||||
|
for (; head; head = head->next)
|
||||||
|
debug_rcu_head_unqueue(head);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* This function is invoked in workqueue context after a grace period.
|
* This function is invoked in workqueue context after a grace period.
|
||||||
* It frees all the objects queued on ->head_free.
|
* It frees all the objects queued on ->bhead_free or ->head_free.
|
||||||
*/
|
*/
|
||||||
static void kfree_rcu_work(struct work_struct *work)
|
static void kfree_rcu_work(struct work_struct *work)
|
||||||
{
|
{
|
||||||
unsigned long flags;
|
unsigned long flags;
|
||||||
struct rcu_head *head, *next;
|
struct rcu_head *head, *next;
|
||||||
|
struct kfree_rcu_bulk_data *bhead, *bnext;
|
||||||
struct kfree_rcu_cpu *krcp;
|
struct kfree_rcu_cpu *krcp;
|
||||||
struct kfree_rcu_cpu_work *krwp;
|
struct kfree_rcu_cpu_work *krwp;
|
||||||
|
|
||||||
|
@ -2744,22 +2841,44 @@ static void kfree_rcu_work(struct work_struct *work)
|
||||||
spin_lock_irqsave(&krcp->lock, flags);
|
spin_lock_irqsave(&krcp->lock, flags);
|
||||||
head = krwp->head_free;
|
head = krwp->head_free;
|
||||||
krwp->head_free = NULL;
|
krwp->head_free = NULL;
|
||||||
|
bhead = krwp->bhead_free;
|
||||||
|
krwp->bhead_free = NULL;
|
||||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||||
|
|
||||||
// List "head" is now private, so traverse locklessly.
|
/* "bhead" is now private, so traverse locklessly. */
|
||||||
|
for (; bhead; bhead = bnext) {
|
||||||
|
bnext = bhead->next;
|
||||||
|
|
||||||
|
debug_rcu_head_unqueue_bulk(bhead->head_free_debug);
|
||||||
|
|
||||||
|
rcu_lock_acquire(&rcu_callback_map);
|
||||||
|
trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
|
||||||
|
bhead->nr_records, bhead->records);
|
||||||
|
|
||||||
|
kfree_bulk(bhead->nr_records, bhead->records);
|
||||||
|
rcu_lock_release(&rcu_callback_map);
|
||||||
|
|
||||||
|
if (cmpxchg(&krcp->bcached, NULL, bhead))
|
||||||
|
free_page((unsigned long) bhead);
|
||||||
|
|
||||||
|
cond_resched_tasks_rcu_qs();
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Emergency case only. It can happen under low memory
|
||||||
|
* condition when an allocation fails, so the "bulk"
|
||||||
|
* path cannot be temporarily maintained.
|
||||||
|
*/
|
||||||
for (; head; head = next) {
|
for (; head; head = next) {
|
||||||
unsigned long offset = (unsigned long)head->func;
|
unsigned long offset = (unsigned long)head->func;
|
||||||
|
|
||||||
next = head->next;
|
next = head->next;
|
||||||
// Potentially optimize with kfree_bulk in future.
|
|
||||||
debug_rcu_head_unqueue(head);
|
debug_rcu_head_unqueue(head);
|
||||||
rcu_lock_acquire(&rcu_callback_map);
|
rcu_lock_acquire(&rcu_callback_map);
|
||||||
trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
|
trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
|
||||||
|
|
||||||
if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset))) {
|
if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
|
||||||
/* Could be optimized with kfree_bulk() in future. */
|
|
||||||
kfree((void *)head - offset);
|
kfree((void *)head - offset);
|
||||||
}
|
|
||||||
|
|
||||||
rcu_lock_release(&rcu_callback_map);
|
rcu_lock_release(&rcu_callback_map);
|
||||||
cond_resched_tasks_rcu_qs();
|
cond_resched_tasks_rcu_qs();
|
||||||
|
@ -2774,26 +2893,48 @@ static void kfree_rcu_work(struct work_struct *work)
|
||||||
*/
|
*/
|
||||||
static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
|
static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
|
||||||
{
|
{
|
||||||
|
struct kfree_rcu_cpu_work *krwp;
|
||||||
|
bool queued = false;
|
||||||
int i;
|
int i;
|
||||||
struct kfree_rcu_cpu_work *krwp = NULL;
|
|
||||||
|
|
||||||
lockdep_assert_held(&krcp->lock);
|
lockdep_assert_held(&krcp->lock);
|
||||||
for (i = 0; i < KFREE_N_BATCHES; i++)
|
|
||||||
if (!krcp->krw_arr[i].head_free) {
|
for (i = 0; i < KFREE_N_BATCHES; i++) {
|
||||||
krwp = &(krcp->krw_arr[i]);
|
krwp = &(krcp->krw_arr[i]);
|
||||||
break;
|
|
||||||
|
/*
|
||||||
|
* Try to detach bhead or head and attach it over any
|
||||||
|
* available corresponding free channel. It can be that
|
||||||
|
* a previous RCU batch is in progress, which means that
|
||||||
|
* queuing another one immediately is not possible, so
|
||||||
|
* return false to tell caller to retry.
|
||||||
|
*/
|
||||||
|
if ((krcp->bhead && !krwp->bhead_free) ||
|
||||||
|
(krcp->head && !krwp->head_free)) {
|
||||||
|
/* Channel 1. */
|
||||||
|
if (!krwp->bhead_free) {
|
||||||
|
krwp->bhead_free = krcp->bhead;
|
||||||
|
krcp->bhead = NULL;
|
||||||
}
|
}
|
||||||
|
|
||||||
// If a previous RCU batch is in progress, we cannot immediately
|
/* Channel 2. */
|
||||||
// queue another one, so return false to tell caller to retry.
|
if (!krwp->head_free) {
|
||||||
if (!krwp)
|
|
||||||
return false;
|
|
||||||
|
|
||||||
krwp->head_free = krcp->head;
|
krwp->head_free = krcp->head;
|
||||||
krcp->head = NULL;
|
krcp->head = NULL;
|
||||||
INIT_RCU_WORK(&krwp->rcu_work, kfree_rcu_work);
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* One work is per one batch, so there are two "free channels",
|
||||||
|
* "bhead_free" and "head_free" the batch can handle. It can be
|
||||||
|
* that the work is in the pending state when two channels have
|
||||||
|
* been detached following each other, one by one.
|
||||||
|
*/
|
||||||
queue_rcu_work(system_wq, &krwp->rcu_work);
|
queue_rcu_work(system_wq, &krwp->rcu_work);
|
||||||
return true;
|
queued = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return queued;
|
||||||
}
|
}
|
||||||
|
|
||||||
static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
|
static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
|
||||||
|
@ -2830,19 +2971,65 @@ static void kfree_rcu_monitor(struct work_struct *work)
|
||||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
static inline bool
|
||||||
|
kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
|
||||||
|
struct rcu_head *head, rcu_callback_t func)
|
||||||
|
{
|
||||||
|
struct kfree_rcu_bulk_data *bnode;
|
||||||
|
|
||||||
|
if (unlikely(!krcp->initialized))
|
||||||
|
return false;
|
||||||
|
|
||||||
|
lockdep_assert_held(&krcp->lock);
|
||||||
|
|
||||||
|
/* Check if a new block is required. */
|
||||||
|
if (!krcp->bhead ||
|
||||||
|
krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
|
||||||
|
bnode = xchg(&krcp->bcached, NULL);
|
||||||
|
if (!bnode) {
|
||||||
|
WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
|
||||||
|
|
||||||
|
bnode = (struct kfree_rcu_bulk_data *)
|
||||||
|
__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Switch to emergency path. */
|
||||||
|
if (unlikely(!bnode))
|
||||||
|
return false;
|
||||||
|
|
||||||
|
/* Initialize the new block. */
|
||||||
|
bnode->nr_records = 0;
|
||||||
|
bnode->next = krcp->bhead;
|
||||||
|
bnode->head_free_debug = NULL;
|
||||||
|
|
||||||
|
/* Attach it to the head. */
|
||||||
|
krcp->bhead = bnode;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
|
||||||
|
head->func = func;
|
||||||
|
head->next = krcp->bhead->head_free_debug;
|
||||||
|
krcp->bhead->head_free_debug = head;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Finally insert. */
|
||||||
|
krcp->bhead->records[krcp->bhead->nr_records++] =
|
||||||
|
(void *) head - (unsigned long) func;
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Queue a request for lazy invocation of kfree() after a grace period.
|
* Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
|
||||||
|
* period. Please note that two paths are maintained: one is the main one
|
||||||
|
* that uses the kfree_bulk() interface, and the second is an emergency one that is
|
||||||
|
* used only when the main path temporarily cannot be maintained due to memory
|
||||||
|
* pressure.
|
||||||
*
|
*
|
||||||
* Each kfree_call_rcu() request is added to a batch. The batch will be drained
|
* Each kfree_call_rcu() request is added to a batch. The batch will be drained
|
||||||
* every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch
|
* every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
|
||||||
* will be kfree'd in workqueue context. This allows us to:
|
* be free'd in workqueue context. This allows us to: batch requests together to
|
||||||
*
|
* reduce the number of grace periods during heavy kfree_rcu() load.
|
||||||
* 1. Batch requests together to reduce the number of grace periods during
|
|
||||||
* heavy kfree_rcu() load.
|
|
||||||
*
|
|
||||||
* 2. It makes it possible to use kfree_bulk() on a large number of
|
|
||||||
* kfree_rcu() requests thus reducing cache misses and the per-object
|
|
||||||
* overhead of kfree().
|
|
||||||
*/
|
*/
|
||||||
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||||
{
|
{
|
||||||
|
@ -2861,9 +3048,16 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||||
__func__, head);
|
__func__, head);
|
||||||
goto unlock_return;
|
goto unlock_return;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Under high memory pressure GFP_NOWAIT can fail,
|
||||||
|
* in that case the emergency path is maintained.
|
||||||
|
*/
|
||||||
|
if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
|
||||||
head->func = func;
|
head->func = func;
|
||||||
head->next = krcp->head;
|
head->next = krcp->head;
|
||||||
krcp->head = head;
|
krcp->head = head;
|
||||||
|
}
|
||||||
|
|
||||||
// Set timer to drain after KFREE_DRAIN_JIFFIES.
|
// Set timer to drain after KFREE_DRAIN_JIFFIES.
|
||||||
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
|
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
|
||||||
|
@ -3075,24 +3269,32 @@ static void rcu_barrier_trace(const char *s, int cpu, unsigned long done)
|
||||||
/*
|
/*
|
||||||
* RCU callback function for rcu_barrier(). If we are last, wake
|
* RCU callback function for rcu_barrier(). If we are last, wake
|
||||||
* up the task executing rcu_barrier().
|
* up the task executing rcu_barrier().
|
||||||
|
*
|
||||||
|
* Note that the value of rcu_state.barrier_sequence must be captured
|
||||||
|
* before the atomic_dec_and_test(). Otherwise, if this CPU is not last,
|
||||||
|
* other CPUs might count the value down to zero before this CPU gets
|
||||||
|
* around to invoking rcu_barrier_trace(), which might result in bogus
|
||||||
|
* data from the next instance of rcu_barrier().
|
||||||
*/
|
*/
|
||||||
static void rcu_barrier_callback(struct rcu_head *rhp)
|
static void rcu_barrier_callback(struct rcu_head *rhp)
|
||||||
{
|
{
|
||||||
|
unsigned long __maybe_unused s = rcu_state.barrier_sequence;
|
||||||
|
|
||||||
if (atomic_dec_and_test(&rcu_state.barrier_cpu_count)) {
|
if (atomic_dec_and_test(&rcu_state.barrier_cpu_count)) {
|
||||||
rcu_barrier_trace(TPS("LastCB"), -1,
|
rcu_barrier_trace(TPS("LastCB"), -1, s);
|
||||||
rcu_state.barrier_sequence);
|
|
||||||
complete(&rcu_state.barrier_completion);
|
complete(&rcu_state.barrier_completion);
|
||||||
} else {
|
} else {
|
||||||
rcu_barrier_trace(TPS("CB"), -1, rcu_state.barrier_sequence);
|
rcu_barrier_trace(TPS("CB"), -1, s);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Called with preemption disabled, and from cross-cpu IRQ context.
|
* Called with preemption disabled, and from cross-cpu IRQ context.
|
||||||
*/
|
*/
|
||||||
static void rcu_barrier_func(void *unused)
|
static void rcu_barrier_func(void *cpu_in)
|
||||||
{
|
{
|
||||||
struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
|
uintptr_t cpu = (uintptr_t)cpu_in;
|
||||||
|
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||||
|
|
||||||
rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
|
rcu_barrier_trace(TPS("IRQ"), -1, rcu_state.barrier_sequence);
|
||||||
rdp->barrier_head.func = rcu_barrier_callback;
|
rdp->barrier_head.func = rcu_barrier_callback;
|
||||||
|
@ -3119,7 +3321,7 @@ static void rcu_barrier_func(void *unused)
|
||||||
*/
|
*/
|
||||||
void rcu_barrier(void)
|
void rcu_barrier(void)
|
||||||
{
|
{
|
||||||
int cpu;
|
uintptr_t cpu;
|
||||||
struct rcu_data *rdp;
|
struct rcu_data *rdp;
|
||||||
unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
|
unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
|
||||||
|
|
||||||
|
@ -3142,13 +3344,14 @@ void rcu_barrier(void)
|
||||||
rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
|
rcu_barrier_trace(TPS("Inc1"), -1, rcu_state.barrier_sequence);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Initialize the count to one rather than to zero in order to
|
* Initialize the count to two rather than to zero in order
|
||||||
* avoid a too-soon return to zero in case of a short grace period
|
* to avoid a too-soon return to zero in case of an immediate
|
||||||
* (or preemption of this task). Exclude CPU-hotplug operations
|
* invocation of the just-enqueued callback (or preemption of
|
||||||
* to ensure that no offline CPU has callbacks queued.
|
* this task). Exclude CPU-hotplug operations to ensure that no
|
||||||
|
* offline non-offloaded CPU has callbacks queued.
|
||||||
*/
|
*/
|
||||||
init_completion(&rcu_state.barrier_completion);
|
init_completion(&rcu_state.barrier_completion);
|
||||||
atomic_set(&rcu_state.barrier_cpu_count, 1);
|
atomic_set(&rcu_state.barrier_cpu_count, 2);
|
||||||
get_online_cpus();
|
get_online_cpus();
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -3158,13 +3361,23 @@ void rcu_barrier(void)
|
||||||
*/
|
*/
|
||||||
for_each_possible_cpu(cpu) {
|
for_each_possible_cpu(cpu) {
|
||||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||||
if (!cpu_online(cpu) &&
|
if (cpu_is_offline(cpu) &&
|
||||||
!rcu_segcblist_is_offloaded(&rdp->cblist))
|
!rcu_segcblist_is_offloaded(&rdp->cblist))
|
||||||
continue;
|
continue;
|
||||||
if (rcu_segcblist_n_cbs(&rdp->cblist)) {
|
if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
|
||||||
rcu_barrier_trace(TPS("OnlineQ"), cpu,
|
rcu_barrier_trace(TPS("OnlineQ"), cpu,
|
||||||
rcu_state.barrier_sequence);
|
rcu_state.barrier_sequence);
|
||||||
smp_call_function_single(cpu, rcu_barrier_func, NULL, 1);
|
smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
|
||||||
|
} else if (rcu_segcblist_n_cbs(&rdp->cblist) &&
|
||||||
|
cpu_is_offline(cpu)) {
|
||||||
|
rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu,
|
||||||
|
rcu_state.barrier_sequence);
|
||||||
|
local_irq_disable();
|
||||||
|
rcu_barrier_func((void *)cpu);
|
||||||
|
local_irq_enable();
|
||||||
|
} else if (cpu_is_offline(cpu)) {
|
||||||
|
rcu_barrier_trace(TPS("OfflineNoCBNoQ"), cpu,
|
||||||
|
rcu_state.barrier_sequence);
|
||||||
} else {
|
} else {
|
||||||
rcu_barrier_trace(TPS("OnlineNQ"), cpu,
|
rcu_barrier_trace(TPS("OnlineNQ"), cpu,
|
||||||
rcu_state.barrier_sequence);
|
rcu_state.barrier_sequence);
|
||||||
|
@ -3176,7 +3389,7 @@ void rcu_barrier(void)
|
||||||
* Now that we have an rcu_barrier_callback() callback on each
|
* Now that we have an rcu_barrier_callback() callback on each
|
||||||
* CPU, and thus each counted, remove the initial count.
|
* CPU, and thus each counted, remove the initial count.
|
||||||
*/
|
*/
|
||||||
if (atomic_dec_and_test(&rcu_state.barrier_cpu_count))
|
if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
|
||||||
complete(&rcu_state.barrier_completion);
|
complete(&rcu_state.barrier_completion);
|
||||||
|
|
||||||
/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
|
/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
|
||||||
|
@ -3275,12 +3488,12 @@ int rcutree_prepare_cpu(unsigned int cpu)
|
||||||
rnp = rdp->mynode;
|
rnp = rdp->mynode;
|
||||||
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
|
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
|
||||||
rdp->beenonline = true; /* We have now been online. */
|
rdp->beenonline = true; /* We have now been online. */
|
||||||
rdp->gp_seq = rnp->gp_seq;
|
rdp->gp_seq = READ_ONCE(rnp->gp_seq);
|
||||||
rdp->gp_seq_needed = rnp->gp_seq;
|
rdp->gp_seq_needed = rdp->gp_seq;
|
||||||
rdp->cpu_no_qs.b.norm = true;
|
rdp->cpu_no_qs.b.norm = true;
|
||||||
rdp->core_needs_qs = false;
|
rdp->core_needs_qs = false;
|
||||||
rdp->rcu_iw_pending = false;
|
rdp->rcu_iw_pending = false;
|
||||||
rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
|
rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
|
||||||
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
|
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
|
||||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
rcu_prepare_kthreads(cpu);
|
rcu_prepare_kthreads(cpu);
|
||||||
|
@ -3378,7 +3591,7 @@ void rcu_cpu_starting(unsigned int cpu)
|
||||||
rnp = rdp->mynode;
|
rnp = rdp->mynode;
|
||||||
mask = rdp->grpmask;
|
mask = rdp->grpmask;
|
||||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
rnp->qsmaskinitnext |= mask;
|
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
|
||||||
oldmask = rnp->expmaskinitnext;
|
oldmask = rnp->expmaskinitnext;
|
||||||
rnp->expmaskinitnext |= mask;
|
rnp->expmaskinitnext |= mask;
|
||||||
oldmask ^= rnp->expmaskinitnext;
|
oldmask ^= rnp->expmaskinitnext;
|
||||||
|
@ -3431,7 +3644,7 @@ void rcu_report_dead(unsigned int cpu)
|
||||||
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
|
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
|
||||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
}
|
}
|
||||||
rnp->qsmaskinitnext &= ~mask;
|
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
|
||||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
raw_spin_unlock(&rcu_state.ofl_lock);
|
raw_spin_unlock(&rcu_state.ofl_lock);
|
||||||
|
|
||||||
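The two hunks above convert plain read-modify-write updates of ->qsmaskinitnext into WRITE_ONCE() stores. A minimal user-space sketch of that pattern follows. The macros are simplified stand-ins (the kernel's real READ_ONCE()/WRITE_ONCE() are more elaborate), and `struct node_sketch` and the three helpers are hypothetical names used only for illustration. The sketch relies on the GCC/Clang `__typeof__` extension.

```c
/* Simplified stand-ins for the kernel macros: force a single volatile access. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical miniature of the rcu_node mask field touched in the hunks. */
struct node_sketch {
	unsigned long qsmaskinitnext;
};

/* Set a CPU's bit, mirroring the update in rcu_cpu_starting(). */
static void mark_cpu_online(struct node_sketch *np, unsigned long mask)
{
	WRITE_ONCE(np->qsmaskinitnext, np->qsmaskinitnext | mask);
}

/* Clear a CPU's bit, mirroring the update in rcu_report_dead(). */
static void mark_cpu_offline(struct node_sketch *np, unsigned long mask)
{
	WRITE_ONCE(np->qsmaskinitnext, np->qsmaskinitnext & ~mask);
}

/* Lockless reader: one marked load of the whole mask. */
static unsigned long snapshot_mask(struct node_sketch *np)
{
	return READ_ONCE(np->qsmaskinitnext);
}
```

In the kernel the writers still hold the rcu_node lock. The marked store only documents, for the compiler and for tools such as KCSAN, that a concurrent lockless reader may exist.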
|
@ -3545,7 +3758,10 @@ static int __init rcu_spawn_gp_kthread(void)
|
||||||
}
|
}
|
||||||
rnp = rcu_get_root();
|
rnp = rcu_get_root();
|
||||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
rcu_state.gp_kthread = t;
|
WRITE_ONCE(rcu_state.gp_activity, jiffies);
|
||||||
|
WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
|
||||||
|
// Reset .gp_activity and .gp_req_activity before setting .gp_kthread.
|
||||||
|
smp_store_release(&rcu_state.gp_kthread, t); /* ^^^ */
|
||||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
wake_up_process(t);
|
wake_up_process(t);
|
||||||
rcu_spawn_nocb_kthreads();
|
rcu_spawn_nocb_kthreads();
|
||||||
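The rcu_spawn_gp_kthread() hunk resets the activity timestamps and only then publishes .gp_kthread with smp_store_release(); a later hunk pairs this with smp_load_acquire(). A rough user-space analogue using C11 atomics is sketched below. `struct gp_state_sketch`, `publish_kthread()` and `read_activity_if_published()` are hypothetical names, and C11 release/acquire is used as an approximation of the kernel primitives, not as their definition.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical miniature of the rcu_state fields involved. */
struct gp_state_sketch {
	unsigned long gp_activity;
	unsigned long gp_req_activity;
	_Atomic(void *) gp_kthread;
};

/* Writer: initialize the timestamps first, then publish the pointer. */
static void publish_kthread(struct gp_state_sketch *sp, void *t,
			    unsigned long now)
{
	sp->gp_activity = now;
	sp->gp_req_activity = now;
	/* Release: the two stores above are visible before the pointer is. */
	atomic_store_explicit(&sp->gp_kthread, t, memory_order_release);
}

/* Reader: returns 1 and fills *activity only once the pointer is published. */
static int read_activity_if_published(struct gp_state_sketch *sp,
				      unsigned long *activity)
{
	/* Acquire: pairs with the release store in publish_kthread(). */
	if (!atomic_load_explicit(&sp->gp_kthread, memory_order_acquire))
		return 0;
	*activity = sp->gp_activity;
	return 1;
}
```

A reader that sees a non-NULL pointer is thereby guaranteed to see the freshly reset timestamps, which is what keeps a stall check from acting on stale values.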
|
@ -3769,8 +3985,11 @@ static void __init kfree_rcu_batch_init(void)
|
||||||
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
|
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
|
||||||
|
|
||||||
spin_lock_init(&krcp->lock);
|
spin_lock_init(&krcp->lock);
|
||||||
for (i = 0; i < KFREE_N_BATCHES; i++)
|
for (i = 0; i < KFREE_N_BATCHES; i++) {
|
||||||
|
INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
|
||||||
krcp->krw_arr[i].krcp = krcp;
|
krcp->krw_arr[i].krcp = krcp;
|
||||||
|
}
|
||||||
|
|
||||||
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
|
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
|
||||||
krcp->initialized = true;
|
krcp->initialized = true;
|
||||||
}
|
}
|
||||||
|
@ -3809,6 +4028,13 @@ void __init rcu_init(void)
|
||||||
rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
|
rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
|
||||||
WARN_ON(!rcu_par_gp_wq);
|
WARN_ON(!rcu_par_gp_wq);
|
||||||
srcu_init();
|
srcu_init();
|
||||||
|
|
||||||
|
/* Fill in default value for rcutree.qovld boot parameter. */
|
||||||
|
/* -After- the rcu_node ->lock fields are initialized! */
|
||||||
|
if (qovld < 0)
|
||||||
|
qovld_calc = DEFAULT_RCU_QOVLD_MULT * qhimark;
|
||||||
|
else
|
||||||
|
qovld_calc = qovld;
|
||||||
}
|
}
|
||||||
|
|
||||||
#include "tree_stall.h"
|
#include "tree_stall.h"
|
||||||
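The rcu_init() hunk fills in a default for the rcutree.qovld boot parameter: a negative value means "derive it from qhimark". The selection logic can be sketched as a pure function. `qovld_effective()` is a hypothetical helper, and the multiplier is passed in as a parameter because the value of DEFAULT_RCU_QOVLD_MULT is not shown in this part of the diff.

```c
/*
 * Hypothetical helper mirroring the if/else added to rcu_init():
 * a negative qovld selects the computed default, anything else is
 * taken as given.
 */
static long qovld_effective(long qovld, long qhimark, long mult)
{
	if (qovld < 0)
		return mult * qhimark;
	return qovld;
}
```

For example, with an assumed multiplier of 2 and an assumed qhimark of 10000, an unset (negative) qovld would yield an overload threshold of 20000 callbacks.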
|
|
|
@ -68,6 +68,8 @@ struct rcu_node {
|
||||||
/* Online CPUs for next expedited GP. */
|
/* Online CPUs for next expedited GP. */
|
||||||
/* Any CPU that has ever been online will */
|
/* Any CPU that has ever been online will */
|
||||||
/* have its bit set. */
|
/* have its bit set. */
|
||||||
|
unsigned long cbovldmask;
|
||||||
|
/* CPUs experiencing callback overload. */
|
||||||
unsigned long ffmask; /* Fully functional CPUs. */
|
unsigned long ffmask; /* Fully functional CPUs. */
|
||||||
unsigned long grpmask; /* Mask to apply to parent qsmask. */
|
unsigned long grpmask; /* Mask to apply to parent qsmask. */
|
||||||
/* Only one bit will be set in this mask. */
|
/* Only one bit will be set in this mask. */
|
||||||
|
@ -321,6 +323,8 @@ struct rcu_state {
|
||||||
atomic_t expedited_need_qs; /* # CPUs left to check in. */
|
atomic_t expedited_need_qs; /* # CPUs left to check in. */
|
||||||
struct swait_queue_head expedited_wq; /* Wait for check-ins. */
|
struct swait_queue_head expedited_wq; /* Wait for check-ins. */
|
||||||
int ncpus_snap; /* # CPUs seen last time. */
|
int ncpus_snap; /* # CPUs seen last time. */
|
||||||
|
u8 cbovld; /* Callback overload now? */
|
||||||
|
u8 cbovldnext; /* ^ ^ next time? */
|
||||||
|
|
||||||
unsigned long jiffies_force_qs; /* Time at which to invoke */
|
unsigned long jiffies_force_qs; /* Time at which to invoke */
|
||||||
/* force_quiescent_state(). */
|
/* force_quiescent_state(). */
|
||||||
|
|
|
@ -314,7 +314,7 @@ static bool exp_funnel_lock(unsigned long s)
|
||||||
sync_exp_work_done(s));
|
sync_exp_work_done(s));
|
||||||
return true;
|
return true;
|
||||||
}
|
}
|
||||||
rnp->exp_seq_rq = s; /* Followers can wait on us. */
|
WRITE_ONCE(rnp->exp_seq_rq, s); /* Followers can wait on us. */
|
||||||
spin_unlock(&rnp->exp_lock);
|
spin_unlock(&rnp->exp_lock);
|
||||||
trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
|
trace_rcu_exp_funnel_lock(rcu_state.name, rnp->level,
|
||||||
rnp->grplo, rnp->grphi, TPS("nxtlvl"));
|
rnp->grplo, rnp->grphi, TPS("nxtlvl"));
|
||||||
|
@ -485,6 +485,7 @@ static bool synchronize_rcu_expedited_wait_once(long tlimit)
|
||||||
static void synchronize_rcu_expedited_wait(void)
|
static void synchronize_rcu_expedited_wait(void)
|
||||||
{
|
{
|
||||||
int cpu;
|
int cpu;
|
||||||
|
unsigned long j;
|
||||||
unsigned long jiffies_stall;
|
unsigned long jiffies_stall;
|
||||||
unsigned long jiffies_start;
|
unsigned long jiffies_start;
|
||||||
unsigned long mask;
|
unsigned long mask;
|
||||||
|
@ -496,7 +497,7 @@ static void synchronize_rcu_expedited_wait(void)
|
||||||
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
|
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
|
||||||
jiffies_stall = rcu_jiffies_till_stall_check();
|
jiffies_stall = rcu_jiffies_till_stall_check();
|
||||||
jiffies_start = jiffies;
|
jiffies_start = jiffies;
|
||||||
if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
|
if (tick_nohz_full_enabled() && rcu_inkernel_boot_has_ended()) {
|
||||||
if (synchronize_rcu_expedited_wait_once(1))
|
if (synchronize_rcu_expedited_wait_once(1))
|
||||||
return;
|
return;
|
||||||
rcu_for_each_leaf_node(rnp) {
|
rcu_for_each_leaf_node(rnp) {
|
||||||
|
@ -508,12 +509,16 @@ static void synchronize_rcu_expedited_wait(void)
|
||||||
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
|
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
j = READ_ONCE(jiffies_till_first_fqs);
|
||||||
|
if (synchronize_rcu_expedited_wait_once(j + HZ))
|
||||||
|
return;
|
||||||
|
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT));
|
||||||
}
|
}
|
||||||
|
|
||||||
for (;;) {
|
for (;;) {
|
||||||
if (synchronize_rcu_expedited_wait_once(jiffies_stall))
|
if (synchronize_rcu_expedited_wait_once(jiffies_stall))
|
||||||
return;
|
return;
|
||||||
if (rcu_cpu_stall_suppress)
|
if (rcu_stall_is_suppressed())
|
||||||
continue;
|
continue;
|
||||||
panic_on_rcu_stall();
|
panic_on_rcu_stall();
|
||||||
pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
|
pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
|
||||||
|
@ -589,7 +594,7 @@ static void rcu_exp_wait_wake(unsigned long s)
|
||||||
spin_lock(&rnp->exp_lock);
|
spin_lock(&rnp->exp_lock);
|
||||||
/* Recheck, avoid hang in case someone just arrived. */
|
/* Recheck, avoid hang in case someone just arrived. */
|
||||||
if (ULONG_CMP_LT(rnp->exp_seq_rq, s))
|
if (ULONG_CMP_LT(rnp->exp_seq_rq, s))
|
||||||
rnp->exp_seq_rq = s;
|
WRITE_ONCE(rnp->exp_seq_rq, s);
|
||||||
spin_unlock(&rnp->exp_lock);
|
spin_unlock(&rnp->exp_lock);
|
||||||
}
|
}
|
||||||
smp_mb(); /* All above changes before wakeup. */
|
smp_mb(); /* All above changes before wakeup. */
|
||||||
|
|
|
@ -56,6 +56,8 @@ static void __init rcu_bootup_announce_oddness(void)
|
||||||
pr_info("\tBoot-time adjustment of callback high-water mark to %ld.\n", qhimark);
|
pr_info("\tBoot-time adjustment of callback high-water mark to %ld.\n", qhimark);
|
||||||
if (qlowmark != DEFAULT_RCU_QLOMARK)
|
if (qlowmark != DEFAULT_RCU_QLOMARK)
|
||||||
pr_info("\tBoot-time adjustment of callback low-water mark to %ld.\n", qlowmark);
|
pr_info("\tBoot-time adjustment of callback low-water mark to %ld.\n", qlowmark);
|
||||||
|
if (qovld != DEFAULT_RCU_QOVLD)
|
||||||
|
pr_info("\tBoot-time adjustment of callback overload level to %ld.\n", qovld);
|
||||||
if (jiffies_till_first_fqs != ULONG_MAX)
|
if (jiffies_till_first_fqs != ULONG_MAX)
|
||||||
pr_info("\tBoot-time adjustment of first FQS scan delay to %ld jiffies.\n", jiffies_till_first_fqs);
|
pr_info("\tBoot-time adjustment of first FQS scan delay to %ld jiffies.\n", jiffies_till_first_fqs);
|
||||||
if (jiffies_till_next_fqs != ULONG_MAX)
|
if (jiffies_till_next_fqs != ULONG_MAX)
|
||||||
|
@ -753,7 +755,7 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
|
||||||
raw_lockdep_assert_held_rcu_node(rnp);
|
raw_lockdep_assert_held_rcu_node(rnp);
|
||||||
pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
|
pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
|
||||||
__func__, rnp->grplo, rnp->grphi, rnp->level,
|
__func__, rnp->grplo, rnp->grphi, rnp->level,
|
||||||
(long)rnp->gp_seq, (long)rnp->completedqs);
|
(long)READ_ONCE(rnp->gp_seq), (long)rnp->completedqs);
|
||||||
for (rnp1 = rnp; rnp1; rnp1 = rnp1->parent)
|
for (rnp1 = rnp; rnp1; rnp1 = rnp1->parent)
|
||||||
pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
|
pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
|
||||||
__func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
|
__func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
|
||||||
|
@ -1032,18 +1034,18 @@ static int rcu_boost_kthread(void *arg)
|
||||||
|
|
||||||
trace_rcu_utilization(TPS("Start boost kthread@init"));
|
trace_rcu_utilization(TPS("Start boost kthread@init"));
|
||||||
for (;;) {
|
for (;;) {
|
||||||
rnp->boost_kthread_status = RCU_KTHREAD_WAITING;
|
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_WAITING);
|
||||||
trace_rcu_utilization(TPS("End boost kthread@rcu_wait"));
|
trace_rcu_utilization(TPS("End boost kthread@rcu_wait"));
|
||||||
rcu_wait(rnp->boost_tasks || rnp->exp_tasks);
|
rcu_wait(rnp->boost_tasks || rnp->exp_tasks);
|
||||||
trace_rcu_utilization(TPS("Start boost kthread@rcu_wait"));
|
trace_rcu_utilization(TPS("Start boost kthread@rcu_wait"));
|
||||||
rnp->boost_kthread_status = RCU_KTHREAD_RUNNING;
|
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_RUNNING);
|
||||||
more2boost = rcu_boost(rnp);
|
more2boost = rcu_boost(rnp);
|
||||||
if (more2boost)
|
if (more2boost)
|
||||||
spincnt++;
|
spincnt++;
|
||||||
else
|
else
|
||||||
spincnt = 0;
|
spincnt = 0;
|
||||||
if (spincnt > 10) {
|
if (spincnt > 10) {
|
||||||
rnp->boost_kthread_status = RCU_KTHREAD_YIELDING;
|
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_YIELDING);
|
||||||
trace_rcu_utilization(TPS("End boost kthread@rcu_yield"));
|
trace_rcu_utilization(TPS("End boost kthread@rcu_yield"));
|
||||||
schedule_timeout_interruptible(2);
|
schedule_timeout_interruptible(2);
|
||||||
trace_rcu_utilization(TPS("Start boost kthread@rcu_yield"));
|
trace_rcu_utilization(TPS("Start boost kthread@rcu_yield"));
|
||||||
|
@ -1077,12 +1079,12 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
|
||||||
(rnp->gp_tasks != NULL &&
|
(rnp->gp_tasks != NULL &&
|
||||||
rnp->boost_tasks == NULL &&
|
rnp->boost_tasks == NULL &&
|
||||||
rnp->qsmask == 0 &&
|
rnp->qsmask == 0 &&
|
||||||
ULONG_CMP_GE(jiffies, rnp->boost_time))) {
|
(ULONG_CMP_GE(jiffies, rnp->boost_time) || rcu_state.cbovld))) {
|
||||||
if (rnp->exp_tasks == NULL)
|
if (rnp->exp_tasks == NULL)
|
||||||
rnp->boost_tasks = rnp->gp_tasks;
|
rnp->boost_tasks = rnp->gp_tasks;
|
||||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
rcu_wake_cond(rnp->boost_kthread_task,
|
rcu_wake_cond(rnp->boost_kthread_task,
|
||||||
rnp->boost_kthread_status);
|
READ_ONCE(rnp->boost_kthread_status));
|
||||||
} else {
|
} else {
|
||||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||||
}
|
}
|
||||||
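The rcu_initiate_boost() hunk widens the last clause of the boost condition: boosting may now start either when ->boost_time has been reached or when rcu_state.cbovld indicates callback overload. Only that clause is modeled in the sketch below. `boost_due()` is a hypothetical name, and the ULONG_CMP_GE() definition is reproduced from memory of the kernel headers, so treat it as an assumption.

```c
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for free-running unsigned long counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* True when the boost deadline has passed or callbacks are overloaded. */
static bool boost_due(unsigned long now, unsigned long boost_time, bool cbovld)
{
	return ULONG_CMP_GE(now, boost_time) || cbovld;
}
```

The effect is that an overloaded system does not wait out the boost delay before priority-boosting preempted readers.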
|
@ -1486,6 +1488,7 @@ module_param(nocb_nobypass_lim_per_jiffy, int, 0);
|
||||||
* flag the contention.
|
* flag the contention.
|
||||||
*/
|
*/
|
||||||
static void rcu_nocb_bypass_lock(struct rcu_data *rdp)
|
static void rcu_nocb_bypass_lock(struct rcu_data *rdp)
|
||||||
|
__acquires(&rdp->nocb_bypass_lock)
|
||||||
{
|
{
|
||||||
lockdep_assert_irqs_disabled();
|
lockdep_assert_irqs_disabled();
|
||||||
if (raw_spin_trylock(&rdp->nocb_bypass_lock))
|
if (raw_spin_trylock(&rdp->nocb_bypass_lock))
|
||||||
|
@ -1529,6 +1532,7 @@ static bool rcu_nocb_bypass_trylock(struct rcu_data *rdp)
|
||||||
* Release the specified rcu_data structure's ->nocb_bypass_lock.
|
* Release the specified rcu_data structure's ->nocb_bypass_lock.
|
||||||
*/
|
*/
|
||||||
static void rcu_nocb_bypass_unlock(struct rcu_data *rdp)
|
static void rcu_nocb_bypass_unlock(struct rcu_data *rdp)
|
||||||
|
__releases(&rdp->nocb_bypass_lock)
|
||||||
{
|
{
|
||||||
lockdep_assert_irqs_disabled();
|
lockdep_assert_irqs_disabled();
|
||||||
raw_spin_unlock(&rdp->nocb_bypass_lock);
|
raw_spin_unlock(&rdp->nocb_bypass_lock);
|
||||||
|
@ -1577,8 +1581,7 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
|
||||||
static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
|
static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
|
||||||
{
|
{
|
||||||
lockdep_assert_irqs_disabled();
|
lockdep_assert_irqs_disabled();
|
||||||
if (rcu_segcblist_is_offloaded(&rdp->cblist) &&
|
if (rcu_segcblist_is_offloaded(&rdp->cblist))
|
||||||
cpu_online(rdp->cpu))
|
|
||||||
lockdep_assert_held(&rdp->nocb_lock);
|
lockdep_assert_held(&rdp->nocb_lock);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -1930,6 +1933,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||||
struct rcu_data *rdp;
|
struct rcu_data *rdp;
|
||||||
struct rcu_node *rnp;
|
struct rcu_node *rnp;
|
||||||
unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
|
unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
|
||||||
|
bool wasempty = false;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Each pass through the following loop checks for CBs and for the
|
* Each pass through the following loop checks for CBs and for the
|
||||||
|
@ -1969,10 +1973,13 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
|
||||||
rcu_seq_done(&rnp->gp_seq, cur_gp_seq))) {
|
rcu_seq_done(&rnp->gp_seq, cur_gp_seq))) {
|
||||||
raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
|
raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
|
||||||
needwake_gp = rcu_advance_cbs(rnp, rdp);
|
needwake_gp = rcu_advance_cbs(rnp, rdp);
|
||||||
|
wasempty = rcu_segcblist_restempty(&rdp->cblist,
|
||||||
|
RCU_NEXT_READY_TAIL);
|
||||||
raw_spin_unlock_rcu_node(rnp); /* irqs disabled. */
|
raw_spin_unlock_rcu_node(rnp); /* irqs disabled. */
|
||||||
}
|
}
|
||||||
// Need to wait on some grace period?
|
// Need to wait on some grace period?
|
||||||
WARN_ON_ONCE(!rcu_segcblist_restempty(&rdp->cblist,
|
WARN_ON_ONCE(wasempty &&
|
||||||
|
!rcu_segcblist_restempty(&rdp->cblist,
|
||||||
RCU_NEXT_READY_TAIL));
|
RCU_NEXT_READY_TAIL));
|
||||||
if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq)) {
|
if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq)) {
|
||||||
if (!needwait_gp ||
|
if (!needwait_gp ||
|
||||||
|
|
|
@ -102,7 +102,7 @@ static void record_gp_stall_check_time(void)
|
||||||
unsigned long j = jiffies;
|
unsigned long j = jiffies;
|
||||||
unsigned long j1;
|
unsigned long j1;
|
||||||
|
|
||||||
rcu_state.gp_start = j;
|
WRITE_ONCE(rcu_state.gp_start, j);
|
||||||
j1 = rcu_jiffies_till_stall_check();
|
j1 = rcu_jiffies_till_stall_check();
|
||||||
/* Record ->gp_start before ->jiffies_stall. */
|
/* Record ->gp_start before ->jiffies_stall. */
|
||||||
smp_store_release(&rcu_state.jiffies_stall, j + j1); /* ^^^ */
|
smp_store_release(&rcu_state.jiffies_stall, j + j1); /* ^^^ */
|
||||||
|
@ -383,7 +383,7 @@ static void print_other_cpu_stall(unsigned long gp_seq)
|
||||||
|
|
||||||
/* Kick and suppress, if so configured. */
|
/* Kick and suppress, if so configured. */
|
||||||
rcu_stall_kick_kthreads();
|
rcu_stall_kick_kthreads();
|
||||||
if (rcu_cpu_stall_suppress)
|
if (rcu_stall_is_suppressed())
|
||||||
return;
|
return;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -452,7 +452,7 @@ static void print_cpu_stall(void)
|
||||||
|
|
||||||
/* Kick and suppress, if so configured. */
|
/* Kick and suppress, if so configured. */
|
||||||
rcu_stall_kick_kthreads();
|
rcu_stall_kick_kthreads();
|
||||||
if (rcu_cpu_stall_suppress)
|
if (rcu_stall_is_suppressed())
|
||||||
return;
|
return;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -504,7 +504,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
|
||||||
unsigned long js;
|
unsigned long js;
|
||||||
struct rcu_node *rnp;
|
struct rcu_node *rnp;
|
||||||
|
|
||||||
if ((rcu_cpu_stall_suppress && !rcu_kick_kthreads) ||
|
if ((rcu_stall_is_suppressed() && !rcu_kick_kthreads) ||
|
||||||
!rcu_gp_in_progress())
|
!rcu_gp_in_progress())
|
||||||
return;
|
return;
|
||||||
rcu_stall_kick_kthreads();
|
rcu_stall_kick_kthreads();
|
||||||
|
@ -578,6 +578,7 @@ void show_rcu_gp_kthreads(void)
|
||||||
unsigned long jw;
|
unsigned long jw;
|
||||||
struct rcu_data *rdp;
|
struct rcu_data *rdp;
|
||||||
struct rcu_node *rnp;
|
struct rcu_node *rnp;
|
||||||
|
struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
|
||||||
|
|
||||||
j = jiffies;
|
j = jiffies;
|
||||||
ja = j - READ_ONCE(rcu_state.gp_activity);
|
ja = j - READ_ONCE(rcu_state.gp_activity);
|
||||||
|
@ -585,28 +586,28 @@ void show_rcu_gp_kthreads(void)
|
||||||
jw = j - READ_ONCE(rcu_state.gp_wake_time);
|
jw = j - READ_ONCE(rcu_state.gp_wake_time);
|
||||||
pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
|
pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
|
||||||
rcu_state.name, gp_state_getname(rcu_state.gp_state),
|
rcu_state.name, gp_state_getname(rcu_state.gp_state),
|
||||||
rcu_state.gp_state,
|
rcu_state.gp_state, t ? t->state : 0x1ffffL,
|
||||||
rcu_state.gp_kthread ? rcu_state.gp_kthread->state : 0x1ffffL,
|
|
||||||
ja, jr, jw, (long)READ_ONCE(rcu_state.gp_wake_seq),
|
ja, jr, jw, (long)READ_ONCE(rcu_state.gp_wake_seq),
|
||||||
(long)READ_ONCE(rcu_state.gp_seq),
|
(long)READ_ONCE(rcu_state.gp_seq),
|
||||||
(long)READ_ONCE(rcu_get_root()->gp_seq_needed),
|
(long)READ_ONCE(rcu_get_root()->gp_seq_needed),
|
||||||
READ_ONCE(rcu_state.gp_flags));
|
READ_ONCE(rcu_state.gp_flags));
|
||||||
rcu_for_each_node_breadth_first(rnp) {
|
rcu_for_each_node_breadth_first(rnp) {
|
||||||
if (ULONG_CMP_GE(rcu_state.gp_seq, rnp->gp_seq_needed))
|
if (ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq),
|
||||||
|
READ_ONCE(rnp->gp_seq_needed)))
|
||||||
continue;
|
continue;
|
||||||
pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
|
pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
|
||||||
rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
|
rnp->grplo, rnp->grphi, (long)READ_ONCE(rnp->gp_seq),
|
||||||
(long)rnp->gp_seq_needed);
|
(long)READ_ONCE(rnp->gp_seq_needed));
|
||||||
if (!rcu_is_leaf_node(rnp))
|
if (!rcu_is_leaf_node(rnp))
|
||||||
continue;
|
continue;
|
||||||
for_each_leaf_node_possible_cpu(rnp, cpu) {
|
for_each_leaf_node_possible_cpu(rnp, cpu) {
|
||||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||||
if (rdp->gpwrap ||
|
if (READ_ONCE(rdp->gpwrap) ||
|
||||||
ULONG_CMP_GE(rcu_state.gp_seq,
|
ULONG_CMP_GE(READ_ONCE(rcu_state.gp_seq),
|
||||||
rdp->gp_seq_needed))
|
READ_ONCE(rdp->gp_seq_needed)))
|
||||||
continue;
|
continue;
|
||||||
pr_info("\tcpu %d ->gp_seq_needed %ld\n",
|
pr_info("\tcpu %d ->gp_seq_needed %ld\n",
|
||||||
cpu, (long)rdp->gp_seq_needed);
|
cpu, (long)READ_ONCE(rdp->gp_seq_needed));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
for_each_possible_cpu(cpu) {
|
for_each_possible_cpu(cpu) {
|
||||||
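Several hunks in this file compare grace-period sequence numbers with ULONG_CMP_GE(), which must stay correct when the counters wrap. A small self-contained sketch of the comparison follows. The macro bodies are written from memory of the kernel's definitions and should be read as an assumption, and `gp_request_satisfied()` is a hypothetical wrapper.

```c
#include <limits.h>

/* Modular comparisons: valid while the two values differ by less than ULONG_MAX / 2. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Nonzero when the current sequence has reached the requested one. */
static int gp_request_satisfied(unsigned long gp_seq,
				unsigned long gp_seq_needed)
{
	return ULONG_CMP_GE(gp_seq, gp_seq_needed);
}
```

A plain `>=` would misreport a counter that has just wrapped past zero as being far behind, whereas the subtraction-based form treats 3 as later than ULONG_MAX - 1.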
|
@ -631,7 +632,9 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
|
||||||
static atomic_t warned = ATOMIC_INIT(0);
|
static atomic_t warned = ATOMIC_INIT(0);
|
||||||
|
|
||||||
if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress() ||
|
if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress() ||
|
||||||
ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed))
|
ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
|
||||||
|
READ_ONCE(rnp_root->gp_seq_needed)) ||
|
||||||
|
!smp_load_acquire(&rcu_state.gp_kthread)) // Get stable kthread.
|
||||||
return;
|
return;
|
||||||
j = jiffies; /* Expensive access, and in common case don't get here. */
|
j = jiffies; /* Expensive access, and in common case don't get here. */
|
||||||
if (time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
|
if (time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
|
||||||
|
@ -642,7 +645,8 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
|
||||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||||
j = jiffies;
|
j = jiffies;
|
||||||
if (rcu_gp_in_progress() ||
|
if (rcu_gp_in_progress() ||
|
||||||
ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed) ||
|
ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
|
||||||
|
READ_ONCE(rnp_root->gp_seq_needed)) ||
|
||||||
time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
|
time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
|
||||||
time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
|
time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
|
||||||
atomic_read(&warned)) {
|
atomic_read(&warned)) {
|
||||||
|
@ -655,9 +659,10 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
|
||||||
raw_spin_lock_rcu_node(rnp_root); /* irqs already disabled. */
|
raw_spin_lock_rcu_node(rnp_root); /* irqs already disabled. */
|
||||||
j = jiffies;
|
j = jiffies;
|
||||||
if (rcu_gp_in_progress() ||
|
if (rcu_gp_in_progress() ||
|
||||||
ULONG_CMP_GE(rnp_root->gp_seq, rnp_root->gp_seq_needed) ||
|
ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
|
||||||
time_before(j, rcu_state.gp_req_activity + gpssdelay) ||
|
READ_ONCE(rnp_root->gp_seq_needed)) ||
|
||||||
time_before(j, rcu_state.gp_activity + gpssdelay) ||
|
time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
|
||||||
|
time_before(j, READ_ONCE(rcu_state.gp_activity) + gpssdelay) ||
|
||||||
atomic_xchg(&warned, 1)) {
|
atomic_xchg(&warned, 1)) {
|
||||||
if (rnp_root != rnp)
|
if (rnp_root != rnp)
|
||||||
/* irqs remain disabled. */
|
/* irqs remain disabled. */
|
||||||
|
|
|
@ -183,6 +183,8 @@ void rcu_unexpedite_gp(void)
|
||||||
}
|
}
|
||||||
EXPORT_SYMBOL_GPL(rcu_unexpedite_gp);
|
EXPORT_SYMBOL_GPL(rcu_unexpedite_gp);
|
||||||
|
|
||||||
|
static bool rcu_boot_ended __read_mostly;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Inform RCU of the end of the in-kernel boot sequence.
|
* Inform RCU of the end of the in-kernel boot sequence.
|
||||||
*/
|
*/
|
||||||
|
@ -191,8 +193,18 @@ void rcu_end_inkernel_boot(void)
|
||||||
rcu_unexpedite_gp();
|
rcu_unexpedite_gp();
|
||||||
if (rcu_normal_after_boot)
|
if (rcu_normal_after_boot)
|
||||||
WRITE_ONCE(rcu_normal, 1);
|
WRITE_ONCE(rcu_normal, 1);
|
||||||
|
rcu_boot_ended = 1;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Let rcutorture know when it is OK to turn it up to eleven.
|
||||||
|
*/
|
||||||
|
bool rcu_inkernel_boot_has_ended(void)
|
||||||
|
{
|
||||||
|
return rcu_boot_ended;
|
||||||
|
}
|
||||||
|
EXPORT_SYMBOL_GPL(rcu_inkernel_boot_has_ended);
|
||||||
|
|
||||||
#endif /* #ifndef CONFIG_TINY_RCU */
|
#endif /* #ifndef CONFIG_TINY_RCU */
|
||||||
|
|
||||||
/*
|
/*
|
||||||
|
@ -464,13 +476,19 @@ EXPORT_SYMBOL_GPL(rcutorture_sched_setaffinity);
|
||||||
#ifdef CONFIG_RCU_STALL_COMMON
|
#ifdef CONFIG_RCU_STALL_COMMON
|
||||||
int rcu_cpu_stall_ftrace_dump __read_mostly;
|
int rcu_cpu_stall_ftrace_dump __read_mostly;
|
||||||
module_param(rcu_cpu_stall_ftrace_dump, int, 0644);
|
module_param(rcu_cpu_stall_ftrace_dump, int, 0644);
|
||||||
int rcu_cpu_stall_suppress __read_mostly; /* 1 = suppress stall warnings. */
|
int rcu_cpu_stall_suppress __read_mostly; // !0 = suppress stall warnings.
|
||||||
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
|
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
|
||||||
module_param(rcu_cpu_stall_suppress, int, 0644);
|
module_param(rcu_cpu_stall_suppress, int, 0644);
|
||||||
int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
|
int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT;
|
||||||
module_param(rcu_cpu_stall_timeout, int, 0644);
|
module_param(rcu_cpu_stall_timeout, int, 0644);
|
||||||
#endif /* #ifdef CONFIG_RCU_STALL_COMMON */
|
#endif /* #ifdef CONFIG_RCU_STALL_COMMON */
|
||||||
|
|
||||||
|
// Suppress boot-time RCU CPU stall warnings and rcutorture writer stall
|
||||||
|
// warnings. Also used by rcutorture even if stall warnings are excluded.
|
||||||
|
int rcu_cpu_stall_suppress_at_boot __read_mostly; // !0 = suppress boot stalls.
|
||||||
|
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress_at_boot);
|
||||||
|
module_param(rcu_cpu_stall_suppress_at_boot, int, 0444);
|
||||||
|
|
||||||
#ifdef CONFIG_TASKS_RCU
|
#ifdef CONFIG_TASKS_RCU
|
||||||
|
|
||||||
/*
|
/*
|
||||||
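The hunk above adds rcu_cpu_stall_suppress_at_boot next to the existing rcu_cpu_stall_suppress, and other hunks replace direct tests of the latter with rcu_stall_is_suppressed(). The body of that helper is not shown in this part of the diff, so the sketch below is only a guess at how the two flags and the boot-ended state might combine. Every name in it is a hypothetical user-space stand-in.

```c
#include <stdbool.h>

static int stall_suppress;         /* stands in for rcu_cpu_stall_suppress */
static int stall_suppress_at_boot; /* stands in for rcu_cpu_stall_suppress_at_boot */
static bool boot_ended;            /* stands in for rcu_boot_ended */

/* Stand-in for rcu_inkernel_boot_has_ended(). */
static bool boot_has_ended(void)
{
	return boot_ended;
}

/* Assumed combination: the boot-time flag applies only until boot has ended. */
static bool stall_is_suppressed(void)
{
	if (stall_suppress_at_boot && !boot_has_ended())
		return true;
	return stall_suppress != 0;
}
```

Under this reading, the boot-time flag stops having any effect once rcu_end_inkernel_boot() has run, while the general flag keeps working throughout.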
|
@ -528,7 +546,7 @@ void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
|
||||||
rhp->func = func;
|
rhp->func = func;
|
||||||
raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
|
raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
|
||||||
needwake = !rcu_tasks_cbs_head;
|
needwake = !rcu_tasks_cbs_head;
|
||||||
*rcu_tasks_cbs_tail = rhp;
|
WRITE_ONCE(*rcu_tasks_cbs_tail, rhp);
|
||||||
rcu_tasks_cbs_tail = &rhp->next;
|
rcu_tasks_cbs_tail = &rhp->next;
|
||||||
raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
|
raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
|
||||||
/* We can't create the thread unless interrupts are enabled. */
|
/* We can't create the thread unless interrupts are enabled. */
|
||||||
|
@ -658,7 +676,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
|
||||||
/* If there were none, wait a bit and start over. */
|
/* If there were none, wait a bit and start over. */
|
||||||
if (!list) {
|
if (!list) {
|
||||||
wait_event_interruptible(rcu_tasks_cbs_wq,
|
wait_event_interruptible(rcu_tasks_cbs_wq,
|
||||||
rcu_tasks_cbs_head);
|
READ_ONCE(rcu_tasks_cbs_head));
|
||||||
if (!rcu_tasks_cbs_head) {
|
if (!rcu_tasks_cbs_head) {
|
||||||
WARN_ON(signal_pending(current));
|
WARN_ON(signal_pending(current));
|
||||||
schedule_timeout_interruptible(HZ/10);
|
schedule_timeout_interruptible(HZ/10);
|
||||||
|
@ -801,7 +819,7 @@ static int __init rcu_spawn_tasks_kthread(void)
|
||||||
core_initcall(rcu_spawn_tasks_kthread);
|
core_initcall(rcu_spawn_tasks_kthread);
|
||||||
|
|
||||||
/* Do the srcu_read_lock() for the above synchronize_srcu(). */
|
/* Do the srcu_read_lock() for the above synchronize_srcu(). */
|
||||||
void exit_tasks_rcu_start(void)
|
void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
|
||||||
{
|
{
|
||||||
preempt_disable();
|
preempt_disable();
|
||||||
current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
|
current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
|
||||||
|
@ -809,7 +827,7 @@ void exit_tasks_rcu_start(void)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Do the srcu_read_unlock() for the above synchronize_srcu(). */
|
/* Do the srcu_read_unlock() for the above synchronize_srcu(). */
|
||||||
void exit_tasks_rcu_finish(void)
|
void exit_tasks_rcu_finish(void) __releases(&tasks_rcu_exit_srcu)
|
||||||
{
|
{
|
||||||
preempt_disable();
|
preempt_disable();
|
||||||
__srcu_read_unlock(&tasks_rcu_exit_srcu, current->rcu_tasks_idx);
|
__srcu_read_unlock(&tasks_rcu_exit_srcu, current->rcu_tasks_idx);
|
||||||
|
|
|
@ -944,6 +944,7 @@ static struct timer_base *lock_timer_base(struct timer_list *timer,
|
||||||
|
|
||||||
#define MOD_TIMER_PENDING_ONLY 0x01
|
#define MOD_TIMER_PENDING_ONLY 0x01
|
||||||
#define MOD_TIMER_REDUCE 0x02
|
#define MOD_TIMER_REDUCE 0x02
|
||||||
|
#define MOD_TIMER_NOTPENDING 0x04
|
||||||
|
|
||||||
static inline int
|
static inline int
|
||||||
__mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
|
__mod_timer(struct timer_list *timer, unsigned long expires, unsigned int options)
|
||||||
|
@ -960,7 +961,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
|
||||||
* the timer is re-modified to have the same timeout or ends up in the
|
* the timer is re-modified to have the same timeout or ends up in the
|
||||||
* same array bucket then just return:
|
* same array bucket then just return:
|
||||||
*/
|
*/
|
||||||
if (timer_pending(timer)) {
|
if (!(options & MOD_TIMER_NOTPENDING) && timer_pending(timer)) {
|
||||||
/*
|
/*
|
||||||
* The downside of this optimization is that it can result in
|
* The downside of this optimization is that it can result in
|
||||||
* larger granularity than you would get from adding a new
|
* larger granularity than you would get from adding a new
|
||||||
|
@ -1133,7 +1134,7 @@ EXPORT_SYMBOL(timer_reduce);
|
||||||
void add_timer(struct timer_list *timer)
|
void add_timer(struct timer_list *timer)
|
||||||
{
|
{
|
||||||
BUG_ON(timer_pending(timer));
|
BUG_ON(timer_pending(timer));
|
||||||
mod_timer(timer, timer->expires);
|
__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
|
||||||
}
|
}
|
||||||
EXPORT_SYMBOL(add_timer);
|
EXPORT_SYMBOL(add_timer);
|
||||||
|
|
||||||
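The timer hunks add a MOD_TIMER_NOTPENDING option so that callers which already know the timer is not pending (add_timer() and, in the next hunk, schedule_timeout()) can skip the timer_pending() test in __mod_timer(). The gating condition alone can be sketched as follows. `struct timer_sketch` and `should_check_pending()` are hypothetical, and only the flag values are taken from the diff.

```c
#include <stdbool.h>

#define MOD_TIMER_PENDING_ONLY 0x01
#define MOD_TIMER_REDUCE       0x02
#define MOD_TIMER_NOTPENDING   0x04

/* Hypothetical miniature of a timer: just the state the check needs. */
struct timer_sketch {
	bool pending;
	unsigned long expires;
};

/* True when the "timer already pending" fast path should be examined. */
static bool should_check_pending(const struct timer_sketch *t,
				 unsigned int options)
{
	return !(options & MOD_TIMER_NOTPENDING) && t->pending;
}
```

Because the option bit is tested first, a caller passing MOD_TIMER_NOTPENDING never reads the pending state at all.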
|
@ -1891,7 +1892,7 @@ signed long __sched schedule_timeout(signed long timeout)
|
||||||
|
|
||||||
timer.task = current;
|
timer.task = current;
|
||||||
timer_setup_on_stack(&timer.timer, process_timeout, 0);
|
timer_setup_on_stack(&timer.timer, process_timeout, 0);
|
||||||
__mod_timer(&timer.timer, expire, 0);
|
__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
|
||||||
schedule();
|
schedule();
|
||||||
del_singleshot_timer_sync(&timer.timer);
|
del_singleshot_timer_sync(&timer.timer);
|
||||||
|
|
||||||
|
|
|
@ -42,6 +42,9 @@
|
||||||
MODULE_LICENSE("GPL");
|
MODULE_LICENSE("GPL");
|
||||||
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
|
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
|
||||||
|
|
||||||
|
static bool disable_onoff_at_boot;
|
||||||
|
module_param(disable_onoff_at_boot, bool, 0444);
|
||||||
|
|
||||||
static char *torture_type;
|
static char *torture_type;
|
||||||
static int verbose;
|
static int verbose;
|
||||||
|
|
||||||
|
@ -84,6 +87,7 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
|
||||||
{
|
{
|
||||||
unsigned long delta;
|
unsigned long delta;
|
||||||
int ret;
|
int ret;
|
||||||
|
char *s;
|
||||||
unsigned long starttime;
|
unsigned long starttime;
|
||||||
|
|
||||||
if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
|
if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
|
||||||
|
@ -99,10 +103,16 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
|
||||||
(*n_offl_attempts)++;
|
(*n_offl_attempts)++;
|
||||||
ret = cpu_down(cpu);
|
ret = cpu_down(cpu);
|
||||||
if (ret) {
|
if (ret) {
|
||||||
|
s = "";
|
||||||
|
if (!rcu_inkernel_boot_has_ended() && ret == -EBUSY) {
|
||||||
|
// PCI probe frequently disables hotplug during boot.
|
||||||
|
(*n_offl_attempts)--;
|
||||||
|
s = " (-EBUSY forgiven during boot)";
|
||||||
|
}
|
||||||
if (verbose)
|
if (verbose)
|
||||||
pr_alert("%s" TORTURE_FLAG
|
pr_alert("%s" TORTURE_FLAG
|
||||||
"torture_onoff task: offline %d failed: errno %d\n",
|
"torture_onoff task: offline %d failed%s: errno %d\n",
|
||||||
torture_type, cpu, ret);
|
torture_type, cpu, s, ret);
|
||||||
} else {
|
} else {
|
||||||
if (verbose > 1)
|
if (verbose > 1)
|
||||||
pr_alert("%s" TORTURE_FLAG
|
pr_alert("%s" TORTURE_FLAG
|
||||||
|
@ -137,6 +147,7 @@ bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
|
||||||
{
|
{
|
||||||
unsigned long delta;
|
unsigned long delta;
|
||||||
int ret;
|
int ret;
|
||||||
|
char *s;
|
||||||
unsigned long starttime;
|
unsigned long starttime;
|
||||||
|
|
||||||
if (cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
|
if (cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
|
||||||
|
@ -150,10 +161,16 @@ bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes,
|
||||||
(*n_onl_attempts)++;
|
(*n_onl_attempts)++;
|
||||||
ret = cpu_up(cpu);
|
ret = cpu_up(cpu);
|
||||||
if (ret) {
|
if (ret) {
|
||||||
|
s = "";
|
||||||
|
if (!rcu_inkernel_boot_has_ended() && ret == -EBUSY) {
|
||||||
|
// PCI probe frequently disables hotplug during boot.
|
||||||
|
(*n_onl_attempts)--;
|
||||||
|
s = " (-EBUSY forgiven during boot)";
|
||||||
|
}
|
||||||
if (verbose)
|
if (verbose)
|
||||||
pr_alert("%s" TORTURE_FLAG
|
pr_alert("%s" TORTURE_FLAG
|
||||||
"torture_onoff task: online %d failed: errno %d\n",
|
"torture_onoff task: online %d failed%s: errno %d\n",
|
||||||
torture_type, cpu, ret);
|
torture_type, cpu, s, ret);
|
||||||
} else {
|
} else {
|
||||||
if (verbose > 1)
|
if (verbose > 1)
|
||||||
pr_alert("%s" TORTURE_FLAG
|
pr_alert("%s" TORTURE_FLAG
|
||||||
|
@ -215,6 +232,10 @@ torture_onoff(void *arg)
|
||||||
VERBOSE_TOROUT_STRING("torture_onoff end holdoff");
|
VERBOSE_TOROUT_STRING("torture_onoff end holdoff");
|
||||||
}
|
}
|
||||||
while (!torture_must_stop()) {
|
while (!torture_must_stop()) {
|
||||||
|
if (disable_onoff_at_boot && !rcu_inkernel_boot_has_ended()) {
|
||||||
|
schedule_timeout_interruptible(HZ / 10);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
|
cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
|
||||||
if (!torture_offline(cpu,
|
if (!torture_offline(cpu,
|
||||||
&n_offline_attempts, &n_offline_successes,
|
&n_offline_attempts, &n_offline_successes,
|
||||||
|
|
|
@ -12,7 +12,7 @@
|
||||||
# Returns 1 if the specified boot-parameter string tells rcutorture to
|
# Returns 1 if the specified boot-parameter string tells rcutorture to
|
||||||
# test CPU-hotplug operations.
|
# test CPU-hotplug operations.
|
||||||
bootparam_hotplug_cpu () {
|
bootparam_hotplug_cpu () {
|
||||||
echo "$1" | grep -q "rcutorture\.onoff_"
|
echo "$1" | grep -q "torture\.onoff_"
|
||||||
}
|
}
|
||||||
|
|
||||||
# checkarg --argname argtype $# arg mustmatch cannotmatch
|
# checkarg --argname argtype $# arg mustmatch cannotmatch
|
||||||
|
|
|
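The pattern change in `bootparam_hotplug_cpu` above can be checked in isolation. The following is a minimal sketch, assuming the post-patch function body as shown in the hunk; the boot-parameter strings are sample inputs chosen for illustration and do not come from this commit.

```shell
#!/bin/sh
# Post-patch helper, copied from the hunk above for illustration.
bootparam_hotplug_cpu () {
	echo "$1" | grep -q "torture\.onoff_"
}

# Sample boot-parameter strings (illustrative inputs, not from the commit).
for args in "rcutorture.onoff_interval=3" \
	    "locktorture.onoff_interval=3" \
	    "rcutorture.stat_interval=15"
do
	# grep -q exits 0 on a match, so the shell "if" takes the first branch.
	if bootparam_hotplug_cpu "$args"
	then
		echo "$args: CPU-hotplug testing requested"
	else
		echo "$args: no CPU-hotplug testing"
	fi
done
```

Because `torture\.onoff_` is a substring of `rcutorture.onoff_`, the shorter pattern still matches the strings the old one matched, and additionally matches any other module name ending in `torture`.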
@ -20,7 +20,9 @@
|
||||||
rundir="${1}"
|
rundir="${1}"
|
||||||
if test -z "$rundir" -o ! -d "$rundir"
|
if test -z "$rundir" -o ! -d "$rundir"
|
||||||
then
|
then
|
||||||
|
echo Directory "$rundir" not found.
|
||||||
echo Usage: $0 directory
|
echo Usage: $0 directory
|
||||||
|
exit 1
|
||||||
fi
|
fi
|
||||||
editor=${EDITOR-vi}
|
editor=${EDITOR-vi}
|
||||||
|
|
||||||
|
|
|
@ -13,6 +13,9 @@
|
||||||
#
|
#
|
||||||
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
|
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
|
||||||
|
|
||||||
|
T=/tmp/kvm-recheck.sh.$$
|
||||||
|
trap 'rm -f $T' 0 2
|
||||||
|
|
||||||
PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH
|
PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH
|
||||||
. functions.sh
|
. functions.sh
|
||||||
for rd in "$@"
|
for rd in "$@"
|
||||||
|
@ -68,4 +71,16 @@ do
|
||||||
fi
|
fi
|
||||||
done
|
done
|
||||||
done
|
done
|
||||||
EDITOR=echo kvm-find-errors.sh "${@: -1}" > /dev/null 2>&1
|
EDITOR=echo kvm-find-errors.sh "${@: -1}" > $T 2>&1
|
||||||
|
ret=$?
|
||||||
|
builderrors="`tr ' ' '\012' < $T | grep -c '/Make.out.diags'`"
|
||||||
|
if test "$builderrors" -gt 0
|
||||||
|
then
|
||||||
|
echo $builderrors runs with build errors.
|
||||||
|
fi
|
||||||
|
runerrors="`tr ' ' '\012' < $T | grep -c '/console.log.diags'`"
|
||||||
|
if test "$runerrors" -gt 0
|
||||||
|
then
|
||||||
|
echo $runerrors runs with runtime errors.
|
||||||
|
fi
|
||||||
|
exit $ret
|
||||||
|
|
|
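The counting logic added to the end of the script above can be exercised on its own. This is a rough sketch, assuming that `kvm-find-errors.sh` run with `EDITOR=echo` emits a space-separated list of `.diags` file paths; the sample paths below are made up for illustration.

```shell
#!/bin/sh
# Scratch file standing in for $T in the patch.
T=$(mktemp)
trap 'rm -f $T' 0 2

# Hypothetical captured output: one build failure, two runtime failures.
echo "res/TREE01/Make.out.diags res/TREE02/console.log.diags res/TREE03/console.log.diags" > $T

# Put each path on its own line, then count the lines of each kind.
builderrors=$(tr ' ' '\012' < $T | grep -c '/Make.out.diags')
runerrors=$(tr ' ' '\012' < $T | grep -c '/console.log.diags')

if test "$builderrors" -gt 0
then
	echo $builderrors runs with build errors.
fi
if test "$runerrors" -gt 0
then
	echo $runerrors runs with runtime errors.
fi
```

Splitting on spaces before `grep -c` matters because `grep -c` counts matching lines, not matching words; without the `tr` step the whole list would count as a single line.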
@ -39,7 +39,7 @@ TORTURE_TRUST_MAKE=""
|
||||||
resdir=""
|
resdir=""
|
||||||
configs=""
|
configs=""
|
||||||
cpus=0
|
cpus=0
|
||||||
ds=`date +%Y.%m.%d-%H:%M:%S`
|
ds=`date +%Y.%m.%d-%H.%M.%S`
|
||||||
jitter="-1"
|
jitter="-1"
|
||||||
|
|
||||||
usage () {
|
usage () {
|
||||||
|
|
|
@ -3,3 +3,5 @@ CONFIG_PRINTK_TIME=y
|
||||||
CONFIG_HYPERVISOR_GUEST=y
|
CONFIG_HYPERVISOR_GUEST=y
|
||||||
CONFIG_PARAVIRT=y
|
CONFIG_PARAVIRT=y
|
||||||
CONFIG_KVM_GUEST=y
|
CONFIG_KVM_GUEST=y
|
||||||
|
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n
|
||||||
|
CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n
|
||||||
|
|
|
@ -0,0 +1,18 @@
|
||||||
|
CONFIG_SMP=y
|
||||||
|
CONFIG_NR_CPUS=100
|
||||||
|
CONFIG_PREEMPT_NONE=y
|
||||||
|
CONFIG_PREEMPT_VOLUNTARY=n
|
||||||
|
CONFIG_PREEMPT=n
|
||||||
|
#CHECK#CONFIG_TREE_RCU=y
|
||||||
|
CONFIG_HZ_PERIODIC=n
|
||||||
|
CONFIG_NO_HZ_IDLE=y
|
||||||
|
CONFIG_NO_HZ_FULL=n
|
||||||
|
CONFIG_RCU_FAST_NO_HZ=n
|
||||||
|
CONFIG_RCU_TRACE=n
|
||||||
|
CONFIG_RCU_NOCB_CPU=n
|
||||||
|
CONFIG_DEBUG_LOCK_ALLOC=n
|
||||||
|
CONFIG_PROVE_LOCKING=n
|
||||||
|
#CHECK#CONFIG_PROVE_RCU=n
|
||||||
|
CONFIG_DEBUG_OBJECTS=n
|
||||||
|
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
|
||||||
|
CONFIG_RCU_EXPERT=n
|