Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
commit
f8a4bb6bfa
4
.mailmap
4
.mailmap
|
@ -209,6 +209,10 @@ Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
|
|||
Patrick Mochel <mochel@digitalimplant.org>
|
||||
Paul Burton <paulburton@kernel.org> <paul.burton@imgtec.com>
|
||||
Paul Burton <paulburton@kernel.org> <paul.burton@mips.com>
|
||||
Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.ibm.com>
|
||||
Paul E. McKenney <paulmck@kernel.org> <paulmck@linux.vnet.ibm.com>
|
||||
Paul E. McKenney <paulmck@kernel.org> <paul.mckenney@linaro.org>
|
||||
Paul E. McKenney <paulmck@kernel.org> <paulmck@us.ibm.com>
|
||||
Peter A Jonsson <pj@ludd.ltu.se>
|
||||
Peter Oruba <peter@oruba.de>
|
||||
Peter Oruba <peter.oruba@amd.com>
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
.. _NMI_rcu_doc:
|
||||
|
||||
Using RCU to Protect Dynamic NMI Handlers
|
||||
=========================================
|
||||
|
||||
|
||||
Although RCU is usually used to protect read-mostly data structures,
|
||||
|
@ -9,7 +12,7 @@ work in "arch/x86/oprofile/nmi_timer_int.c" and in
|
|||
"arch/x86/kernel/traps.c".
|
||||
|
||||
The relevant pieces of code are listed below, each followed by a
|
||||
brief explanation.
|
||||
brief explanation::
|
||||
|
||||
static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
|
||||
{
|
||||
|
@ -18,12 +21,12 @@ brief explanation.
|
|||
|
||||
The dummy_nmi_callback() function is a "dummy" NMI handler that does
|
||||
nothing, but returns zero, thus saying that it did nothing, allowing
|
||||
the NMI handler to take the default machine-specific action.
|
||||
the NMI handler to take the default machine-specific action::
|
||||
|
||||
static nmi_callback_t nmi_callback = dummy_nmi_callback;
|
||||
|
||||
This nmi_callback variable is a global function pointer to the current
|
||||
NMI handler.
|
||||
NMI handler::
|
||||
|
||||
void do_nmi(struct pt_regs * regs, long error_code)
|
||||
{
|
||||
|
@ -53,11 +56,12 @@ anyway. However, in practice it is a good documentation aid, particularly
|
|||
for anyone attempting to do something similar on Alpha or on systems
|
||||
with aggressive optimizing compilers.
|
||||
|
||||
Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
|
||||
given that the code referenced by the pointer is read-only?
|
||||
Quick Quiz:
|
||||
Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
|
||||
|
||||
:ref:`Answer to Quick Quiz <answer_quick_quiz_NMI>`
|
||||
|
||||
Back to the discussion of NMI and RCU...
|
||||
Back to the discussion of NMI and RCU::
|
||||
|
||||
void set_nmi_callback(nmi_callback_t callback)
|
||||
{
|
||||
|
@ -68,7 +72,7 @@ The set_nmi_callback() function registers an NMI handler. Note that any
|
|||
data that is to be used by the callback must be initialized up -before-
|
||||
the call to set_nmi_callback(). On architectures that do not order
|
||||
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
|
||||
initialized values.
|
||||
initialized values::
|
||||
|
||||
void unset_nmi_callback(void)
|
||||
{
|
||||
|
@ -82,7 +86,7 @@ up any data structures used by the old NMI handler until execution
|
|||
of it completes on all other CPUs.
|
||||
|
||||
One way to accomplish this is via synchronize_rcu(), perhaps as
|
||||
follows:
|
||||
follows::
|
||||
|
||||
unset_nmi_callback();
|
||||
synchronize_rcu();
|
||||
|
@ -98,24 +102,23 @@ to free up the handler's data as soon as synchronize_rcu() returns.
|
|||
Important note: for this to work, the architecture in question must
|
||||
invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.
|
||||
|
||||
.. _answer_quick_quiz_NMI:
|
||||
|
||||
Answer to Quick Quiz
|
||||
Answer to Quick Quiz:
|
||||
Why might the rcu_dereference_sched() be necessary on Alpha, given that the code referenced by the pointer is read-only?
|
||||
|
||||
Why might the rcu_dereference_sched() be necessary on Alpha, given
|
||||
that the code referenced by the pointer is read-only?
|
||||
The caller to set_nmi_callback() might well have
|
||||
initialized some data that is to be used by the new NMI
|
||||
handler. In this case, the rcu_dereference_sched() would
|
||||
be needed, because otherwise a CPU that received an NMI
|
||||
just after the new handler was set might see the pointer
|
||||
to the new NMI handler, but the old pre-initialized
|
||||
version of the handler's data.
|
||||
|
||||
Answer: The caller to set_nmi_callback() might well have
|
||||
initialized some data that is to be used by the new NMI
|
||||
handler. In this case, the rcu_dereference_sched() would
|
||||
be needed, because otherwise a CPU that received an NMI
|
||||
just after the new handler was set might see the pointer
|
||||
to the new NMI handler, but the old pre-initialized
|
||||
version of the handler's data.
|
||||
This same sad story can happen on other CPUs when using
|
||||
a compiler with aggressive pointer-value speculation
|
||||
optimizations.
|
||||
|
||||
This same sad story can happen on other CPUs when using
|
||||
a compiler with aggressive pointer-value speculation
|
||||
optimizations.
|
||||
|
||||
More important, the rcu_dereference_sched() makes it
|
||||
clear to someone reading the code that the pointer is
|
||||
being protected by RCU-sched.
|
||||
More important, the rcu_dereference_sched() makes it
|
||||
clear to someone reading the code that the pointer is
|
||||
being protected by RCU-sched.
|
|
@ -1,19 +1,21 @@
|
|||
Using RCU to Protect Read-Mostly Arrays
|
||||
.. _array_rcu_doc:
|
||||
|
||||
Using RCU to Protect Read-Mostly Arrays
|
||||
=======================================
|
||||
|
||||
Although RCU is more commonly used to protect linked lists, it can
|
||||
also be used to protect arrays. Three situations are as follows:
|
||||
|
||||
1. Hash Tables
|
||||
1. :ref:`Hash Tables <hash_tables>`
|
||||
|
||||
2. Static Arrays
|
||||
2. :ref:`Static Arrays <static_arrays>`
|
||||
|
||||
3. Resizeable Arrays
|
||||
3. :ref:`Resizable Arrays <resizable_arrays>`
|
||||
|
||||
Each of these three situations involves an RCU-protected pointer to an
|
||||
array that is separately indexed. It might be tempting to consider use
|
||||
of RCU to instead protect the index into an array, however, this use
|
||||
case is -not- supported. The problem with RCU-protected indexes into
|
||||
case is **not** supported. The problem with RCU-protected indexes into
|
||||
arrays is that compilers can play way too many optimization games with
|
||||
integers, which means that the rules governing handling of these indexes
|
||||
are far more trouble than they are worth. If RCU-protected indexes into
|
||||
|
@ -24,16 +26,20 @@ to be safely used.
|
|||
That aside, each of the three RCU-protected pointer situations are
|
||||
described in the following sections.
|
||||
|
||||
.. _hash_tables:
|
||||
|
||||
Situation 1: Hash Tables
|
||||
------------------------
|
||||
|
||||
Hash tables are often implemented as an array, where each array entry
|
||||
has a linked-list hash chain. Each hash chain can be protected by RCU
|
||||
as described in the listRCU.txt document. This approach also applies
|
||||
to other array-of-list situations, such as radix trees.
|
||||
|
||||
.. _static_arrays:
|
||||
|
||||
Situation 2: Static Arrays
|
||||
--------------------------
|
||||
|
||||
Static arrays, where the data (rather than a pointer to the data) is
|
||||
located in each array element, and where the array is never resized,
|
||||
|
@ -41,13 +47,17 @@ have not been used with RCU. Rik van Riel recommends using seqlock in
|
|||
this situation, which would also have minimal read-side overhead as long
|
||||
as updates are rare.
|
||||
|
||||
Quick Quiz: Why is it so important that updates be rare when
|
||||
using seqlock?
|
||||
Quick Quiz:
|
||||
Why is it so important that updates be rare when using seqlock?
|
||||
|
||||
:ref:`Answer to Quick Quiz <answer_quick_quiz_seqlock>`
|
||||
|
||||
Situation 3: Resizeable Arrays
|
||||
.. _resizable_arrays:
|
||||
|
||||
Use of RCU for resizeable arrays is demonstrated by the grow_ary()
|
||||
Situation 3: Resizable Arrays
|
||||
------------------------------
|
||||
|
||||
Use of RCU for resizable arrays is demonstrated by the grow_ary()
|
||||
function formerly used by the System V IPC code. The array is used
|
||||
to map from semaphore, message-queue, and shared-memory IDs to the data
|
||||
structure that represents the corresponding IPC construct. The grow_ary()
|
||||
|
@ -60,7 +70,7 @@ the remainder of the new, updates the ids->entries pointer to point to
|
|||
the new array, and invokes ipc_rcu_putref() to free up the old array.
|
||||
Note that rcu_assign_pointer() is used to update the ids->entries pointer,
|
||||
which includes any memory barriers required on whatever architecture
|
||||
you are running on.
|
||||
you are running on::
|
||||
|
||||
static int grow_ary(struct ipc_ids* ids, int newsize)
|
||||
{
|
||||
|
@ -112,7 +122,7 @@ a simple check suffices. The pointer to the structure corresponding
|
|||
to the desired IPC object is placed in "out", with NULL indicating
|
||||
a non-existent entry. After acquiring "out->lock", the "out->deleted"
|
||||
flag indicates whether the IPC object is in the process of being
|
||||
deleted, and, if not, the pointer is returned.
|
||||
deleted, and, if not, the pointer is returned::
|
||||
|
||||
struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
|
||||
{
|
||||
|
@ -144,8 +154,10 @@ deleted, and, if not, the pointer is returned.
|
|||
return out;
|
||||
}
|
||||
|
||||
.. _answer_quick_quiz_seqlock:
|
||||
|
||||
Answer to Quick Quiz:
|
||||
Why is it so important that updates be rare when using seqlock?
|
||||
|
||||
The reason that it is important that updates be rare when
|
||||
using seqlock is that frequent updates can livelock readers.
|
|
@ -7,8 +7,13 @@ RCU concepts
|
|||
.. toctree::
|
||||
:maxdepth: 3
|
||||
|
||||
arrayRCU
|
||||
rcubarrier
|
||||
rcu_dereference
|
||||
whatisRCU
|
||||
rcu
|
||||
listRCU
|
||||
NMI-RCU
|
||||
UP
|
||||
|
||||
Design/Memory-Ordering/Tree-RCU-Memory-Ordering
|
||||
|
|
|
@ -99,7 +99,7 @@ With this change, the rcu_dereference() is always within an RCU
|
|||
read-side critical section, which again would have suppressed the
|
||||
above lockdep-RCU splat.
|
||||
|
||||
But in this particular case, we don't actually deference the pointer
|
||||
But in this particular case, we don't actually dereference the pointer
|
||||
returned from rcu_dereference(). Instead, that pointer is just compared
|
||||
to the cic pointer, which means that the rcu_dereference() can be replaced
|
||||
by rcu_access_pointer() as follows:
|
||||
|
|
|
@ -1,4 +1,7 @@
|
|||
.. _rcu_dereference_doc:
|
||||
|
||||
PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
|
||||
===============================================================
|
||||
|
||||
Most of the time, you can use values from rcu_dereference() or one of
|
||||
the similar primitives without worries. Dereferencing (prefix "*"),
|
||||
|
@ -8,7 +11,7 @@ subtraction of constants, and casts all work quite naturally and safely.
|
|||
It is nevertheless possible to get into trouble with other operations.
|
||||
Follow these rules to keep your RCU code working properly:
|
||||
|
||||
o You must use one of the rcu_dereference() family of primitives
|
||||
- You must use one of the rcu_dereference() family of primitives
|
||||
to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
|
||||
will complain. Worse yet, your code can see random memory-corruption
|
||||
bugs due to games that compilers and DEC Alpha can play.
|
||||
|
@ -25,24 +28,24 @@ o You must use one of the rcu_dereference() family of primitives
|
|||
for an example where the compiler can in fact deduce the exact
|
||||
value of the pointer, and thus cause misordering.
|
||||
|
||||
o You are only permitted to use rcu_dereference on pointer values.
|
||||
- You are only permitted to use rcu_dereference on pointer values.
|
||||
The compiler simply knows too much about integral values to
|
||||
trust it to carry dependencies through integer operations.
|
||||
There are a very few exceptions, namely that you can temporarily
|
||||
cast the pointer to uintptr_t in order to:
|
||||
|
||||
o Set bits and clear bits down in the must-be-zero low-order
|
||||
- Set bits and clear bits down in the must-be-zero low-order
|
||||
bits of that pointer. This clearly means that the pointer
|
||||
must have alignment constraints, for example, this does
|
||||
-not- work in general for char* pointers.
|
||||
|
||||
o XOR bits to translate pointers, as is done in some
|
||||
- XOR bits to translate pointers, as is done in some
|
||||
classic buddy-allocator algorithms.
|
||||
|
||||
It is important to cast the value back to pointer before
|
||||
doing much of anything else with it.
|
||||
|
||||
o Avoid cancellation when using the "+" and "-" infix arithmetic
|
||||
- Avoid cancellation when using the "+" and "-" infix arithmetic
|
||||
operators. For example, for a given variable "x", avoid
|
||||
"(x-(uintptr_t)x)" for char* pointers. The compiler is within its
|
||||
rights to substitute zero for this sort of expression, so that
|
||||
|
@ -54,16 +57,16 @@ o Avoid cancellation when using the "+" and "-" infix arithmetic
|
|||
"p+a-b" is safe because its value still necessarily depends on
|
||||
the rcu_dereference(), thus maintaining proper ordering.
|
||||
|
||||
o If you are using RCU to protect JITed functions, so that the
|
||||
- If you are using RCU to protect JITed functions, so that the
|
||||
"()" function-invocation operator is applied to a value obtained
|
||||
(directly or indirectly) from rcu_dereference(), you may need to
|
||||
interact directly with the hardware to flush instruction caches.
|
||||
This issue arises on some systems when a newly JITed function is
|
||||
using the same memory that was used by an earlier JITed function.
|
||||
|
||||
o Do not use the results from relational operators ("==", "!=",
|
||||
- Do not use the results from relational operators ("==", "!=",
|
||||
">", ">=", "<", or "<=") when dereferencing. For example,
|
||||
the following (quite strange) code is buggy:
|
||||
the following (quite strange) code is buggy::
|
||||
|
||||
int *p;
|
||||
int *q;
|
||||
|
@ -81,11 +84,11 @@ o Do not use the results from relational operators ("==", "!=",
|
|||
after such branches, but can speculate loads, which can again
|
||||
result in misordering bugs.
|
||||
|
||||
o Be very careful about comparing pointers obtained from
|
||||
- Be very careful about comparing pointers obtained from
|
||||
rcu_dereference() against non-NULL values. As Linus Torvalds
|
||||
explained, if the two pointers are equal, the compiler could
|
||||
substitute the pointer you are comparing against for the pointer
|
||||
obtained from rcu_dereference(). For example:
|
||||
obtained from rcu_dereference(). For example::
|
||||
|
||||
p = rcu_dereference(gp);
|
||||
if (p == &default_struct)
|
||||
|
@ -93,7 +96,7 @@ o Be very careful about comparing pointers obtained from
|
|||
|
||||
Because the compiler now knows that the value of "p" is exactly
|
||||
the address of the variable "default_struct", it is free to
|
||||
transform this code into the following:
|
||||
transform this code into the following::
|
||||
|
||||
p = rcu_dereference(gp);
|
||||
if (p == &default_struct)
|
||||
|
@ -105,14 +108,14 @@ o Be very careful about comparing pointers obtained from
|
|||
|
||||
However, comparisons are OK in the following cases:
|
||||
|
||||
o The comparison was against the NULL pointer. If the
|
||||
- The comparison was against the NULL pointer. If the
|
||||
compiler knows that the pointer is NULL, you had better
|
||||
not be dereferencing it anyway. If the comparison is
|
||||
non-equal, the compiler is none the wiser. Therefore,
|
||||
it is safe to compare pointers from rcu_dereference()
|
||||
against NULL pointers.
|
||||
|
||||
o The pointer is never dereferenced after being compared.
|
||||
- The pointer is never dereferenced after being compared.
|
||||
Since there are no subsequent dereferences, the compiler
|
||||
cannot use anything it learned from the comparison
|
||||
to reorder the non-existent subsequent dereferences.
|
||||
|
@ -124,31 +127,31 @@ o Be very careful about comparing pointers obtained from
|
|||
dereferenced, rcu_access_pointer() should be used in place
|
||||
of rcu_dereference().
|
||||
|
||||
o The comparison is against a pointer that references memory
|
||||
- The comparison is against a pointer that references memory
|
||||
that was initialized "a long time ago." The reason
|
||||
this is safe is that even if misordering occurs, the
|
||||
misordering will not affect the accesses that follow
|
||||
the comparison. So exactly how long ago is "a long
|
||||
time ago"? Here are some possibilities:
|
||||
|
||||
o Compile time.
|
||||
- Compile time.
|
||||
|
||||
o Boot time.
|
||||
- Boot time.
|
||||
|
||||
o Module-init time for module code.
|
||||
- Module-init time for module code.
|
||||
|
||||
o Prior to kthread creation for kthread code.
|
||||
- Prior to kthread creation for kthread code.
|
||||
|
||||
o During some prior acquisition of the lock that
|
||||
- During some prior acquisition of the lock that
|
||||
we now hold.
|
||||
|
||||
o Before mod_timer() time for a timer handler.
|
||||
- Before mod_timer() time for a timer handler.
|
||||
|
||||
There are many other possibilities involving the Linux
|
||||
kernel's wide array of primitives that cause code to
|
||||
be invoked at a later time.
|
||||
|
||||
o The pointer being compared against also came from
|
||||
- The pointer being compared against also came from
|
||||
rcu_dereference(). In this case, both pointers depend
|
||||
on one rcu_dereference() or another, so you get proper
|
||||
ordering either way.
|
||||
|
@ -159,13 +162,13 @@ o Be very careful about comparing pointers obtained from
|
|||
of such an RCU usage bug is shown in the section titled
|
||||
"EXAMPLE OF AMPLIFIED RCU-USAGE BUG".
|
||||
|
||||
o All of the accesses following the comparison are stores,
|
||||
- All of the accesses following the comparison are stores,
|
||||
so that a control dependency preserves the needed ordering.
|
||||
That said, it is easy to get control dependencies wrong.
|
||||
Please see the "CONTROL DEPENDENCIES" section of
|
||||
Documentation/memory-barriers.txt for more details.
|
||||
|
||||
o The pointers are not equal -and- the compiler does
|
||||
- The pointers are not equal -and- the compiler does
|
||||
not have enough information to deduce the value of the
|
||||
pointer. Note that the volatile cast in rcu_dereference()
|
||||
will normally prevent the compiler from knowing too much.
|
||||
|
@ -175,7 +178,7 @@ o Be very careful about comparing pointers obtained from
|
|||
comparison will provide exactly the information that the
|
||||
compiler needs to deduce the value of the pointer.
|
||||
|
||||
o Disable any value-speculation optimizations that your compiler
|
||||
- Disable any value-speculation optimizations that your compiler
|
||||
might provide, especially if you are making use of feedback-based
|
||||
optimizations that take data collected from prior runs. Such
|
||||
value-speculation optimizations reorder operations by design.
|
||||
|
@ -188,11 +191,12 @@ o Disable any value-speculation optimizations that your compiler
|
|||
|
||||
|
||||
EXAMPLE OF AMPLIFIED RCU-USAGE BUG
|
||||
----------------------------------
|
||||
|
||||
Because updaters can run concurrently with RCU readers, RCU readers can
|
||||
see stale and/or inconsistent values. If RCU readers need fresh or
|
||||
consistent values, which they sometimes do, they need to take proper
|
||||
precautions. To see this, consider the following code fragment:
|
||||
precautions. To see this, consider the following code fragment::
|
||||
|
||||
struct foo {
|
||||
int a;
|
||||
|
@ -244,7 +248,7 @@ to some reordering from the compiler and CPUs is beside the point.
|
|||
|
||||
But suppose that the reader needs a consistent view?
|
||||
|
||||
Then one approach is to use locking, for example, as follows:
|
||||
Then one approach is to use locking, for example, as follows::
|
||||
|
||||
struct foo {
|
||||
int a;
|
||||
|
@ -299,6 +303,7 @@ As always, use the right tool for the job!
|
|||
|
||||
|
||||
EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
|
||||
-----------------------------------------
|
||||
|
||||
If a pointer obtained from rcu_dereference() compares not-equal to some
|
||||
other pointer, the compiler normally has no clue what the value of the
|
||||
|
@ -308,7 +313,7 @@ guarantees that RCU depends on. And the volatile cast in rcu_dereference()
|
|||
should prevent the compiler from guessing the value.
|
||||
|
||||
But without rcu_dereference(), the compiler knows more than you might
|
||||
expect. Consider the following code fragment:
|
||||
expect. Consider the following code fragment::
|
||||
|
||||
struct foo {
|
||||
int a;
|
||||
|
@ -354,6 +359,7 @@ dereference the resulting pointer.
|
|||
|
||||
|
||||
WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
|
||||
------------------------------------------------------------
|
||||
|
||||
First, please avoid using rcu_dereference_raw() and also please avoid
|
||||
using rcu_dereference_check() and rcu_dereference_protected() with a
|
||||
|
@ -370,7 +376,7 @@ member of the rcu_dereference() to use in various situations:
|
|||
|
||||
2. If the access might be within an RCU read-side critical section
|
||||
on the one hand, or protected by (say) my_lock on the other,
|
||||
use rcu_dereference_check(), for example:
|
||||
use rcu_dereference_check(), for example::
|
||||
|
||||
p1 = rcu_dereference_check(p->rcu_protected_pointer,
|
||||
lockdep_is_held(&my_lock));
|
||||
|
@ -378,14 +384,14 @@ member of the rcu_dereference() to use in various situations:
|
|||
|
||||
3. If the access might be within an RCU read-side critical section
|
||||
on the one hand, or protected by either my_lock or your_lock on
|
||||
the other, again use rcu_dereference_check(), for example:
|
||||
the other, again use rcu_dereference_check(), for example::
|
||||
|
||||
p1 = rcu_dereference_check(p->rcu_protected_pointer,
|
||||
lockdep_is_held(&my_lock) ||
|
||||
lockdep_is_held(&your_lock));
|
||||
|
||||
4. If the access is on the update side, so that it is always protected
|
||||
by my_lock, use rcu_dereference_protected():
|
||||
by my_lock, use rcu_dereference_protected()::
|
||||
|
||||
p1 = rcu_dereference_protected(p->rcu_protected_pointer,
|
||||
lockdep_is_held(&my_lock));
|
||||
|
@ -410,18 +416,19 @@ member of the rcu_dereference() to use in various situations:
|
|||
|
||||
|
||||
SPARSE CHECKING OF RCU-PROTECTED POINTERS
|
||||
-----------------------------------------
|
||||
|
||||
The sparse static-analysis tool checks for direct access to RCU-protected
|
||||
pointers, which can result in "interesting" bugs due to compiler
|
||||
optimizations involving invented loads and perhaps also load tearing.
|
||||
For example, suppose someone mistakenly does something like this:
|
||||
For example, suppose someone mistakenly does something like this::
|
||||
|
||||
p = q->rcu_protected_pointer;
|
||||
do_something_with(p->a);
|
||||
do_something_else_with(p->b);
|
||||
|
||||
If register pressure is high, the compiler might optimize "p" out
|
||||
of existence, transforming the code to something like this:
|
||||
of existence, transforming the code to something like this::
|
||||
|
||||
do_something_with(q->rcu_protected_pointer->a);
|
||||
do_something_else_with(q->rcu_protected_pointer->b);
|
||||
|
@ -435,7 +442,7 @@ Load tearing could of course result in dereferencing a mashup of a pair
|
|||
of pointers, which also might fatally disappoint your code.
|
||||
|
||||
These problems could have been avoided simply by making the code instead
|
||||
read as follows:
|
||||
read as follows::
|
||||
|
||||
p = rcu_dereference(q->rcu_protected_pointer);
|
||||
do_something_with(p->a);
|
||||
|
@ -448,7 +455,7 @@ or as a formal parameter, with "__rcu", which tells sparse to complain if
|
|||
this pointer is accessed directly. It will also cause sparse to complain
|
||||
if a pointer not marked with "__rcu" is accessed using rcu_dereference()
|
||||
and friends. For example, ->rcu_protected_pointer might be declared as
|
||||
follows:
|
||||
follows::
|
||||
|
||||
struct foo __rcu *rcu_protected_pointer;
|
||||
|
|
@ -1,4 +1,7 @@
|
|||
.. _rcu_barrier:
|
||||
|
||||
RCU and Unloadable Modules
|
||||
==========================
|
||||
|
||||
[Originally published in LWN Jan. 14, 2007: http://lwn.net/Articles/217484/]
|
||||
|
||||
|
@ -21,7 +24,7 @@ given that readers might well leave absolutely no trace of their
|
|||
presence? There is a synchronize_rcu() primitive that blocks until all
|
||||
pre-existing readers have completed. An updater wishing to delete an
|
||||
element p from a linked list might do the following, while holding an
|
||||
appropriate lock, of course:
|
||||
appropriate lock, of course::
|
||||
|
||||
list_del_rcu(p);
|
||||
synchronize_rcu();
|
||||
|
@ -32,13 +35,13 @@ primitive must be used instead. This primitive takes a pointer to an
|
|||
rcu_head struct placed within the RCU-protected data structure and
|
||||
another pointer to a function that may be invoked later to free that
|
||||
structure. Code to delete an element p from the linked list from IRQ
|
||||
context might then be as follows:
|
||||
context might then be as follows::
|
||||
|
||||
list_del_rcu(p);
|
||||
call_rcu(&p->rcu, p_callback);
|
||||
|
||||
Since call_rcu() never blocks, this code can safely be used from within
|
||||
IRQ context. The function p_callback() might be defined as follows:
|
||||
IRQ context. The function p_callback() might be defined as follows::
|
||||
|
||||
static void p_callback(struct rcu_head *rp)
|
||||
{
|
||||
|
@ -49,6 +52,7 @@ IRQ context. The function p_callback() might be defined as follows:
|
|||
|
||||
|
||||
Unloading Modules That Use call_rcu()
|
||||
-------------------------------------
|
||||
|
||||
But what if p_callback is defined in an unloadable module?
|
||||
|
||||
|
@ -69,10 +73,11 @@ in realtime kernels in order to avoid excessive scheduling latencies.
|
|||
|
||||
|
||||
rcu_barrier()
|
||||
-------------
|
||||
|
||||
We instead need the rcu_barrier() primitive. Rather than waiting for
|
||||
a grace period to elapse, rcu_barrier() waits for all outstanding RCU
|
||||
callbacks to complete. Please note that rcu_barrier() does -not- imply
|
||||
callbacks to complete. Please note that rcu_barrier() does **not** imply
|
||||
synchronize_rcu(), in particular, if there are no RCU callbacks queued
|
||||
anywhere, rcu_barrier() is within its rights to return immediately,
|
||||
without waiting for a grace period to elapse.
|
||||
|
@ -88,79 +93,79 @@ must match the flavor of rcu_barrier() with that of call_rcu(). If your
|
|||
module uses multiple flavors of call_rcu(), then it must also use multiple
|
||||
flavors of rcu_barrier() when unloading that module. For example, if
|
||||
it uses call_rcu(), call_srcu() on srcu_struct_1, and call_srcu() on
|
||||
srcu_struct_2(), then the following three lines of code will be required
|
||||
when unloading:
|
||||
srcu_struct_2, then the following three lines of code will be required
|
||||
when unloading::
|
||||
|
||||
1 rcu_barrier();
|
||||
2 srcu_barrier(&srcu_struct_1);
|
||||
3 srcu_barrier(&srcu_struct_2);
|
||||
|
||||
The rcutorture module makes use of rcu_barrier() in its exit function
|
||||
as follows:
|
||||
as follows::
|
||||
|
||||
1 static void
|
||||
2 rcu_torture_cleanup(void)
|
||||
3 {
|
||||
4 int i;
|
||||
1 static void
|
||||
2 rcu_torture_cleanup(void)
|
||||
3 {
|
||||
4 int i;
|
||||
5
|
||||
6 fullstop = 1;
|
||||
7 if (shuffler_task != NULL) {
|
||||
6 fullstop = 1;
|
||||
7 if (shuffler_task != NULL) {
|
||||
8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
|
||||
9 kthread_stop(shuffler_task);
|
||||
10 }
|
||||
11 shuffler_task = NULL;
|
||||
12
|
||||
13 if (writer_task != NULL) {
|
||||
14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
|
||||
15 kthread_stop(writer_task);
|
||||
16 }
|
||||
17 writer_task = NULL;
|
||||
18
|
||||
19 if (reader_tasks != NULL) {
|
||||
20 for (i = 0; i < nrealreaders; i++) {
|
||||
21 if (reader_tasks[i] != NULL) {
|
||||
22 VERBOSE_PRINTK_STRING(
|
||||
23 "Stopping rcu_torture_reader task");
|
||||
24 kthread_stop(reader_tasks[i]);
|
||||
25 }
|
||||
26 reader_tasks[i] = NULL;
|
||||
27 }
|
||||
28 kfree(reader_tasks);
|
||||
29 reader_tasks = NULL;
|
||||
30 }
|
||||
31 rcu_torture_current = NULL;
|
||||
32
|
||||
33 if (fakewriter_tasks != NULL) {
|
||||
34 for (i = 0; i < nfakewriters; i++) {
|
||||
35 if (fakewriter_tasks[i] != NULL) {
|
||||
36 VERBOSE_PRINTK_STRING(
|
||||
37 "Stopping rcu_torture_fakewriter task");
|
||||
38 kthread_stop(fakewriter_tasks[i]);
|
||||
39 }
|
||||
40 fakewriter_tasks[i] = NULL;
|
||||
41 }
|
||||
42 kfree(fakewriter_tasks);
|
||||
43 fakewriter_tasks = NULL;
|
||||
44 }
|
||||
45
|
||||
46 if (stats_task != NULL) {
|
||||
47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
|
||||
48 kthread_stop(stats_task);
|
||||
49 }
|
||||
50 stats_task = NULL;
|
||||
51
|
||||
52 /* Wait for all RCU callbacks to fire. */
|
||||
53 rcu_barrier();
|
||||
54
|
||||
55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
|
||||
56
|
||||
57 if (cur_ops->cleanup != NULL)
|
||||
58 cur_ops->cleanup();
|
||||
59 if (atomic_read(&n_rcu_torture_error))
|
||||
60 rcu_torture_print_module_parms("End of test: FAILURE");
|
||||
61 else
|
||||
62 rcu_torture_print_module_parms("End of test: SUCCESS");
|
||||
63 }
|
||||
10 }
|
||||
11 shuffler_task = NULL;
|
||||
12
|
||||
13 if (writer_task != NULL) {
|
||||
14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
|
||||
15 kthread_stop(writer_task);
|
||||
16 }
|
||||
17 writer_task = NULL;
|
||||
18
|
||||
19 if (reader_tasks != NULL) {
|
||||
20 for (i = 0; i < nrealreaders; i++) {
|
||||
21 if (reader_tasks[i] != NULL) {
|
||||
22 VERBOSE_PRINTK_STRING(
|
||||
23 "Stopping rcu_torture_reader task");
|
||||
24 kthread_stop(reader_tasks[i]);
|
||||
25 }
|
||||
26 reader_tasks[i] = NULL;
|
||||
27 }
|
||||
28 kfree(reader_tasks);
|
||||
29 reader_tasks = NULL;
|
||||
30 }
|
||||
31 rcu_torture_current = NULL;
|
||||
32
|
||||
33 if (fakewriter_tasks != NULL) {
|
||||
34 for (i = 0; i < nfakewriters; i++) {
|
||||
35 if (fakewriter_tasks[i] != NULL) {
|
||||
36 VERBOSE_PRINTK_STRING(
|
||||
37 "Stopping rcu_torture_fakewriter task");
|
||||
38 kthread_stop(fakewriter_tasks[i]);
|
||||
39 }
|
||||
40 fakewriter_tasks[i] = NULL;
|
||||
41 }
|
||||
42 kfree(fakewriter_tasks);
|
||||
43 fakewriter_tasks = NULL;
|
||||
44 }
|
||||
45
|
||||
46 if (stats_task != NULL) {
|
||||
47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
|
||||
48 kthread_stop(stats_task);
|
||||
49 }
|
||||
50 stats_task = NULL;
|
||||
51
|
||||
52 /* Wait for all RCU callbacks to fire. */
|
||||
53 rcu_barrier();
|
||||
54
|
||||
55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
|
||||
56
|
||||
57 if (cur_ops->cleanup != NULL)
|
||||
58 cur_ops->cleanup();
|
||||
59 if (atomic_read(&n_rcu_torture_error))
|
||||
60 rcu_torture_print_module_parms("End of test: FAILURE");
|
||||
61 else
|
||||
62 rcu_torture_print_module_parms("End of test: SUCCESS");
|
||||
63 }
|
||||
|
||||
Line 6 sets a global variable that prevents any RCU callbacks from
|
||||
re-posting themselves. This will not be necessary in most cases, since
|
||||
|
@ -176,9 +181,14 @@ for any pre-existing callbacks to complete.
|
|||
Then lines 55-62 print status and do operation-specific cleanup, and
|
||||
then return, permitting the module-unload operation to be completed.
|
||||
|
||||
Quick Quiz #1: Is there any other situation where rcu_barrier() might
|
||||
.. _rcubarrier_quiz_1:
|
||||
|
||||
Quick Quiz #1:
|
||||
Is there any other situation where rcu_barrier() might
|
||||
be required?
|
||||
|
||||
:ref:`Answer to Quick Quiz #1 <answer_rcubarrier_quiz_1>`
|
||||
|
||||
Your module might have additional complications. For example, if your
|
||||
module invokes call_rcu() from timers, you will need to first cancel all
|
||||
the timers, and only then invoke rcu_barrier() to wait for any remaining
|
||||
|
@ -188,11 +198,12 @@ Of course, if you module uses call_rcu(), you will need to invoke
|
|||
rcu_barrier() before unloading. Similarly, if your module uses
|
||||
call_srcu(), you will need to invoke srcu_barrier() before unloading,
|
||||
and on the same srcu_struct structure. If your module uses call_rcu()
|
||||
-and- call_srcu(), then you will need to invoke rcu_barrier() -and-
|
||||
**and** call_srcu(), then you will need to invoke rcu_barrier() **and**
|
||||
srcu_barrier().
|
||||
|
||||
|
||||
Implementing rcu_barrier()
|
||||
--------------------------
|
||||
|
||||
Dipankar Sarma's implementation of rcu_barrier() makes use of the fact
|
||||
that RCU callbacks are never reordered once queued on one of the per-CPU
|
||||
|
@ -200,19 +211,19 @@ queues. His implementation queues an RCU callback on each of the per-CPU
|
|||
callback queues, and then waits until they have all started executing, at
|
||||
which point, all earlier RCU callbacks are guaranteed to have completed.
|
||||
|
||||
The original code for rcu_barrier() was as follows:
|
||||
The original code for rcu_barrier() was as follows::
|
||||
|
||||
1 void rcu_barrier(void)
|
||||
2 {
|
||||
3 BUG_ON(in_interrupt());
|
||||
4 /* Take cpucontrol mutex to protect against CPU hotplug */
|
||||
5 mutex_lock(&rcu_barrier_mutex);
|
||||
6 init_completion(&rcu_barrier_completion);
|
||||
7 atomic_set(&rcu_barrier_cpu_count, 0);
|
||||
8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
|
||||
9 wait_for_completion(&rcu_barrier_completion);
|
||||
10 mutex_unlock(&rcu_barrier_mutex);
|
||||
11 }
|
||||
1 void rcu_barrier(void)
|
||||
2 {
|
||||
3 BUG_ON(in_interrupt());
|
||||
4 /* Take cpucontrol mutex to protect against CPU hotplug */
|
||||
5 mutex_lock(&rcu_barrier_mutex);
|
||||
6 init_completion(&rcu_barrier_completion);
|
||||
7 atomic_set(&rcu_barrier_cpu_count, 0);
|
||||
8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
|
||||
9 wait_for_completion(&rcu_barrier_completion);
|
||||
10 mutex_unlock(&rcu_barrier_mutex);
|
||||
11 }
|
||||
|
||||
Line 3 verifies that the caller is in process context, and lines 5 and 10
|
||||
use rcu_barrier_mutex to ensure that only one rcu_barrier() is using the
|
||||
|
@ -226,18 +237,18 @@ This code was rewritten in 2008 and several times thereafter, but this
|
|||
still gives the general idea.
|
||||
|
||||
The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
|
||||
to post an RCU callback, as follows:
|
||||
to post an RCU callback, as follows::
|
||||
|
||||
1 static void rcu_barrier_func(void *notused)
|
||||
2 {
|
||||
3 int cpu = smp_processor_id();
|
||||
4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
|
||||
5 struct rcu_head *head;
|
||||
1 static void rcu_barrier_func(void *notused)
|
||||
2 {
|
||||
3 int cpu = smp_processor_id();
|
||||
4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
|
||||
5 struct rcu_head *head;
|
||||
6
|
||||
7 head = &rdp->barrier;
|
||||
8 atomic_inc(&rcu_barrier_cpu_count);
|
||||
9 call_rcu(head, rcu_barrier_callback);
|
||||
10 }
|
||||
7 head = &rdp->barrier;
|
||||
8 atomic_inc(&rcu_barrier_cpu_count);
|
||||
9 call_rcu(head, rcu_barrier_callback);
|
||||
10 }
|
||||
|
||||
Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
|
||||
which contains the struct rcu_head that needed for the later call to
|
||||
|
@ -248,20 +259,25 @@ the current CPU's queue.
|
|||
|
||||
The rcu_barrier_callback() function simply atomically decrements the
|
||||
rcu_barrier_cpu_count variable and finalizes the completion when it
|
||||
reaches zero, as follows:
|
||||
reaches zero, as follows::
|
||||
|
||||
1 static void rcu_barrier_callback(struct rcu_head *notused)
|
||||
2 {
|
||||
3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
|
||||
4 complete(&rcu_barrier_completion);
|
||||
3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
|
||||
4 complete(&rcu_barrier_completion);
|
||||
5 }
|
||||
|
||||
Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
|
||||
.. _rcubarrier_quiz_2:
|
||||
|
||||
Quick Quiz #2:
|
||||
What happens if CPU 0's rcu_barrier_func() executes
|
||||
immediately (thus incrementing rcu_barrier_cpu_count to the
|
||||
value one), but the other CPU's rcu_barrier_func() invocations
|
||||
are delayed for a full grace period? Couldn't this result in
|
||||
rcu_barrier() returning prematurely?
|
||||
|
||||
:ref:`Answer to Quick Quiz #2 <answer_rcubarrier_quiz_2>`
|
||||
|
||||
The current rcu_barrier() implementation is more complex, due to the need
|
||||
to avoid disturbing idle CPUs (especially on battery-powered systems)
|
||||
and the need to minimally disturb non-idle CPUs in real-time systems.
|
||||
|
@ -269,6 +285,7 @@ However, the code above illustrates the concepts.
|
|||
|
||||
|
||||
rcu_barrier() Summary
|
||||
---------------------
|
||||
|
||||
The rcu_barrier() primitive has seen relatively little use, since most
|
||||
code using RCU is in the core kernel rather than in modules. However, if
|
||||
|
@ -277,8 +294,12 @@ so that your module may be safely unloaded.
|
|||
|
||||
|
||||
Answers to Quick Quizzes
|
||||
------------------------
|
||||
|
||||
Quick Quiz #1: Is there any other situation where rcu_barrier() might
|
||||
.. _answer_rcubarrier_quiz_1:
|
||||
|
||||
Quick Quiz #1:
|
||||
Is there any other situation where rcu_barrier() might
|
||||
be required?
|
||||
|
||||
Answer: Interestingly enough, rcu_barrier() was not originally
|
||||
|
@ -292,7 +313,12 @@ Answer: Interestingly enough, rcu_barrier() was not originally
|
|||
implementing rcutorture, and found that rcu_barrier() solves
|
||||
this problem as well.
|
||||
|
||||
Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
|
||||
:ref:`Back to Quick Quiz #1 <rcubarrier_quiz_1>`
|
||||
|
||||
.. _answer_rcubarrier_quiz_2:
|
||||
|
||||
Quick Quiz #2:
|
||||
What happens if CPU 0's rcu_barrier_func() executes
|
||||
immediately (thus incrementing rcu_barrier_cpu_count to the
|
||||
value one), but the other CPU's rcu_barrier_func() invocations
|
||||
are delayed for a full grace period? Couldn't this result in
|
||||
|
@ -323,3 +349,5 @@ Answer: This cannot happen. The reason is that on_each_cpu() has its last
|
|||
is to add an rcu_read_lock() before line 8 of rcu_barrier()
|
||||
and an rcu_read_unlock() after line 8 of this same function. If
|
||||
you can think of a better change, please let me know!
|
||||
|
||||
:ref:`Back to Quick Quiz #2 <rcubarrier_quiz_2>`
|
|
@ -225,18 +225,13 @@ an estimate of the total number of RCU callbacks queued across all CPUs
|
|||
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
|
||||
for each CPU:
|
||||
|
||||
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D
|
||||
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
|
||||
|
||||
The "last_accelerate:" prints the low-order 16 bits (in hex) of the
|
||||
jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
|
||||
from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
|
||||
rcu_prepare_for_idle(). The "Nonlazy posted:" indicates lazy-callback
|
||||
status, so that an "l" indicates that all callbacks were lazy at the start
|
||||
of the last idle period and an "L" indicates that there are currently
|
||||
no non-lazy callbacks (in both cases, "." is printed otherwise, as
|
||||
shown above) and "D" indicates that dyntick-idle processing is enabled
|
||||
("." is printed otherwise, for example, if disabled via the "nohz="
|
||||
kernel boot parameter).
|
||||
rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle
|
||||
processing is enabled.
|
||||
|
||||
If the grace period ends just as the stall warning starts printing,
|
||||
there will be a spurious stall-warning message, which will include
|
||||
|
|
|
@ -1,15 +1,18 @@
|
|||
.. _whatisrcu_doc:
|
||||
|
||||
What is RCU? -- "Read, Copy, Update"
|
||||
======================================
|
||||
|
||||
Please note that the "What is RCU?" LWN series is an excellent place
|
||||
to start learning about RCU:
|
||||
|
||||
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
|
||||
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
|
||||
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
|
||||
4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
|
||||
2010 Big API Table http://lwn.net/Articles/419086/
|
||||
5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
|
||||
2014 Big API Table http://lwn.net/Articles/609973/
|
||||
| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
|
||||
| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
|
||||
| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
|
||||
| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
|
||||
| 2010 Big API Table http://lwn.net/Articles/419086/
|
||||
| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
|
||||
| 2014 Big API Table http://lwn.net/Articles/609973/
|
||||
|
||||
|
||||
What is RCU?
|
||||
|
@ -24,14 +27,21 @@ the experience has been that different people must take different paths
|
|||
to arrive at an understanding of RCU. This document provides several
|
||||
different paths, as follows:
|
||||
|
||||
1. RCU OVERVIEW
|
||||
2. WHAT IS RCU'S CORE API?
|
||||
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
|
||||
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
|
||||
5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
|
||||
6. ANALOGY WITH READER-WRITER LOCKING
|
||||
7. FULL LIST OF RCU APIs
|
||||
8. ANSWERS TO QUICK QUIZZES
|
||||
:ref:`1. RCU OVERVIEW <1_whatisRCU>`
|
||||
|
||||
:ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`
|
||||
|
||||
:ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`
|
||||
|
||||
:ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`
|
||||
|
||||
:ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`
|
||||
|
||||
:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
|
||||
|
||||
:ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>`
|
||||
|
||||
:ref:`8. ANSWERS TO QUICK QUIZZES <8_whatisRCU>`
|
||||
|
||||
People who prefer starting with a conceptual overview should focus on
|
||||
Section 1, though most readers will profit by reading this section at
|
||||
|
@ -49,8 +59,10 @@ everything, feel free to read the whole thing -- but if you are really
|
|||
that type of person, you have perused the source code and will therefore
|
||||
never need this document anyway. ;-)
|
||||
|
||||
.. _1_whatisRCU:
|
||||
|
||||
1. RCU OVERVIEW
|
||||
----------------
|
||||
|
||||
The basic idea behind RCU is to split updates into "removal" and
|
||||
"reclamation" phases. The removal phase removes references to data items
|
||||
|
@ -116,8 +128,10 @@ So how the heck can a reclaimer tell when a reader is done, given
|
|||
that readers are not doing any sort of synchronization operations???
|
||||
Read on to learn about how RCU's API makes this easy.
|
||||
|
||||
.. _2_whatisRCU:
|
||||
|
||||
2. WHAT IS RCU'S CORE API?
|
||||
---------------------------
|
||||
|
||||
The core RCU API is quite small:
|
||||
|
||||
|
@ -136,7 +150,7 @@ later. See the kernel docbook documentation for more info, or look directly
|
|||
at the function header comments.
|
||||
|
||||
rcu_read_lock()
|
||||
|
||||
^^^^^^^^^^^^^^^
|
||||
void rcu_read_lock(void);
|
||||
|
||||
Used by a reader to inform the reclaimer that the reader is
|
||||
|
@ -150,7 +164,7 @@ rcu_read_lock()
|
|||
longer-term references to data structures.
|
||||
|
||||
rcu_read_unlock()
|
||||
|
||||
^^^^^^^^^^^^^^^^^
|
||||
void rcu_read_unlock(void);
|
||||
|
||||
Used by a reader to inform the reclaimer that the reader is
|
||||
|
@ -158,15 +172,15 @@ rcu_read_unlock()
|
|||
read-side critical sections may be nested and/or overlapping.
|
||||
|
||||
synchronize_rcu()
|
||||
|
||||
^^^^^^^^^^^^^^^^^
|
||||
void synchronize_rcu(void);
|
||||
|
||||
Marks the end of updater code and the beginning of reclaimer
|
||||
code. It does this by blocking until all pre-existing RCU
|
||||
read-side critical sections on all CPUs have completed.
|
||||
Note that synchronize_rcu() will -not- necessarily wait for
|
||||
Note that synchronize_rcu() will **not** necessarily wait for
|
||||
any subsequent RCU read-side critical sections to complete.
|
||||
For example, consider the following sequence of events:
|
||||
For example, consider the following sequence of events::
|
||||
|
||||
CPU 0 CPU 1 CPU 2
|
||||
----------------- ------------------------- ---------------
|
||||
|
@ -182,7 +196,7 @@ synchronize_rcu()
|
|||
any that begin after synchronize_rcu() is invoked.
|
||||
|
||||
Of course, synchronize_rcu() does not necessarily return
|
||||
-immediately- after the last pre-existing RCU read-side critical
|
||||
**immediately** after the last pre-existing RCU read-side critical
|
||||
section completes. For one thing, there might well be scheduling
|
||||
delays. For another thing, many RCU implementations process
|
||||
requests in batches in order to improve efficiencies, which can
|
||||
|
@ -211,10 +225,10 @@ synchronize_rcu()
|
|||
checklist.txt for some approaches to limiting the update rate.
|
||||
|
||||
rcu_assign_pointer()
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
void rcu_assign_pointer(p, typeof(p) v);
|
||||
|
||||
Yes, rcu_assign_pointer() -is- implemented as a macro, though it
|
||||
Yes, rcu_assign_pointer() **is** implemented as a macro, though it
|
||||
would be cool to be able to declare a function in this manner.
|
||||
(Compiler experts will no doubt disagree.)
|
||||
|
||||
|
@ -231,7 +245,7 @@ rcu_assign_pointer()
|
|||
the _rcu list-manipulation primitives such as list_add_rcu().
|
||||
|
||||
rcu_dereference()
|
||||
|
||||
^^^^^^^^^^^^^^^^^
|
||||
typeof(p) rcu_dereference(p);
|
||||
|
||||
Like rcu_assign_pointer(), rcu_dereference() must be implemented
|
||||
|
@ -248,13 +262,13 @@ rcu_dereference()
|
|||
|
||||
Common coding practice uses rcu_dereference() to copy an
|
||||
RCU-protected pointer to a local variable, then dereferences
|
||||
this local variable, for example as follows:
|
||||
this local variable, for example as follows::
|
||||
|
||||
p = rcu_dereference(head.next);
|
||||
return p->data;
|
||||
|
||||
However, in this case, one could just as easily combine these
|
||||
into one statement:
|
||||
into one statement::
|
||||
|
||||
return rcu_dereference(head.next)->data;
|
||||
|
||||
|
@ -266,8 +280,8 @@ rcu_dereference()
|
|||
unnecessary overhead on Alpha CPUs.
|
||||
|
||||
Note that the value returned by rcu_dereference() is valid
|
||||
only within the enclosing RCU read-side critical section [1].
|
||||
For example, the following is -not- legal:
|
||||
only within the enclosing RCU read-side critical section [1]_.
|
||||
For example, the following is **not** legal::
|
||||
|
||||
rcu_read_lock();
|
||||
p = rcu_dereference(head.next);
|
||||
|
@ -290,9 +304,9 @@ rcu_dereference()
|
|||
at any time, including immediately after the rcu_dereference().
|
||||
And, again like rcu_assign_pointer(), rcu_dereference() is
|
||||
typically used indirectly, via the _rcu list-manipulation
|
||||
primitives, such as list_for_each_entry_rcu() [2].
|
||||
primitives, such as list_for_each_entry_rcu() [2]_.
|
||||
|
||||
[1] The variant rcu_dereference_protected() can be used outside
|
||||
.. [1] The variant rcu_dereference_protected() can be used outside
|
||||
of an RCU read-side critical section as long as the usage is
|
||||
protected by locks acquired by the update-side code. This variant
|
||||
avoids the lockdep warning that would happen when using (for
|
||||
|
@ -305,7 +319,7 @@ rcu_dereference()
|
|||
a lockdep splat is emitted. See Documentation/RCU/Design/Requirements/Requirements.rst
|
||||
and the API's code comments for more details and example usage.
|
||||
|
||||
[2] If the list_for_each_entry_rcu() instance might be used by
|
||||
.. [2] If the list_for_each_entry_rcu() instance might be used by
|
||||
update-side code as well as by RCU readers, then an additional
|
||||
lockdep expression can be added to its list of arguments.
|
||||
For example, given an additional "lock_is_held(&mylock)" argument,
|
||||
|
@ -315,6 +329,7 @@ rcu_dereference()
|
|||
|
||||
The following diagram shows how each API communicates among the
|
||||
reader, updater, and reclaimer.
|
||||
::
|
||||
|
||||
|
||||
rcu_assign_pointer()
|
||||
|
@ -375,12 +390,16 @@ c. RCU applied to scheduler and interrupt/NMI-handler tasks.
|
|||
Again, most uses will be of (a). The (b) and (c) cases are important
|
||||
for specialized uses, but are relatively uncommon.
|
||||
|
||||
.. _3_whatisRCU:
|
||||
|
||||
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
|
||||
-----------------------------------------------
|
||||
|
||||
This section shows a simple use of the core RCU API to protect a
|
||||
global pointer to a dynamically allocated structure. More-typical
|
||||
uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
|
||||
uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
|
||||
:ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
|
||||
::
|
||||
|
||||
struct foo {
|
||||
int a;
|
||||
|
@ -440,40 +459,43 @@ uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
|
|||
|
||||
So, to sum up:
|
||||
|
||||
o Use rcu_read_lock() and rcu_read_unlock() to guard RCU
|
||||
- Use rcu_read_lock() and rcu_read_unlock() to guard RCU
|
||||
read-side critical sections.
|
||||
|
||||
o Within an RCU read-side critical section, use rcu_dereference()
|
||||
- Within an RCU read-side critical section, use rcu_dereference()
|
||||
to dereference RCU-protected pointers.
|
||||
|
||||
o Use some solid scheme (such as locks or semaphores) to
|
||||
- Use some solid scheme (such as locks or semaphores) to
|
||||
keep concurrent updates from interfering with each other.
|
||||
|
||||
o Use rcu_assign_pointer() to update an RCU-protected pointer.
|
||||
- Use rcu_assign_pointer() to update an RCU-protected pointer.
|
||||
This primitive protects concurrent readers from the updater,
|
||||
-not- concurrent updates from each other! You therefore still
|
||||
**not** concurrent updates from each other! You therefore still
|
||||
need to use locking (or something similar) to keep concurrent
|
||||
rcu_assign_pointer() primitives from interfering with each other.
|
||||
|
||||
o Use synchronize_rcu() -after- removing a data element from an
|
||||
RCU-protected data structure, but -before- reclaiming/freeing
|
||||
- Use synchronize_rcu() **after** removing a data element from an
|
||||
RCU-protected data structure, but **before** reclaiming/freeing
|
||||
the data element, in order to wait for the completion of all
|
||||
RCU read-side critical sections that might be referencing that
|
||||
data item.
|
||||
|
||||
See checklist.txt for additional rules to follow when using RCU.
|
||||
And again, more-typical uses of RCU may be found in listRCU.txt,
|
||||
arrayRCU.txt, and NMI-RCU.txt.
|
||||
And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
|
||||
<list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
|
||||
<NMI_rcu_doc>`.
|
||||
|
||||
.. _4_whatisRCU:
|
||||
|
||||
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
|
||||
--------------------------------------------
|
||||
|
||||
In the example above, foo_update_a() blocks until a grace period elapses.
|
||||
This is quite simple, but in some cases one cannot afford to wait so
|
||||
long -- there might be other high-priority work to be done.
|
||||
|
||||
In such cases, one uses call_rcu() rather than synchronize_rcu().
|
||||
The call_rcu() API is as follows:
|
||||
The call_rcu() API is as follows::
|
||||
|
||||
void call_rcu(struct rcu_head * head,
|
||||
void (*func)(struct rcu_head *head));
|
||||
|
@ -481,7 +503,7 @@ The call_rcu() API is as follows:
|
|||
This function invokes func(head) after a grace period has elapsed.
|
||||
This invocation might happen from either softirq or process context,
|
||||
so the function is not permitted to block. The foo struct needs to
|
||||
have an rcu_head structure added, perhaps as follows:
|
||||
have an rcu_head structure added, perhaps as follows::
|
||||
|
||||
struct foo {
|
||||
int a;
|
||||
|
@ -490,7 +512,7 @@ have an rcu_head structure added, perhaps as follows:
|
|||
struct rcu_head rcu;
|
||||
};
|
||||
|
||||
The foo_update_a() function might then be written as follows:
|
||||
The foo_update_a() function might then be written as follows::
|
||||
|
||||
/*
|
||||
* Create a new struct foo that is the same as the one currently
|
||||
|
@ -520,7 +542,7 @@ The foo_update_a() function might then be written as follows:
|
|||
call_rcu(&old_fp->rcu, foo_reclaim);
|
||||
}
|
||||
|
||||
The foo_reclaim() function might appear as follows:
|
||||
The foo_reclaim() function might appear as follows::
|
||||
|
||||
void foo_reclaim(struct rcu_head *rp)
|
||||
{
|
||||
|
@ -544,7 +566,7 @@ namely foo_reclaim().
|
|||
The summary of advice is the same as for the previous section, except
|
||||
that we are now using call_rcu() rather than synchronize_rcu():
|
||||
|
||||
o Use call_rcu() -after- removing a data element from an
|
||||
- Use call_rcu() **after** removing a data element from an
|
||||
RCU-protected data structure in order to register a callback
|
||||
function that will be invoked after the completion of all RCU
|
||||
read-side critical sections that might be referencing that
|
||||
|
@ -552,14 +574,16 @@ o Use call_rcu() -after- removing a data element from an
|
|||
|
||||
If the callback for call_rcu() is not doing anything more than calling
|
||||
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
|
||||
to avoid having to write your own callback:
|
||||
to avoid having to write your own callback::
|
||||
|
||||
kfree_rcu(old_fp, rcu);
|
||||
|
||||
Again, see checklist.txt for additional rules governing the use of RCU.
|
||||
|
||||
.. _5_whatisRCU:
|
||||
|
||||
5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
|
||||
------------------------------------------------
|
||||
|
||||
One of the nice things about RCU is that it has extremely simple "toy"
|
||||
implementations that are a good first step towards understanding the
|
||||
|
@ -579,7 +603,7 @@ more details on the current implementation as of early 2004.
|
|||
|
||||
|
||||
5A. "TOY" IMPLEMENTATION #1: LOCKING
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
This section presents a "toy" RCU implementation that is based on
|
||||
familiar locking primitives. Its overhead makes it a non-starter for
|
||||
real-life use, as does its lack of scalability. It is also unsuitable
|
||||
|
@ -591,7 +615,7 @@ you allow nested rcu_read_lock() calls, you can deadlock.
|
|||
However, it is probably the easiest implementation to relate to, so is
|
||||
a good starting point.
|
||||
|
||||
It is extremely simple:
|
||||
It is extremely simple::
|
||||
|
||||
static DEFINE_RWLOCK(rcu_gp_mutex);
|
||||
|
||||
|
@ -614,7 +638,7 @@ It is extremely simple:
|
|||
|
||||
[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
|
||||
much. But here are simplified versions anyway. And whatever you do,
|
||||
don't forget about them when submitting patches making use of RCU!]
|
||||
don't forget about them when submitting patches making use of RCU!]::
|
||||
|
||||
#define rcu_assign_pointer(p, v) \
|
||||
({ \
|
||||
|
@ -647,18 +671,23 @@ that the only thing that can block rcu_read_lock() is a synchronize_rcu().
|
|||
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
|
||||
so there can be no deadlock cycle.
|
||||
|
||||
Quick Quiz #1: Why is this argument naive? How could a deadlock
|
||||
.. _quiz_1:
|
||||
|
||||
Quick Quiz #1:
|
||||
Why is this argument naive? How could a deadlock
|
||||
occur when using this algorithm in a real-world Linux
|
||||
kernel? How could this deadlock be avoided?
|
||||
|
||||
:ref:`Answers to Quick Quiz <8_whatisRCU>`
|
||||
|
||||
5B. "TOY" EXAMPLE #2: CLASSIC RCU
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
This section presents a "toy" RCU implementation that is based on
|
||||
"classic RCU". It is also short on performance (but only for updates) and
|
||||
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
|
||||
kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
|
||||
are the same as those shown in the preceding section, so they are omitted.
|
||||
::
|
||||
|
||||
void rcu_read_lock(void) { }
|
||||
|
||||
|
@ -683,14 +712,14 @@ CPU in turn. The run_on() primitive can be implemented straightforwardly
|
|||
in terms of the sched_setaffinity() primitive. Of course, a somewhat less
|
||||
"toy" implementation would restore the affinity upon completion rather
|
||||
than just leaving all tasks running on the last CPU, but when I said
|
||||
"toy", I meant -toy-!
|
||||
"toy", I meant **toy**!
|
||||
|
||||
So how the heck is this supposed to work???
|
||||
|
||||
Remember that it is illegal to block while in an RCU read-side critical
|
||||
section. Therefore, if a given CPU executes a context switch, we know
|
||||
that it must have completed all preceding RCU read-side critical sections.
|
||||
Once -all- CPUs have executed a context switch, then -all- preceding
|
||||
Once **all** CPUs have executed a context switch, then **all** preceding
|
||||
RCU read-side critical sections will have completed.
|
||||
|
||||
So, suppose that we remove a data item from its structure and then invoke
|
||||
|
@ -698,19 +727,32 @@ synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
|
|||
that there are no RCU read-side critical sections holding a reference
|
||||
to that data item, so we can safely reclaim it.
|
||||
|
||||
Quick Quiz #2: Give an example where Classic RCU's read-side
|
||||
overhead is -negative-.
|
||||
.. _quiz_2:
|
||||
|
||||
Quick Quiz #3: If it is illegal to block in an RCU read-side
|
||||
Quick Quiz #2:
|
||||
Give an example where Classic RCU's read-side
|
||||
overhead is **negative**.
|
||||
|
||||
:ref:`Answers to Quick Quiz <8_whatisRCU>`
|
||||
|
||||
.. _quiz_3:
|
||||
|
||||
Quick Quiz #3:
|
||||
If it is illegal to block in an RCU read-side
|
||||
critical section, what the heck do you do in
|
||||
PREEMPT_RT, where normal spinlocks can block???
|
||||
|
||||
:ref:`Answers to Quick Quiz <8_whatisRCU>`
|
||||
|
||||
.. _6_whatisRCU:
|
||||
|
||||
6. ANALOGY WITH READER-WRITER LOCKING
|
||||
--------------------------------------
|
||||
|
||||
Although RCU can be used in many different ways, a very common use of
|
||||
RCU is analogous to reader-writer locking. The following unified
|
||||
diff shows how closely related RCU and reader-writer locking can be.
|
||||
::
|
||||
|
||||
@@ -5,5 +5,5 @@ struct el {
|
||||
int data;
|
||||
|
@ -762,7 +804,7 @@ diff shows how closely related RCU and reader-writer locking can be.
|
|||
return 0;
|
||||
}
|
||||
|
||||
Or, for those who prefer a side-by-side listing:
|
||||
Or, for those who prefer a side-by-side listing::
|
||||
|
||||
1 struct el { 1 struct el {
|
||||
2 struct list_head list; 2 struct list_head list;
|
||||
|
@ -774,40 +816,44 @@ Or, for those who prefer a side-by-side listing:
|
|||
8 rwlock_t listmutex; 8 spinlock_t listmutex;
|
||||
9 struct el head; 9 struct el head;
|
||||
|
||||
1 int search(long key, int *result) 1 int search(long key, int *result)
|
||||
2 { 2 {
|
||||
3 struct list_head *lp; 3 struct list_head *lp;
|
||||
4 struct el *p; 4 struct el *p;
|
||||
5 5
|
||||
6 read_lock(&listmutex); 6 rcu_read_lock();
|
||||
7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
|
||||
8 if (p->key == key) { 8 if (p->key == key) {
|
||||
9 *result = p->data; 9 *result = p->data;
|
||||
10 read_unlock(&listmutex); 10 rcu_read_unlock();
|
||||
11 return 1; 11 return 1;
|
||||
12 } 12 }
|
||||
13 } 13 }
|
||||
14 read_unlock(&listmutex); 14 rcu_read_unlock();
|
||||
15 return 0; 15 return 0;
|
||||
16 } 16 }
|
||||
::
|
||||
|
||||
1 int delete(long key) 1 int delete(long key)
|
||||
2 { 2 {
|
||||
3 struct el *p; 3 struct el *p;
|
||||
4 4
|
||||
5 write_lock(&listmutex); 5 spin_lock(&listmutex);
|
||||
6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
|
||||
7 if (p->key == key) { 7 if (p->key == key) {
|
||||
8 list_del(&p->list); 8 list_del_rcu(&p->list);
|
||||
9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
|
||||
10 synchronize_rcu();
|
||||
10 kfree(p); 11 kfree(p);
|
||||
11 return 1; 12 return 1;
|
||||
12 } 13 }
|
||||
13 } 14 }
|
||||
14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
|
||||
15 return 0; 16 return 0;
|
||||
16 } 17 }
|
||||
1 int search(long key, int *result) 1 int search(long key, int *result)
|
||||
2 { 2 {
|
||||
3 struct list_head *lp; 3 struct list_head *lp;
|
||||
4 struct el *p; 4 struct el *p;
|
||||
5 5
|
||||
6 read_lock(&listmutex); 6 rcu_read_lock();
|
||||
7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
|
||||
8 if (p->key == key) { 8 if (p->key == key) {
|
||||
9 *result = p->data; 9 *result = p->data;
|
||||
10 read_unlock(&listmutex); 10 rcu_read_unlock();
|
||||
11 return 1; 11 return 1;
|
||||
12 } 12 }
|
||||
13 } 13 }
|
||||
14 read_unlock(&listmutex); 14 rcu_read_unlock();
|
||||
15 return 0; 15 return 0;
|
||||
16 } 16 }
|
||||
|
||||
::
|
||||
|
||||
1 int delete(long key) 1 int delete(long key)
|
||||
2 { 2 {
|
||||
3 struct el *p; 3 struct el *p;
|
||||
4 4
|
||||
5 write_lock(&listmutex); 5 spin_lock(&listmutex);
|
||||
6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
|
||||
7 if (p->key == key) { 7 if (p->key == key) {
|
||||
8 list_del(&p->list); 8 list_del_rcu(&p->list);
|
||||
9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
|
||||
10 synchronize_rcu();
|
||||
10 kfree(p); 11 kfree(p);
|
||||
11 return 1; 12 return 1;
|
||||
12 } 13 }
|
||||
13 } 14 }
|
||||
14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
|
||||
15 return 0; 16 return 0;
|
||||
16 } 17 }
|
||||
|
||||
Either way, the differences are quite small. Read-side locking moves
|
||||
to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
|
||||
|
@ -825,22 +871,27 @@ delete() can now block. If this is a problem, there is a callback-based
|
|||
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
|
||||
be used in place of synchronize_rcu().
|
||||
|
||||
.. _7_whatisRCU:
|
||||
|
||||
7. FULL LIST OF RCU APIs
|
||||
-------------------------
|
||||
|
||||
The RCU APIs are documented in docbook-format header comments in the
|
||||
Linux-kernel source code, but it helps to have a full list of the
|
||||
APIs, since there does not appear to be a way to categorize them
|
||||
in docbook. Here is the list, by category.
|
||||
|
||||
RCU list traversal:
|
||||
RCU list traversal::
|
||||
|
||||
list_entry_rcu
|
||||
list_entry_lockless
|
||||
list_first_entry_rcu
|
||||
list_next_rcu
|
||||
list_for_each_entry_rcu
|
||||
list_for_each_entry_continue_rcu
|
||||
list_for_each_entry_from_rcu
|
||||
list_first_or_null_rcu
|
||||
list_next_or_null_rcu
|
||||
hlist_first_rcu
|
||||
hlist_next_rcu
|
||||
hlist_pprev_rcu
|
||||
|
@ -854,7 +905,7 @@ RCU list traversal:
|
|||
hlist_bl_first_rcu
|
||||
hlist_bl_for_each_entry_rcu
|
||||
|
||||
RCU pointer/list update:
|
||||
RCU pointer/list update::
|
||||
|
||||
rcu_assign_pointer
|
||||
list_add_rcu
|
||||
|
@ -864,10 +915,12 @@ RCU pointer/list update:
|
|||
hlist_add_behind_rcu
|
||||
hlist_add_before_rcu
|
||||
hlist_add_head_rcu
|
||||
hlist_add_tail_rcu
|
||||
hlist_del_rcu
|
||||
hlist_del_init_rcu
|
||||
hlist_replace_rcu
|
||||
list_splice_init_rcu()
|
||||
list_splice_init_rcu
|
||||
list_splice_tail_init_rcu
|
||||
hlist_nulls_del_init_rcu
|
||||
hlist_nulls_del_rcu
|
||||
hlist_nulls_add_head_rcu
|
||||
|
@ -876,7 +929,9 @@ RCU pointer/list update:
|
|||
hlist_bl_del_rcu
|
||||
hlist_bl_set_first_rcu
|
||||
|
||||
RCU: Critical sections Grace period Barrier
|
||||
RCU::
|
||||
|
||||
Critical sections Grace period Barrier
|
||||
|
||||
rcu_read_lock synchronize_net rcu_barrier
|
||||
rcu_read_unlock synchronize_rcu
|
||||
|
@ -885,7 +940,9 @@ RCU: Critical sections Grace period Barrier
|
|||
rcu_dereference_check kfree_rcu
|
||||
rcu_dereference_protected
|
||||
|
||||
bh: Critical sections Grace period Barrier
|
||||
bh::
|
||||
|
||||
Critical sections Grace period Barrier
|
||||
|
||||
rcu_read_lock_bh call_rcu rcu_barrier
|
||||
rcu_read_unlock_bh synchronize_rcu
|
||||
|
@ -896,7 +953,9 @@ bh: Critical sections Grace period Barrier
|
|||
rcu_dereference_bh_protected
|
||||
rcu_read_lock_bh_held
|
||||
|
||||
sched: Critical sections Grace period Barrier
|
||||
sched::
|
||||
|
||||
Critical sections Grace period Barrier
|
||||
|
||||
rcu_read_lock_sched call_rcu rcu_barrier
|
||||
rcu_read_unlock_sched synchronize_rcu
|
||||
|
@ -910,7 +969,9 @@ sched: Critical sections Grace period Barrier
|
|||
rcu_read_lock_sched_held
|
||||
|
||||
|
||||
SRCU: Critical sections Grace period Barrier
|
||||
SRCU::
|
||||
|
||||
Critical sections Grace period Barrier
|
||||
|
||||
srcu_read_lock call_srcu srcu_barrier
|
||||
srcu_read_unlock synchronize_srcu
|
||||
|
@ -918,13 +979,14 @@ SRCU: Critical sections Grace period Barrier
|
|||
srcu_dereference_check
|
||||
srcu_read_lock_held
|
||||
|
||||
SRCU: Initialization/cleanup
|
||||
SRCU: Initialization/cleanup::
|
||||
|
||||
DEFINE_SRCU
|
||||
DEFINE_STATIC_SRCU
|
||||
init_srcu_struct
|
||||
cleanup_srcu_struct
|
||||
|
||||
All: lockdep-checked RCU-protected pointer access
|
||||
All: lockdep-checked RCU-protected pointer access::
|
||||
|
||||
rcu_access_pointer
|
||||
rcu_dereference_raw
|
||||
|
@ -974,15 +1036,19 @@ g. Otherwise, use RCU.
|
|||
Of course, this all assumes that you have determined that RCU is in fact
|
||||
the right tool for your job.
|
||||
|
||||
.. _8_whatisRCU:
|
||||
|
||||
8. ANSWERS TO QUICK QUIZZES
|
||||
----------------------------
|
||||
|
||||
Quick Quiz #1: Why is this argument naive? How could a deadlock
|
||||
Quick Quiz #1:
|
||||
Why is this argument naive? How could a deadlock
|
||||
occur when using this algorithm in a real-world Linux
|
||||
kernel? [Referring to the lock-based "toy" RCU
|
||||
algorithm.]
|
||||
|
||||
Answer: Consider the following sequence of events:
|
||||
Answer:
|
||||
Consider the following sequence of events:
|
||||
|
||||
1. CPU 0 acquires some unrelated lock, call it
|
||||
"problematic_lock", disabling irq via
|
||||
|
@ -1021,10 +1087,14 @@ Answer: Consider the following sequence of events:
|
|||
approach where tasks in RCU read-side critical sections
|
||||
cannot be blocked by tasks executing synchronize_rcu().
|
||||
|
||||
Quick Quiz #2: Give an example where Classic RCU's read-side
|
||||
overhead is -negative-.
|
||||
:ref:`Back to Quick Quiz #1 <quiz_1>`
|
||||
|
||||
Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT
|
||||
Quick Quiz #2:
|
||||
Give an example where Classic RCU's read-side
|
||||
overhead is **negative**.
|
||||
|
||||
Answer:
|
||||
Imagine a single-CPU system with a non-CONFIG_PREEMPT
|
||||
kernel where a routing table is used by process-context
|
||||
code, but can be updated by irq-context code (for example,
|
||||
by an "ICMP REDIRECT" packet). The usual way of handling
|
||||
|
@ -1046,11 +1116,15 @@ Answer: Imagine a single-CPU system with a non-CONFIG_PREEMPT
|
|||
even the theoretical possibility of negative overhead for
|
||||
a synchronization primitive is a bit unexpected. ;-)
|
||||
|
||||
Quick Quiz #3: If it is illegal to block in an RCU read-side
|
||||
:ref:`Back to Quick Quiz #2 <quiz_2>`
|
||||
|
||||
Quick Quiz #3:
|
||||
If it is illegal to block in an RCU read-side
|
||||
critical section, what the heck do you do in
|
||||
PREEMPT_RT, where normal spinlocks can block???
|
||||
|
||||
Answer: Just as PREEMPT_RT permits preemption of spinlock
|
||||
Answer:
|
||||
Just as PREEMPT_RT permits preemption of spinlock
|
||||
critical sections, it permits preemption of RCU
|
||||
read-side critical sections. It also permits
|
||||
spinlocks blocking while in RCU read-side critical
|
||||
|
@ -1069,6 +1143,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
|
|||
Besides, how does the computer know what pizza parlor
|
||||
the human being went to???
|
||||
|
||||
:ref:`Back to Quick Quiz #3 <quiz_3>`
|
||||
|
||||
ACKNOWLEDGEMENTS
|
||||
|
|
@ -3978,6 +3978,19 @@
|
|||
test until boot completes in order to avoid
|
||||
interference.
|
||||
|
||||
rcuperf.kfree_rcu_test= [KNL]
|
||||
Set to measure performance of kfree_rcu() flooding.
|
||||
|
||||
rcuperf.kfree_nthreads= [KNL]
|
||||
The number of threads running loops of kfree_rcu().
|
||||
|
||||
rcuperf.kfree_alloc_num= [KNL]
|
||||
Number of allocations and frees done in an iteration.
|
||||
|
||||
rcuperf.kfree_loops= [KNL]
|
||||
Number of loops doing rcuperf.kfree_alloc_num number
|
||||
of allocations and frees.
|
||||
|
||||
rcuperf.nreaders= [KNL]
|
||||
Set number of RCU readers. The value -1 selects
|
||||
N, where N is the number of CPUs. A value
|
||||
|
|
|
@ -18,8 +18,6 @@
|
|||
* mb() prevents loads and stores being reordered across this point.
|
||||
* rmb() prevents loads being reordered across this point.
|
||||
* wmb() prevents stores being reordered across this point.
|
||||
* read_barrier_depends() prevents data-dependent loads being reordered
|
||||
* across this point (nop on PPC).
|
||||
*
|
||||
* *mb() variants without smp_ prefix must order all types of memory
|
||||
* operations with one another. sync is the only instruction sufficient
|
||||
|
|
|
@ -281,8 +281,8 @@ void mt76_rx_aggr_stop(struct mt76_dev *dev, struct mt76_wcid *wcid, u8 tidno)
|
|||
{
|
||||
struct mt76_rx_tid *tid = NULL;
|
||||
|
||||
rcu_swap_protected(wcid->aggr[tidno], tid,
|
||||
lockdep_is_held(&dev->mutex));
|
||||
tid = rcu_replace_pointer(wcid->aggr[tidno], tid,
|
||||
lockdep_is_held(&dev->mutex));
|
||||
if (tid) {
|
||||
mt76_rx_aggr_shutdown(dev, tid);
|
||||
kfree_rcu(tid, rcu_head);
|
||||
|
|
|
@ -23,6 +23,13 @@
|
|||
#define LIST_HEAD(name) \
|
||||
struct list_head name = LIST_HEAD_INIT(name)
|
||||
|
||||
/**
|
||||
* INIT_LIST_HEAD - Initialize a list_head structure
|
||||
* @list: list_head structure to be initialized.
|
||||
*
|
||||
* Initializes the list_head to point to itself. If it is a list header,
|
||||
* the result is an empty list.
|
||||
*/
|
||||
static inline void INIT_LIST_HEAD(struct list_head *list)
|
||||
{
|
||||
WRITE_ONCE(list->next, list);
|
||||
|
@ -120,12 +127,6 @@ static inline void __list_del_clearprev(struct list_head *entry)
|
|||
entry->prev = NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* list_del - deletes entry from list.
|
||||
* @entry: the element to delete from the list.
|
||||
* Note: list_empty() on entry does not return true after this, the entry is
|
||||
* in an undefined state.
|
||||
*/
|
||||
static inline void __list_del_entry(struct list_head *entry)
|
||||
{
|
||||
if (!__list_del_entry_valid(entry))
|
||||
|
@ -134,6 +135,12 @@ static inline void __list_del_entry(struct list_head *entry)
|
|||
__list_del(entry->prev, entry->next);
|
||||
}
|
||||
|
||||
/**
|
||||
* list_del - deletes entry from list.
|
||||
* @entry: the element to delete from the list.
|
||||
* Note: list_empty() on entry does not return true after this, the entry is
|
||||
* in an undefined state.
|
||||
*/
|
||||
static inline void list_del(struct list_head *entry)
|
||||
{
|
||||
__list_del_entry(entry);
|
||||
|
@ -157,8 +164,15 @@ static inline void list_replace(struct list_head *old,
|
|||
new->prev->next = new;
|
||||
}
|
||||
|
||||
/**
|
||||
* list_replace_init - replace old entry by new one and initialize the old one
|
||||
* @old : the element to be replaced
|
||||
* @new : the new element to insert
|
||||
*
|
||||
* If @old was empty, it will be overwritten.
|
||||
*/
|
||||
static inline void list_replace_init(struct list_head *old,
|
||||
struct list_head *new)
|
||||
struct list_head *new)
|
||||
{
|
||||
list_replace(old, new);
|
||||
INIT_LIST_HEAD(old);
|
||||
|
@ -744,11 +758,36 @@ static inline void INIT_HLIST_NODE(struct hlist_node *h)
|
|||
h->pprev = NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_unhashed - Has node been removed from list and reinitialized?
|
||||
* @h: Node to be checked
|
||||
*
|
||||
* Not that not all removal functions will leave a node in unhashed
|
||||
* state. For example, hlist_nulls_del_init_rcu() does leave the
|
||||
* node in unhashed state, but hlist_nulls_del() does not.
|
||||
*/
|
||||
static inline int hlist_unhashed(const struct hlist_node *h)
|
||||
{
|
||||
return !h->pprev;
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_unhashed_lockless - Version of hlist_unhashed for lockless use
|
||||
* @h: Node to be checked
|
||||
*
|
||||
* This variant of hlist_unhashed() must be used in lockless contexts
|
||||
* to avoid potential load-tearing. The READ_ONCE() is paired with the
|
||||
* various WRITE_ONCE() in hlist helpers that are defined below.
|
||||
*/
|
||||
static inline int hlist_unhashed_lockless(const struct hlist_node *h)
|
||||
{
|
||||
return !READ_ONCE(h->pprev);
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_empty - Is the specified hlist_head structure an empty hlist?
|
||||
* @h: Structure to check.
|
||||
*/
|
||||
static inline int hlist_empty(const struct hlist_head *h)
|
||||
{
|
||||
return !READ_ONCE(h->first);
|
||||
|
@ -761,9 +800,16 @@ static inline void __hlist_del(struct hlist_node *n)
|
|||
|
||||
WRITE_ONCE(*pprev, next);
|
||||
if (next)
|
||||
next->pprev = pprev;
|
||||
WRITE_ONCE(next->pprev, pprev);
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_del - Delete the specified hlist_node from its list
|
||||
* @n: Node to delete.
|
||||
*
|
||||
* Note that this function leaves the node in hashed state. Use
|
||||
* hlist_del_init() or similar instead to unhash @n.
|
||||
*/
|
||||
static inline void hlist_del(struct hlist_node *n)
|
||||
{
|
||||
__hlist_del(n);
|
||||
|
@ -771,6 +817,12 @@ static inline void hlist_del(struct hlist_node *n)
|
|||
n->pprev = LIST_POISON2;
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_del_init - Delete the specified hlist_node from its list and initialize
|
||||
* @n: Node to delete.
|
||||
*
|
||||
* Note that this function leaves the node in unhashed state.
|
||||
*/
|
||||
static inline void hlist_del_init(struct hlist_node *n)
|
||||
{
|
||||
if (!hlist_unhashed(n)) {
|
||||
|
@ -779,51 +831,83 @@ static inline void hlist_del_init(struct hlist_node *n)
|
|||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_add_head - add a new entry at the beginning of the hlist
|
||||
* @n: new entry to be added
|
||||
* @h: hlist head to add it after
|
||||
*
|
||||
* Insert a new entry after the specified head.
|
||||
* This is good for implementing stacks.
|
||||
*/
|
||||
static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
|
||||
{
|
||||
struct hlist_node *first = h->first;
|
||||
n->next = first;
|
||||
WRITE_ONCE(n->next, first);
|
||||
if (first)
|
||||
first->pprev = &n->next;
|
||||
WRITE_ONCE(first->pprev, &n->next);
|
||||
WRITE_ONCE(h->first, n);
|
||||
n->pprev = &h->first;
|
||||
WRITE_ONCE(n->pprev, &h->first);
|
||||
}
|
||||
|
||||
/* next must be != NULL */
|
||||
/**
|
||||
* hlist_add_before - add a new entry before the one specified
|
||||
* @n: new entry to be added
|
||||
* @next: hlist node to add it before, which must be non-NULL
|
||||
*/
|
||||
static inline void hlist_add_before(struct hlist_node *n,
|
||||
struct hlist_node *next)
|
||||
struct hlist_node *next)
|
||||
{
|
||||
n->pprev = next->pprev;
|
||||
n->next = next;
|
||||
next->pprev = &n->next;
|
||||
WRITE_ONCE(n->pprev, next->pprev);
|
||||
WRITE_ONCE(n->next, next);
|
||||
WRITE_ONCE(next->pprev, &n->next);
|
||||
WRITE_ONCE(*(n->pprev), n);
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_add_behing - add a new entry after the one specified
|
||||
* @n: new entry to be added
|
||||
* @prev: hlist node to add it after, which must be non-NULL
|
||||
*/
|
||||
static inline void hlist_add_behind(struct hlist_node *n,
|
||||
struct hlist_node *prev)
|
||||
{
|
||||
n->next = prev->next;
|
||||
prev->next = n;
|
||||
n->pprev = &prev->next;
|
||||
WRITE_ONCE(n->next, prev->next);
|
||||
WRITE_ONCE(prev->next, n);
|
||||
WRITE_ONCE(n->pprev, &prev->next);
|
||||
|
||||
if (n->next)
|
||||
n->next->pprev = &n->next;
|
||||
WRITE_ONCE(n->next->pprev, &n->next);
|
||||
}
|
||||
|
||||
/* after that we'll appear to be on some hlist and hlist_del will work */
|
||||
/**
|
||||
* hlist_add_fake - create a fake hlist consisting of a single headless node
|
||||
* @n: Node to make a fake list out of
|
||||
*
|
||||
* This makes @n appear to be its own predecessor on a headless hlist.
|
||||
* The point of this is to allow things like hlist_del() to work correctly
|
||||
* in cases where there is no list.
|
||||
*/
|
||||
static inline void hlist_add_fake(struct hlist_node *n)
|
||||
{
|
||||
n->pprev = &n->next;
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_fake: Is this node a fake hlist?
|
||||
* @h: Node to check for being a self-referential fake hlist.
|
||||
*/
|
||||
static inline bool hlist_fake(struct hlist_node *h)
|
||||
{
|
||||
return h->pprev == &h->next;
|
||||
}
|
||||
|
||||
/*
|
||||
/**
|
||||
* hlist_is_singular_node - is node the only element of the specified hlist?
|
||||
* @n: Node to check for singularity.
|
||||
* @h: Header for potentially singular list.
|
||||
*
|
||||
* Check whether the node is the only node of the head without
|
||||
* accessing head:
|
||||
* accessing head, thus avoiding unnecessary cache misses.
|
||||
*/
|
||||
static inline bool
|
||||
hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)
|
||||
|
@ -831,7 +915,11 @@ hlist_is_singular_node(struct hlist_node *n, struct hlist_head *h)
|
|||
return !n->next && n->pprev == &h->first;
|
||||
}
|
||||
|
||||
/*
|
||||
/**
|
||||
* hlist_move_list - Move an hlist
|
||||
* @old: hlist_head for old list.
|
||||
* @new: hlist_head for new list.
|
||||
*
|
||||
* Move a list from one list head to another. Fixup the pprev
|
||||
* reference of the first entry if it exists.
|
||||
*/
|
||||
|
|
|
@ -56,11 +56,33 @@ static inline unsigned long get_nulls_value(const struct hlist_nulls_node *ptr)
|
|||
return ((unsigned long)ptr) >> 1;
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_nulls_unhashed - Has node been removed and reinitialized?
|
||||
* @h: Node to be checked
|
||||
*
|
||||
* Not that not all removal functions will leave a node in unhashed state.
|
||||
* For example, hlist_del_init_rcu() leaves the node in unhashed state,
|
||||
* but hlist_nulls_del() does not.
|
||||
*/
|
||||
static inline int hlist_nulls_unhashed(const struct hlist_nulls_node *h)
|
||||
{
|
||||
return !h->pprev;
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_nulls_unhashed_lockless - Has node been removed and reinitialized?
|
||||
* @h: Node to be checked
|
||||
*
|
||||
* Not that not all removal functions will leave a node in unhashed state.
|
||||
* For example, hlist_del_init_rcu() leaves the node in unhashed state,
|
||||
* but hlist_nulls_del() does not. Unlike hlist_nulls_unhashed(), this
|
||||
* function may be used locklessly.
|
||||
*/
|
||||
static inline int hlist_nulls_unhashed_lockless(const struct hlist_nulls_node *h)
|
||||
{
|
||||
return !READ_ONCE(h->pprev);
|
||||
}
|
||||
|
||||
static inline int hlist_nulls_empty(const struct hlist_nulls_head *h)
|
||||
{
|
||||
return is_a_nulls(READ_ONCE(h->first));
|
||||
|
@ -72,10 +94,10 @@ static inline void hlist_nulls_add_head(struct hlist_nulls_node *n,
|
|||
struct hlist_nulls_node *first = h->first;
|
||||
|
||||
n->next = first;
|
||||
n->pprev = &h->first;
|
||||
WRITE_ONCE(n->pprev, &h->first);
|
||||
h->first = n;
|
||||
if (!is_a_nulls(first))
|
||||
first->pprev = &n->next;
|
||||
WRITE_ONCE(first->pprev, &n->next);
|
||||
}
|
||||
|
||||
static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
|
||||
|
@ -85,13 +107,13 @@ static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
|
|||
|
||||
WRITE_ONCE(*pprev, next);
|
||||
if (!is_a_nulls(next))
|
||||
next->pprev = pprev;
|
||||
WRITE_ONCE(next->pprev, pprev);
|
||||
}
|
||||
|
||||
static inline void hlist_nulls_del(struct hlist_nulls_node *n)
|
||||
{
|
||||
__hlist_nulls_del(n);
|
||||
n->pprev = LIST_POISON2;
|
||||
WRITE_ONCE(n->pprev, LIST_POISON2);
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
|
@ -22,7 +22,6 @@ struct rcu_cblist {
|
|||
struct rcu_head *head;
|
||||
struct rcu_head **tail;
|
||||
long len;
|
||||
long len_lazy;
|
||||
};
|
||||
|
||||
#define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
|
||||
|
@ -73,7 +72,6 @@ struct rcu_segcblist {
|
|||
#else
|
||||
long len;
|
||||
#endif
|
||||
long len_lazy;
|
||||
u8 enabled;
|
||||
u8 offloaded;
|
||||
};
|
||||
|
|
|
@ -40,6 +40,16 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
|
|||
*/
|
||||
#define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
|
||||
|
||||
/**
|
||||
* list_tail_rcu - returns the prev pointer of the head of the list
|
||||
* @head: the head of the list
|
||||
*
|
||||
* Note: This should only be used with the list header, and even then
|
||||
* only if list_del() and similar primitives are not also used on the
|
||||
* list header.
|
||||
*/
|
||||
#define list_tail_rcu(head) (*((struct list_head __rcu **)(&(head)->prev)))
|
||||
|
||||
/*
|
||||
* Check during list traversal that we are within an RCU reader
|
||||
*/
|
||||
|
@ -173,7 +183,7 @@ static inline void hlist_del_init_rcu(struct hlist_node *n)
|
|||
{
|
||||
if (!hlist_unhashed(n)) {
|
||||
__hlist_del(n);
|
||||
n->pprev = NULL;
|
||||
WRITE_ONCE(n->pprev, NULL);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -361,7 +371,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
|
|||
* @pos: the type * to use as a loop cursor.
|
||||
* @head: the head for your list.
|
||||
* @member: the name of the list_head within the struct.
|
||||
* @cond: optional lockdep expression if called from non-RCU protection.
|
||||
* @cond...: optional lockdep expression if called from non-RCU protection.
|
||||
*
|
||||
* This list-traversal primitive may safely run concurrently with
|
||||
* the _rcu list-mutation primitives such as list_add_rcu()
|
||||
|
@ -473,7 +483,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
|
|||
static inline void hlist_del_rcu(struct hlist_node *n)
|
||||
{
|
||||
__hlist_del(n);
|
||||
n->pprev = LIST_POISON2;
|
||||
WRITE_ONCE(n->pprev, LIST_POISON2);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -489,11 +499,11 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
|
|||
struct hlist_node *next = old->next;
|
||||
|
||||
new->next = next;
|
||||
new->pprev = old->pprev;
|
||||
WRITE_ONCE(new->pprev, old->pprev);
|
||||
rcu_assign_pointer(*(struct hlist_node __rcu **)new->pprev, new);
|
||||
if (next)
|
||||
new->next->pprev = &new->next;
|
||||
old->pprev = LIST_POISON2;
|
||||
WRITE_ONCE(new->next->pprev, &new->next);
|
||||
WRITE_ONCE(old->pprev, LIST_POISON2);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -528,10 +538,10 @@ static inline void hlist_add_head_rcu(struct hlist_node *n,
|
|||
struct hlist_node *first = h->first;
|
||||
|
||||
n->next = first;
|
||||
n->pprev = &h->first;
|
||||
WRITE_ONCE(n->pprev, &h->first);
|
||||
rcu_assign_pointer(hlist_first_rcu(h), n);
|
||||
if (first)
|
||||
first->pprev = &n->next;
|
||||
WRITE_ONCE(first->pprev, &n->next);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -564,7 +574,7 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
|
|||
|
||||
if (last) {
|
||||
n->next = last->next;
|
||||
n->pprev = &last->next;
|
||||
WRITE_ONCE(n->pprev, &last->next);
|
||||
rcu_assign_pointer(hlist_next_rcu(last), n);
|
||||
} else {
|
||||
hlist_add_head_rcu(n, h);
|
||||
|
@ -592,10 +602,10 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
|
|||
static inline void hlist_add_before_rcu(struct hlist_node *n,
|
||||
struct hlist_node *next)
|
||||
{
|
||||
n->pprev = next->pprev;
|
||||
WRITE_ONCE(n->pprev, next->pprev);
|
||||
n->next = next;
|
||||
rcu_assign_pointer(hlist_pprev_rcu(n), n);
|
||||
next->pprev = &n->next;
|
||||
WRITE_ONCE(next->pprev, &n->next);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -620,10 +630,10 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
|
|||
struct hlist_node *prev)
|
||||
{
|
||||
n->next = prev->next;
|
||||
n->pprev = &prev->next;
|
||||
WRITE_ONCE(n->pprev, &prev->next);
|
||||
rcu_assign_pointer(hlist_next_rcu(prev), n);
|
||||
if (n->next)
|
||||
n->next->pprev = &n->next;
|
||||
WRITE_ONCE(n->next->pprev, &n->next);
|
||||
}
|
||||
|
||||
#define __hlist_for_each_rcu(pos, head) \
|
||||
|
@ -636,7 +646,7 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
|
|||
* @pos: the type * to use as a loop cursor.
|
||||
* @head: the head for your list.
|
||||
* @member: the name of the hlist_node within the struct.
|
||||
* @cond: optional lockdep expression if called from non-RCU protection.
|
||||
* @cond...: optional lockdep expression if called from non-RCU protection.
|
||||
*
|
||||
* This list-traversal primitive may safely run concurrently with
|
||||
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
|
||||
|
|
|
@ -34,13 +34,21 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
|
|||
{
|
||||
if (!hlist_nulls_unhashed(n)) {
|
||||
__hlist_nulls_del(n);
|
||||
n->pprev = NULL;
|
||||
WRITE_ONCE(n->pprev, NULL);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* hlist_nulls_first_rcu - returns the first element of the hash list.
|
||||
* @head: the head of the list.
|
||||
*/
|
||||
#define hlist_nulls_first_rcu(head) \
|
||||
(*((struct hlist_nulls_node __rcu __force **)&(head)->first))
|
||||
|
||||
/**
|
||||
* hlist_nulls_next_rcu - returns the element of the list after @node.
|
||||
* @node: element of the list.
|
||||
*/
|
||||
#define hlist_nulls_next_rcu(node) \
|
||||
(*((struct hlist_nulls_node __rcu __force **)&(node)->next))
|
||||
|
||||
|
@ -66,7 +74,7 @@ static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
|
|||
static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n)
|
||||
{
|
||||
__hlist_nulls_del(n);
|
||||
n->pprev = LIST_POISON2;
|
||||
WRITE_ONCE(n->pprev, LIST_POISON2);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -94,10 +102,10 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
|
|||
struct hlist_nulls_node *first = h->first;
|
||||
|
||||
n->next = first;
|
||||
n->pprev = &h->first;
|
||||
WRITE_ONCE(n->pprev, &h->first);
|
||||
rcu_assign_pointer(hlist_nulls_first_rcu(h), n);
|
||||
if (!is_a_nulls(first))
|
||||
first->pprev = &n->next;
|
||||
WRITE_ONCE(first->pprev, &n->next);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -141,7 +149,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n,
|
|||
* hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type
|
||||
* @tpos: the type * to use as a loop cursor.
|
||||
* @pos: the &struct hlist_nulls_node to use as a loop cursor.
|
||||
* @head: the head for your list.
|
||||
* @head: the head of the list.
|
||||
* @member: the name of the hlist_nulls_node within the struct.
|
||||
*
|
||||
* The barrier() is needed to make sure compiler doesn't cache first element [1],
|
||||
|
@ -161,7 +169,7 @@ static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n,
|
|||
* iterate over list of given type safe against removal of list entry
|
||||
* @tpos: the type * to use as a loop cursor.
|
||||
* @pos: the &struct hlist_nulls_node to use as a loop cursor.
|
||||
* @head: the head for your list.
|
||||
* @head: the head of the list.
|
||||
* @member: the name of the hlist_nulls_node within the struct.
|
||||
*/
|
||||
#define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \
|
||||
|
|
|
@ -154,7 +154,7 @@ static inline void exit_tasks_rcu_finish(void) { }
|
|||
*
|
||||
* This macro resembles cond_resched(), except that it is defined to
|
||||
* report potential quiescent states to RCU-tasks even if the cond_resched()
|
||||
* machinery were to be shut off, as some advocate for PREEMPT kernels.
|
||||
* machinery were to be shut off, as some advocate for PREEMPTION kernels.
|
||||
*/
|
||||
#define cond_resched_tasks_rcu_qs() \
|
||||
do { \
|
||||
|
@ -167,7 +167,7 @@ do { \
|
|||
* TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
|
||||
*/
|
||||
|
||||
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
|
||||
#if defined(CONFIG_TREE_RCU)
|
||||
#include <linux/rcutree.h>
|
||||
#elif defined(CONFIG_TINY_RCU)
|
||||
#include <linux/rcutiny.h>
|
||||
|
@ -400,22 +400,6 @@ do { \
|
|||
__tmp; \
|
||||
})
|
||||
|
||||
/**
|
||||
* rcu_swap_protected() - swap an RCU and a regular pointer
|
||||
* @rcu_ptr: RCU pointer
|
||||
* @ptr: regular pointer
|
||||
* @c: the conditions under which the dereference will take place
|
||||
*
|
||||
* Perform swap(@rcu_ptr, @ptr) where @rcu_ptr is an RCU-annotated pointer and
|
||||
* @c is the argument that is passed to the rcu_dereference_protected() call
|
||||
* used to read that pointer.
|
||||
*/
|
||||
#define rcu_swap_protected(rcu_ptr, ptr, c) do { \
|
||||
typeof(ptr) __tmp = rcu_dereference_protected((rcu_ptr), (c)); \
|
||||
rcu_assign_pointer((rcu_ptr), (ptr)); \
|
||||
(ptr) = __tmp; \
|
||||
} while (0)
|
||||
|
||||
/**
|
||||
* rcu_access_pointer() - fetch RCU pointer with no dereferencing
|
||||
* @p: The pointer to read
|
||||
|
@ -598,10 +582,10 @@ do { \
|
|||
*
|
||||
* You can avoid reading and understanding the next paragraph by
|
||||
* following this rule: don't put anything in an rcu_read_lock() RCU
|
||||
* read-side critical section that would block in a !PREEMPT kernel.
|
||||
* read-side critical section that would block in a !PREEMPTION kernel.
|
||||
* But if you want the full story, read on!
|
||||
*
|
||||
* In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
|
||||
* In non-preemptible RCU implementations (pure TREE_RCU and TINY_RCU),
|
||||
* it is illegal to block while in an RCU read-side critical section.
|
||||
* In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION
|
||||
* kernel builds, RCU read-side critical sections may be preempted,
|
||||
|
@ -912,4 +896,8 @@ rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
|
|||
return false;
|
||||
}
|
||||
|
||||
/* kernel/ksysfs.c definitions */
|
||||
extern int rcu_expedited;
|
||||
extern int rcu_normal;
|
||||
|
||||
#endif /* __LINUX_RCUPDATE_H */
|
||||
|
|
|
@ -85,6 +85,7 @@ static inline void rcu_scheduler_starting(void) { }
|
|||
static inline void rcu_end_inkernel_boot(void) { }
|
||||
static inline bool rcu_is_watching(void) { return true; }
|
||||
static inline void rcu_momentary_dyntick_idle(void) { }
|
||||
static inline void kfree_rcu_scheduler_running(void) { }
|
||||
|
||||
/* Avoid RCU read-side critical sections leaking across. */
|
||||
static inline void rcu_all_qs(void) { barrier(); }
|
||||
|
|
|
@ -38,6 +38,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
|
|||
void rcu_barrier(void);
|
||||
bool rcu_eqs_special_set(int cpu);
|
||||
void rcu_momentary_dyntick_idle(void);
|
||||
void kfree_rcu_scheduler_running(void);
|
||||
unsigned long get_state_synchronize_rcu(void);
|
||||
void cond_synchronize_rcu(unsigned long oldstate);
|
||||
|
||||
|
|
|
@ -109,8 +109,10 @@ enum tick_dep_bits {
|
|||
TICK_DEP_BIT_PERF_EVENTS = 1,
|
||||
TICK_DEP_BIT_SCHED = 2,
|
||||
TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
|
||||
TICK_DEP_BIT_RCU = 4
|
||||
TICK_DEP_BIT_RCU = 4,
|
||||
TICK_DEP_BIT_RCU_EXP = 5
|
||||
};
|
||||
#define TICK_DEP_BIT_MAX TICK_DEP_BIT_RCU_EXP
|
||||
|
||||
#define TICK_DEP_MASK_NONE 0
|
||||
#define TICK_DEP_MASK_POSIX_TIMER (1 << TICK_DEP_BIT_POSIX_TIMER)
|
||||
|
@ -118,6 +120,7 @@ enum tick_dep_bits {
|
|||
#define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
|
||||
#define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
|
||||
#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
|
||||
#define TICK_DEP_MASK_RCU_EXP (1 << TICK_DEP_BIT_RCU_EXP)
|
||||
|
||||
#ifdef CONFIG_NO_HZ_COMMON
|
||||
extern bool tick_nohz_enabled;
|
||||
|
|
|
@ -41,7 +41,7 @@ TRACE_EVENT(rcu_utilization,
|
|||
TP_printk("%s", __entry->s)
|
||||
);
|
||||
|
||||
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
|
||||
#if defined(CONFIG_TREE_RCU)
|
||||
|
||||
/*
|
||||
* Tracepoint for grace-period events. Takes a string identifying the
|
||||
|
@ -432,7 +432,7 @@ TRACE_EVENT_RCU(rcu_fqs,
|
|||
__entry->cpu, __entry->qsevent)
|
||||
);
|
||||
|
||||
#endif /* #if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) */
|
||||
#endif /* #if defined(CONFIG_TREE_RCU) */
|
||||
|
||||
/*
|
||||
* Tracepoint for dyntick-idle entry/exit events. These take a string
|
||||
|
@ -449,7 +449,7 @@ TRACE_EVENT_RCU(rcu_fqs,
|
|||
*/
|
||||
TRACE_EVENT_RCU(rcu_dyntick,
|
||||
|
||||
TP_PROTO(const char *polarity, long oldnesting, long newnesting, atomic_t dynticks),
|
||||
TP_PROTO(const char *polarity, long oldnesting, long newnesting, int dynticks),
|
||||
|
||||
TP_ARGS(polarity, oldnesting, newnesting, dynticks),
|
||||
|
||||
|
@ -464,7 +464,7 @@ TRACE_EVENT_RCU(rcu_dyntick,
|
|||
__entry->polarity = polarity;
|
||||
__entry->oldnesting = oldnesting;
|
||||
__entry->newnesting = newnesting;
|
||||
__entry->dynticks = atomic_read(&dynticks);
|
||||
__entry->dynticks = dynticks;
|
||||
),
|
||||
|
||||
TP_printk("%s %lx %lx %#3x", __entry->polarity,
|
||||
|
@ -481,16 +481,14 @@ TRACE_EVENT_RCU(rcu_dyntick,
|
|||
*/
|
||||
TRACE_EVENT_RCU(rcu_callback,
|
||||
|
||||
TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen_lazy,
|
||||
long qlen),
|
||||
TP_PROTO(const char *rcuname, struct rcu_head *rhp, long qlen),
|
||||
|
||||
TP_ARGS(rcuname, rhp, qlen_lazy, qlen),
|
||||
TP_ARGS(rcuname, rhp, qlen),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(const char *, rcuname)
|
||||
__field(void *, rhp)
|
||||
__field(void *, func)
|
||||
__field(long, qlen_lazy)
|
||||
__field(long, qlen)
|
||||
),
|
||||
|
||||
|
@ -498,13 +496,12 @@ TRACE_EVENT_RCU(rcu_callback,
|
|||
__entry->rcuname = rcuname;
|
||||
__entry->rhp = rhp;
|
||||
__entry->func = rhp->func;
|
||||
__entry->qlen_lazy = qlen_lazy;
|
||||
__entry->qlen = qlen;
|
||||
),
|
||||
|
||||
TP_printk("%s rhp=%p func=%ps %ld/%ld",
|
||||
TP_printk("%s rhp=%p func=%ps %ld",
|
||||
__entry->rcuname, __entry->rhp, __entry->func,
|
||||
__entry->qlen_lazy, __entry->qlen)
|
||||
__entry->qlen)
|
||||
);
|
||||
|
||||
/*
|
||||
|
@ -518,15 +515,14 @@ TRACE_EVENT_RCU(rcu_callback,
|
|||
TRACE_EVENT_RCU(rcu_kfree_callback,
|
||||
|
||||
TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
|
||||
long qlen_lazy, long qlen),
|
||||
long qlen),
|
||||
|
||||
TP_ARGS(rcuname, rhp, offset, qlen_lazy, qlen),
|
||||
TP_ARGS(rcuname, rhp, offset, qlen),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(const char *, rcuname)
|
||||
__field(void *, rhp)
|
||||
__field(unsigned long, offset)
|
||||
__field(long, qlen_lazy)
|
||||
__field(long, qlen)
|
||||
),
|
||||
|
||||
|
@ -534,13 +530,12 @@ TRACE_EVENT_RCU(rcu_kfree_callback,
|
|||
__entry->rcuname = rcuname;
|
||||
__entry->rhp = rhp;
|
||||
__entry->offset = offset;
|
||||
__entry->qlen_lazy = qlen_lazy;
|
||||
__entry->qlen = qlen;
|
||||
),
|
||||
|
||||
TP_printk("%s rhp=%p func=%ld %ld/%ld",
|
||||
TP_printk("%s rhp=%p func=%ld %ld",
|
||||
__entry->rcuname, __entry->rhp, __entry->offset,
|
||||
__entry->qlen_lazy, __entry->qlen)
|
||||
__entry->qlen)
|
||||
);
|
||||
|
||||
/*
|
||||
|
@ -552,27 +547,24 @@ TRACE_EVENT_RCU(rcu_kfree_callback,
|
|||
*/
|
||||
TRACE_EVENT_RCU(rcu_batch_start,
|
||||
|
||||
TP_PROTO(const char *rcuname, long qlen_lazy, long qlen, long blimit),
|
||||
TP_PROTO(const char *rcuname, long qlen, long blimit),
|
||||
|
||||
TP_ARGS(rcuname, qlen_lazy, qlen, blimit),
|
||||
TP_ARGS(rcuname, qlen, blimit),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(const char *, rcuname)
|
||||
__field(long, qlen_lazy)
|
||||
__field(long, qlen)
|
||||
__field(long, blimit)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->rcuname = rcuname;
|
||||
__entry->qlen_lazy = qlen_lazy;
|
||||
__entry->qlen = qlen;
|
||||
__entry->blimit = blimit;
|
||||
),
|
||||
|
||||
TP_printk("%s CBs=%ld/%ld bl=%ld",
|
||||
__entry->rcuname, __entry->qlen_lazy, __entry->qlen,
|
||||
__entry->blimit)
|
||||
TP_printk("%s CBs=%ld bl=%ld",
|
||||
__entry->rcuname, __entry->qlen, __entry->blimit)
|
||||
);
|
||||
|
||||
/*
|
||||
|
|
|
@ -7,7 +7,7 @@ menu "RCU Subsystem"
|
|||
|
||||
config TREE_RCU
|
||||
bool
|
||||
default y if !PREEMPTION && SMP
|
||||
default y if SMP
|
||||
help
|
||||
This option selects the RCU implementation that is
|
||||
designed for very large SMP system with hundreds or
|
||||
|
@ -17,6 +17,7 @@ config TREE_RCU
|
|||
config PREEMPT_RCU
|
||||
bool
|
||||
default y if PREEMPTION
|
||||
select TREE_RCU
|
||||
help
|
||||
This option selects the RCU implementation that is
|
||||
designed for very large SMP systems with hundreds or
|
||||
|
@ -78,7 +79,7 @@ config TASKS_RCU
|
|||
user-mode execution as quiescent states.
|
||||
|
||||
config RCU_STALL_COMMON
|
||||
def_bool ( TREE_RCU || PREEMPT_RCU )
|
||||
def_bool TREE_RCU
|
||||
help
|
||||
This option enables RCU CPU stall code that is common between
|
||||
the TINY and TREE variants of RCU. The purpose is to allow
|
||||
|
@ -86,13 +87,13 @@ config RCU_STALL_COMMON
|
|||
making these warnings mandatory for the tree variants.
|
||||
|
||||
config RCU_NEED_SEGCBLIST
|
||||
def_bool ( TREE_RCU || PREEMPT_RCU || TREE_SRCU )
|
||||
def_bool ( TREE_RCU || TREE_SRCU )
|
||||
|
||||
config RCU_FANOUT
|
||||
int "Tree-based hierarchical RCU fanout value"
|
||||
range 2 64 if 64BIT
|
||||
range 2 32 if !64BIT
|
||||
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
|
||||
depends on TREE_RCU && RCU_EXPERT
|
||||
default 64 if 64BIT
|
||||
default 32 if !64BIT
|
||||
help
|
||||
|
@ -112,7 +113,7 @@ config RCU_FANOUT_LEAF
|
|||
int "Tree-based hierarchical RCU leaf-level fanout value"
|
||||
range 2 64 if 64BIT
|
||||
range 2 32 if !64BIT
|
||||
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
|
||||
depends on TREE_RCU && RCU_EXPERT
|
||||
default 16
|
||||
help
|
||||
This option controls the leaf-level fanout of hierarchical
|
||||
|
@ -187,7 +188,7 @@ config RCU_BOOST_DELAY
|
|||
|
||||
config RCU_NOCB_CPU
|
||||
bool "Offload RCU callback processing from boot-selected CPUs"
|
||||
depends on TREE_RCU || PREEMPT_RCU
|
||||
depends on TREE_RCU
|
||||
depends on RCU_EXPERT || NO_HZ_FULL
|
||||
default n
|
||||
help
|
||||
|
@ -200,8 +201,8 @@ config RCU_NOCB_CPU
|
|||
specified at boot time by the rcu_nocbs parameter. For each
|
||||
such CPU, a kthread ("rcuox/N") will be created to invoke
|
||||
callbacks, where the "N" is the CPU being offloaded, and where
|
||||
the "p" for RCU-preempt (PREEMPT kernels) and "s" for RCU-sched
|
||||
(!PREEMPT kernels). Nothing prevents this kthread from running
|
||||
the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
|
||||
(!PREEMPTION kernels). Nothing prevents this kthread from running
|
||||
on the specified CPUs, but (1) the kthreads may be preempted
|
||||
between each callback, and (2) affinity or cgroups can be used
|
||||
to force the kthreads to run on whatever set of CPUs is desired.
|
||||
|
|
|
@ -9,6 +9,5 @@ obj-$(CONFIG_TINY_SRCU) += srcutiny.o
|
|||
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
|
||||
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
|
||||
obj-$(CONFIG_TREE_RCU) += tree.o
|
||||
obj-$(CONFIG_PREEMPT_RCU) += tree.o
|
||||
obj-$(CONFIG_TINY_RCU) += tiny.o
|
||||
obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
|
||||
|
|
|
@ -198,33 +198,6 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
|
|||
}
|
||||
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
|
||||
|
||||
void kfree(const void *);
|
||||
|
||||
/*
|
||||
* Reclaim the specified callback, either by invoking it (non-lazy case)
|
||||
* or freeing it directly (lazy case). Return true if lazy, false otherwise.
|
||||
*/
|
||||
static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
|
||||
{
|
||||
rcu_callback_t f;
|
||||
unsigned long offset = (unsigned long)head->func;
|
||||
|
||||
rcu_lock_acquire(&rcu_callback_map);
|
||||
if (__is_kfree_rcu_offset(offset)) {
|
||||
trace_rcu_invoke_kfree_callback(rn, head, offset);
|
||||
kfree((void *)head - offset);
|
||||
rcu_lock_release(&rcu_callback_map);
|
||||
return true;
|
||||
} else {
|
||||
trace_rcu_invoke_callback(rn, head);
|
||||
f = head->func;
|
||||
WRITE_ONCE(head->func, (rcu_callback_t)0L);
|
||||
f(head);
|
||||
rcu_lock_release(&rcu_callback_map);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef CONFIG_RCU_STALL_COMMON
|
||||
|
||||
extern int rcu_cpu_stall_ftrace_dump;
|
||||
|
@ -281,7 +254,7 @@ void rcu_test_sync_prims(void);
|
|||
*/
|
||||
extern void resched_cpu(int cpu);
|
||||
|
||||
#if defined(SRCU) || !defined(TINY_RCU)
|
||||
#if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU)
|
||||
|
||||
#include <linux/rcu_node_tree.h>
|
||||
|
||||
|
@ -418,7 +391,7 @@ do { \
|
|||
#define raw_lockdep_assert_held_rcu_node(p) \
|
||||
lockdep_assert_held(&ACCESS_PRIVATE(p, lock))
|
||||
|
||||
#endif /* #if defined(SRCU) || !defined(TINY_RCU) */
|
||||
#endif /* #if defined(CONFIG_SRCU) || !defined(CONFIG_TINY_RCU) */
|
||||
|
||||
#ifdef CONFIG_SRCU
|
||||
void srcu_init(void);
|
||||
|
@ -454,7 +427,7 @@ enum rcutorture_type {
|
|||
INVALID_RCU_FLAVOR
|
||||
};
|
||||
|
||||
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
|
||||
#if defined(CONFIG_TREE_RCU)
|
||||
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
|
||||
unsigned long *gp_seq);
|
||||
void do_trace_rcu_torture_read(const char *rcutorturename,
|
||||
|
|
|
@ -20,14 +20,10 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
|
|||
rclp->head = NULL;
|
||||
rclp->tail = &rclp->head;
|
||||
rclp->len = 0;
|
||||
rclp->len_lazy = 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Enqueue an rcu_head structure onto the specified callback list.
|
||||
* This function assumes that the callback is non-lazy because it
|
||||
* is intended for use by no-CBs CPUs, which do not distinguish
|
||||
* between lazy and non-lazy RCU callbacks.
|
||||
*/
|
||||
void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
|
||||
{
|
||||
|
@ -54,7 +50,6 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
|
|||
else
|
||||
drclp->tail = &drclp->head;
|
||||
drclp->len = srclp->len;
|
||||
drclp->len_lazy = srclp->len_lazy;
|
||||
if (!rhp) {
|
||||
rcu_cblist_init(srclp);
|
||||
} else {
|
||||
|
@ -62,16 +57,12 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
|
|||
srclp->head = rhp;
|
||||
srclp->tail = &rhp->next;
|
||||
WRITE_ONCE(srclp->len, 1);
|
||||
srclp->len_lazy = 0;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Dequeue the oldest rcu_head structure from the specified callback
|
||||
* list. This function assumes that the callback is non-lazy, but
|
||||
* the caller can later invoke rcu_cblist_dequeued_lazy() if it
|
||||
* finds otherwise (and if it cares about laziness). This allows
|
||||
* different users to have different ways of determining laziness.
|
||||
* list.
|
||||
*/
|
||||
struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp)
|
||||
{
|
||||
|
@ -161,7 +152,6 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
|
|||
for (i = 0; i < RCU_CBLIST_NSEGS; i++)
|
||||
rsclp->tails[i] = &rsclp->head;
|
||||
rcu_segcblist_set_len(rsclp, 0);
|
||||
rsclp->len_lazy = 0;
|
||||
rsclp->enabled = 1;
|
||||
}
|
||||
|
||||
|
@ -173,7 +163,6 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
|
|||
{
|
||||
WARN_ON_ONCE(!rcu_segcblist_empty(rsclp));
|
||||
WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp));
|
||||
WARN_ON_ONCE(rcu_segcblist_n_lazy_cbs(rsclp));
|
||||
rsclp->enabled = 0;
|
||||
}
|
||||
|
||||
|
@ -253,11 +242,9 @@ bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp)
|
|||
* absolutely not OK for it to ever miss posting a callback.
|
||||
*/
|
||||
void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
|
||||
struct rcu_head *rhp, bool lazy)
|
||||
struct rcu_head *rhp)
|
||||
{
|
||||
rcu_segcblist_inc_len(rsclp);
|
||||
if (lazy)
|
||||
rsclp->len_lazy++;
|
||||
smp_mb(); /* Ensure counts are updated before callback is enqueued. */
|
||||
rhp->next = NULL;
|
||||
WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
|
||||
|
@ -275,15 +262,13 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
|
|||
* period. You have been warned.
|
||||
*/
|
||||
bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
|
||||
struct rcu_head *rhp, bool lazy)
|
||||
struct rcu_head *rhp)
|
||||
{
|
||||
int i;
|
||||
|
||||
if (rcu_segcblist_n_cbs(rsclp) == 0)
|
||||
return false;
|
||||
rcu_segcblist_inc_len(rsclp);
|
||||
if (lazy)
|
||||
rsclp->len_lazy++;
|
||||
smp_mb(); /* Ensure counts are updated before callback is entrained. */
|
||||
rhp->next = NULL;
|
||||
for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
|
||||
|
@ -307,8 +292,6 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
|
|||
void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
|
||||
struct rcu_cblist *rclp)
|
||||
{
|
||||
rclp->len_lazy += rsclp->len_lazy;
|
||||
rsclp->len_lazy = 0;
|
||||
rclp->len = rcu_segcblist_xchg_len(rsclp, 0);
|
||||
}
|
||||
|
||||
|
@ -361,9 +344,7 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
|
|||
void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
|
||||
struct rcu_cblist *rclp)
|
||||
{
|
||||
rsclp->len_lazy += rclp->len_lazy;
|
||||
rcu_segcblist_add_len(rsclp, rclp->len);
|
||||
rclp->len_lazy = 0;
|
||||
rclp->len = 0;
|
||||
}
|
||||
|
||||
|
|
|
@ -15,15 +15,6 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
|
|||
return READ_ONCE(rclp->len);
|
||||
}
|
||||
|
||||
/*
|
||||
* Account for the fact that a previously dequeued callback turned out
|
||||
* to be marked as lazy.
|
||||
*/
|
||||
static inline void rcu_cblist_dequeued_lazy(struct rcu_cblist *rclp)
|
||||
{
|
||||
rclp->len_lazy--;
|
||||
}
|
||||
|
||||
void rcu_cblist_init(struct rcu_cblist *rclp);
|
||||
void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
|
||||
void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
|
||||
|
@ -59,18 +50,6 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
|
|||
#endif
|
||||
}
|
||||
|
||||
/* Return number of lazy callbacks in segmented callback list. */
|
||||
static inline long rcu_segcblist_n_lazy_cbs(struct rcu_segcblist *rsclp)
|
||||
{
|
||||
return rsclp->len_lazy;
|
||||
}
|
||||
|
||||
/* Return number of lazy callbacks in segmented callback list. */
|
||||
static inline long rcu_segcblist_n_nonlazy_cbs(struct rcu_segcblist *rsclp)
|
||||
{
|
||||
return rcu_segcblist_n_cbs(rsclp) - rsclp->len_lazy;
|
||||
}
|
||||
|
||||
/*
|
||||
* Is the specified rcu_segcblist enabled, for example, not corresponding
|
||||
* to an offline CPU?
|
||||
|
@ -106,9 +85,9 @@ struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
|
|||
struct rcu_head *rcu_segcblist_first_pend_cb(struct rcu_segcblist *rsclp);
|
||||
bool rcu_segcblist_nextgp(struct rcu_segcblist *rsclp, unsigned long *lp);
|
||||
void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
|
||||
struct rcu_head *rhp, bool lazy);
|
||||
struct rcu_head *rhp);
|
||||
bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
|
||||
struct rcu_head *rhp, bool lazy);
|
||||
struct rcu_head *rhp);
|
||||
void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp,
|
||||
struct rcu_cblist *rclp);
|
||||
void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
|
||||
|
|
|
@ -86,6 +86,7 @@ torture_param(bool, shutdown, RCUPERF_SHUTDOWN,
|
|||
"Shutdown at end of performance tests.");
|
||||
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
|
||||
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
|
||||
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?");
|
||||
|
||||
static char *perf_type = "rcu";
|
||||
module_param(perf_type, charp, 0444);
|
||||
|
@ -105,8 +106,8 @@ static atomic_t n_rcu_perf_writer_finished;
|
|||
static wait_queue_head_t shutdown_wq;
|
||||
static u64 t_rcu_perf_writer_started;
|
||||
static u64 t_rcu_perf_writer_finished;
|
||||
static unsigned long b_rcu_perf_writer_started;
|
||||
static unsigned long b_rcu_perf_writer_finished;
|
||||
static unsigned long b_rcu_gp_test_started;
|
||||
static unsigned long b_rcu_gp_test_finished;
|
||||
static DEFINE_PER_CPU(atomic_t, n_async_inflight);
|
||||
|
||||
#define MAX_MEAS 10000
|
||||
|
@ -378,10 +379,10 @@ rcu_perf_writer(void *arg)
|
|||
if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) {
|
||||
t_rcu_perf_writer_started = t;
|
||||
if (gp_exp) {
|
||||
b_rcu_perf_writer_started =
|
||||
b_rcu_gp_test_started =
|
||||
cur_ops->exp_completed() / 2;
|
||||
} else {
|
||||
b_rcu_perf_writer_started = cur_ops->get_gp_seq();
|
||||
b_rcu_gp_test_started = cur_ops->get_gp_seq();
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -429,10 +430,10 @@ retry:
|
|||
PERFOUT_STRING("Test complete");
|
||||
t_rcu_perf_writer_finished = t;
|
||||
if (gp_exp) {
|
||||
b_rcu_perf_writer_finished =
|
||||
b_rcu_gp_test_finished =
|
||||
cur_ops->exp_completed() / 2;
|
||||
} else {
|
||||
b_rcu_perf_writer_finished =
|
||||
b_rcu_gp_test_finished =
|
||||
cur_ops->get_gp_seq();
|
||||
}
|
||||
if (shutdown) {
|
||||
|
@ -515,8 +516,8 @@ rcu_perf_cleanup(void)
|
|||
t_rcu_perf_writer_finished -
|
||||
t_rcu_perf_writer_started,
|
||||
ngps,
|
||||
rcuperf_seq_diff(b_rcu_perf_writer_finished,
|
||||
b_rcu_perf_writer_started));
|
||||
rcuperf_seq_diff(b_rcu_gp_test_finished,
|
||||
b_rcu_gp_test_started));
|
||||
for (i = 0; i < nrealwriters; i++) {
|
||||
if (!writer_durations)
|
||||
break;
|
||||
|
@ -584,6 +585,159 @@ rcu_perf_shutdown(void *arg)
|
|||
return -EINVAL;
|
||||
}
|
||||
|
||||
/*
|
||||
* kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number
|
||||
* of iterations and measure total time and number of GP for all iterations to complete.
|
||||
*/
|
||||
|
||||
torture_param(int, kfree_nthreads, -1, "Number of threads running loops of kfree_rcu().");
|
||||
torture_param(int, kfree_alloc_num, 8000, "Number of allocations and frees done in an iteration.");
|
||||
torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num allocations and frees.");
|
||||
|
||||
static struct task_struct **kfree_reader_tasks;
|
||||
static int kfree_nrealthreads;
|
||||
static atomic_t n_kfree_perf_thread_started;
|
||||
static atomic_t n_kfree_perf_thread_ended;
|
||||
|
||||
struct kfree_obj {
|
||||
char kfree_obj[8];
|
||||
struct rcu_head rh;
|
||||
};
|
||||
|
||||
static int
|
||||
kfree_perf_thread(void *arg)
|
||||
{
|
||||
int i, loop = 0;
|
||||
long me = (long)arg;
|
||||
struct kfree_obj *alloc_ptr;
|
||||
u64 start_time, end_time;
|
||||
|
||||
VERBOSE_PERFOUT_STRING("kfree_perf_thread task started");
|
||||
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
|
||||
set_user_nice(current, MAX_NICE);
|
||||
|
||||
start_time = ktime_get_mono_fast_ns();
|
||||
|
||||
if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) {
|
||||
if (gp_exp)
|
||||
b_rcu_gp_test_started = cur_ops->exp_completed() / 2;
|
||||
else
|
||||
b_rcu_gp_test_started = cur_ops->get_gp_seq();
|
||||
}
|
||||
|
||||
do {
|
||||
for (i = 0; i < kfree_alloc_num; i++) {
|
||||
alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
|
||||
if (!alloc_ptr)
|
||||
return -ENOMEM;
|
||||
|
||||
kfree_rcu(alloc_ptr, rh);
|
||||
}
|
||||
|
||||
cond_resched();
|
||||
} while (!torture_must_stop() && ++loop < kfree_loops);
|
||||
|
||||
if (atomic_inc_return(&n_kfree_perf_thread_ended) >= kfree_nrealthreads) {
|
||||
end_time = ktime_get_mono_fast_ns();
|
||||
|
||||
if (gp_exp)
|
||||
b_rcu_gp_test_finished = cur_ops->exp_completed() / 2;
|
||||
else
|
||||
b_rcu_gp_test_finished = cur_ops->get_gp_seq();
|
||||
|
||||
pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld\n",
|
||||
(unsigned long long)(end_time - start_time), kfree_loops,
|
||||
rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started));
|
||||
if (shutdown) {
|
||||
smp_mb(); /* Assign before wake. */
|
||||
wake_up(&shutdown_wq);
|
||||
}
|
||||
}
|
||||
|
||||
torture_kthread_stopping("kfree_perf_thread");
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void
|
||||
kfree_perf_cleanup(void)
|
||||
{
|
||||
int i;
|
||||
|
||||
if (torture_cleanup_begin())
|
||||
return;
|
||||
|
||||
if (kfree_reader_tasks) {
|
||||
for (i = 0; i < kfree_nrealthreads; i++)
|
||||
torture_stop_kthread(kfree_perf_thread,
|
||||
kfree_reader_tasks[i]);
|
||||
kfree(kfree_reader_tasks);
|
||||
}
|
||||
|
||||
torture_cleanup_end();
|
||||
}
|
||||
|
||||
/*
|
||||
* shutdown kthread. Just waits to be awakened, then shuts down system.
|
||||
*/
|
||||
static int
|
||||
kfree_perf_shutdown(void *arg)
|
||||
{
|
||||
do {
|
||||
wait_event(shutdown_wq,
|
||||
atomic_read(&n_kfree_perf_thread_ended) >=
|
||||
kfree_nrealthreads);
|
||||
} while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads);
|
||||
|
||||
smp_mb(); /* Wake before output. */
|
||||
|
||||
kfree_perf_cleanup();
|
||||
kernel_power_off();
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
static int __init
|
||||
kfree_perf_init(void)
|
||||
{
|
||||
long i;
|
||||
int firsterr = 0;
|
||||
|
||||
kfree_nrealthreads = compute_real(kfree_nthreads);
|
||||
/* Start up the kthreads. */
|
||||
if (shutdown) {
|
||||
init_waitqueue_head(&shutdown_wq);
|
||||
firsterr = torture_create_kthread(kfree_perf_shutdown, NULL,
|
||||
shutdown_task);
|
||||
if (firsterr)
|
||||
goto unwind;
|
||||
schedule_timeout_uninterruptible(1);
|
||||
}
|
||||
|
||||
kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
|
||||
GFP_KERNEL);
|
||||
if (kfree_reader_tasks == NULL) {
|
||||
firsterr = -ENOMEM;
|
||||
goto unwind;
|
||||
}
|
||||
|
||||
for (i = 0; i < kfree_nrealthreads; i++) {
|
||||
firsterr = torture_create_kthread(kfree_perf_thread, (void *)i,
|
||||
kfree_reader_tasks[i]);
|
||||
if (firsterr)
|
||||
goto unwind;
|
||||
}
|
||||
|
||||
while (atomic_read(&n_kfree_perf_thread_started) < kfree_nrealthreads)
|
||||
schedule_timeout_uninterruptible(1);
|
||||
|
||||
torture_init_end();
|
||||
return 0;
|
||||
|
||||
unwind:
|
||||
torture_init_end();
|
||||
kfree_perf_cleanup();
|
||||
return firsterr;
|
||||
}
|
||||
|
||||
static int __init
|
||||
rcu_perf_init(void)
|
||||
{
|
||||
|
@ -616,6 +770,9 @@ rcu_perf_init(void)
|
|||
if (cur_ops->init)
|
||||
cur_ops->init();
|
||||
|
||||
if (kfree_rcu_test)
|
||||
return kfree_perf_init();
|
||||
|
||||
nrealwriters = compute_real(nwriters);
|
||||
nrealreaders = compute_real(nreaders);
|
||||
atomic_set(&n_rcu_perf_reader_started, 0);
|
||||
|
|
|
@ -1661,43 +1661,52 @@ static void rcu_torture_fwd_prog_cb(struct rcu_head *rhp)
|
|||
struct rcu_fwd_cb {
|
||||
struct rcu_head rh;
|
||||
struct rcu_fwd_cb *rfc_next;
|
||||
struct rcu_fwd *rfc_rfp;
|
||||
int rfc_gps;
|
||||
};
|
||||
static DEFINE_SPINLOCK(rcu_fwd_lock);
|
||||
static struct rcu_fwd_cb *rcu_fwd_cb_head;
|
||||
static struct rcu_fwd_cb **rcu_fwd_cb_tail = &rcu_fwd_cb_head;
|
||||
static long n_launders_cb;
|
||||
static unsigned long rcu_fwd_startat;
|
||||
static bool rcu_fwd_emergency_stop;
|
||||
|
||||
#define MAX_FWD_CB_JIFFIES (8 * HZ) /* Maximum CB test duration. */
|
||||
#define MIN_FWD_CB_LAUNDERS 3 /* This many CB invocations to count. */
|
||||
#define MIN_FWD_CBS_LAUNDERED 100 /* Number of counted CBs. */
|
||||
#define FWD_CBS_HIST_DIV 10 /* Histogram buckets/second. */
|
||||
#define N_LAUNDERS_HIST (2 * MAX_FWD_CB_JIFFIES / (HZ / FWD_CBS_HIST_DIV))
|
||||
|
||||
struct rcu_launder_hist {
|
||||
long n_launders;
|
||||
unsigned long launder_gp_seq;
|
||||
};
|
||||
#define N_LAUNDERS_HIST (2 * MAX_FWD_CB_JIFFIES / (HZ / FWD_CBS_HIST_DIV))
|
||||
static struct rcu_launder_hist n_launders_hist[N_LAUNDERS_HIST];
|
||||
static unsigned long rcu_launder_gp_seq_start;
|
||||
|
||||
static void rcu_torture_fwd_cb_hist(void)
|
||||
struct rcu_fwd {
|
||||
spinlock_t rcu_fwd_lock;
|
||||
struct rcu_fwd_cb *rcu_fwd_cb_head;
|
||||
struct rcu_fwd_cb **rcu_fwd_cb_tail;
|
||||
long n_launders_cb;
|
||||
unsigned long rcu_fwd_startat;
|
||||
struct rcu_launder_hist n_launders_hist[N_LAUNDERS_HIST];
|
||||
unsigned long rcu_launder_gp_seq_start;
|
||||
};
|
||||
|
||||
struct rcu_fwd *rcu_fwds;
|
||||
bool rcu_fwd_emergency_stop;
|
||||
|
||||
static void rcu_torture_fwd_cb_hist(struct rcu_fwd *rfp)
|
||||
{
|
||||
unsigned long gps;
|
||||
unsigned long gps_old;
|
||||
int i;
|
||||
int j;
|
||||
|
||||
for (i = ARRAY_SIZE(n_launders_hist) - 1; i > 0; i--)
|
||||
if (n_launders_hist[i].n_launders > 0)
|
||||
for (i = ARRAY_SIZE(rfp->n_launders_hist) - 1; i > 0; i--)
|
||||
if (rfp->n_launders_hist[i].n_launders > 0)
|
||||
break;
|
||||
pr_alert("%s: Callback-invocation histogram (duration %lu jiffies):",
|
||||
__func__, jiffies - rcu_fwd_startat);
|
||||
gps_old = rcu_launder_gp_seq_start;
|
||||
__func__, jiffies - rfp->rcu_fwd_startat);
|
||||
gps_old = rfp->rcu_launder_gp_seq_start;
|
||||
for (j = 0; j <= i; j++) {
|
||||
gps = n_launders_hist[j].launder_gp_seq;
|
||||
gps = rfp->n_launders_hist[j].launder_gp_seq;
|
||||
pr_cont(" %ds/%d: %ld:%ld",
|
||||
j + 1, FWD_CBS_HIST_DIV, n_launders_hist[j].n_launders,
|
||||
j + 1, FWD_CBS_HIST_DIV,
|
||||
rfp->n_launders_hist[j].n_launders,
|
||||
rcutorture_seq_diff(gps, gps_old));
|
||||
gps_old = gps;
|
||||
}
|
||||
|
@ -1711,26 +1720,27 @@ static void rcu_torture_fwd_cb_cr(struct rcu_head *rhp)
|
|||
int i;
|
||||
struct rcu_fwd_cb *rfcp = container_of(rhp, struct rcu_fwd_cb, rh);
|
||||
struct rcu_fwd_cb **rfcpp;
|
||||
struct rcu_fwd *rfp = rfcp->rfc_rfp;
|
||||
|
||||
rfcp->rfc_next = NULL;
|
||||
rfcp->rfc_gps++;
|
||||
spin_lock_irqsave(&rcu_fwd_lock, flags);
|
||||
rfcpp = rcu_fwd_cb_tail;
|
||||
rcu_fwd_cb_tail = &rfcp->rfc_next;
|
||||
spin_lock_irqsave(&rfp->rcu_fwd_lock, flags);
|
||||
rfcpp = rfp->rcu_fwd_cb_tail;
|
||||
rfp->rcu_fwd_cb_tail = &rfcp->rfc_next;
|
||||
WRITE_ONCE(*rfcpp, rfcp);
|
||||
WRITE_ONCE(n_launders_cb, n_launders_cb + 1);
|
||||
i = ((jiffies - rcu_fwd_startat) / (HZ / FWD_CBS_HIST_DIV));
|
||||
if (i >= ARRAY_SIZE(n_launders_hist))
|
||||
i = ARRAY_SIZE(n_launders_hist) - 1;
|
||||
n_launders_hist[i].n_launders++;
|
||||
n_launders_hist[i].launder_gp_seq = cur_ops->get_gp_seq();
|
||||
spin_unlock_irqrestore(&rcu_fwd_lock, flags);
|
||||
WRITE_ONCE(rfp->n_launders_cb, rfp->n_launders_cb + 1);
|
||||
i = ((jiffies - rfp->rcu_fwd_startat) / (HZ / FWD_CBS_HIST_DIV));
|
||||
if (i >= ARRAY_SIZE(rfp->n_launders_hist))
|
||||
i = ARRAY_SIZE(rfp->n_launders_hist) - 1;
|
||||
rfp->n_launders_hist[i].n_launders++;
|
||||
rfp->n_launders_hist[i].launder_gp_seq = cur_ops->get_gp_seq();
|
||||
spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags);
|
||||
}
|
||||
|
||||
// Give the scheduler a chance, even on nohz_full CPUs.
|
||||
static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
|
||||
{
|
||||
if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) {
|
||||
if (IS_ENABLED(CONFIG_PREEMPTION) && IS_ENABLED(CONFIG_NO_HZ_FULL)) {
|
||||
// Real call_rcu() floods hit userspace, so emulate that.
|
||||
if (need_resched() || (iter & 0xfff))
|
||||
schedule();
|
||||
|
@ -1744,23 +1754,23 @@ static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
|
|||
* Free all callbacks on the rcu_fwd_cb_head list, either because the
|
||||
* test is over or because we hit an OOM event.
|
||||
*/
|
||||
static unsigned long rcu_torture_fwd_prog_cbfree(void)
|
||||
static unsigned long rcu_torture_fwd_prog_cbfree(struct rcu_fwd *rfp)
|
||||
{
|
||||
unsigned long flags;
|
||||
unsigned long freed = 0;
|
||||
struct rcu_fwd_cb *rfcp;
|
||||
|
||||
for (;;) {
|
||||
spin_lock_irqsave(&rcu_fwd_lock, flags);
|
||||
rfcp = rcu_fwd_cb_head;
|
||||
spin_lock_irqsave(&rfp->rcu_fwd_lock, flags);
|
||||
rfcp = rfp->rcu_fwd_cb_head;
|
||||
if (!rfcp) {
|
||||
spin_unlock_irqrestore(&rcu_fwd_lock, flags);
|
||||
spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags);
|
||||
break;
|
||||
}
|
||||
rcu_fwd_cb_head = rfcp->rfc_next;
|
||||
if (!rcu_fwd_cb_head)
|
||||
rcu_fwd_cb_tail = &rcu_fwd_cb_head;
|
||||
spin_unlock_irqrestore(&rcu_fwd_lock, flags);
|
||||
rfp->rcu_fwd_cb_head = rfcp->rfc_next;
|
||||
if (!rfp->rcu_fwd_cb_head)
|
||||
rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head;
|
||||
spin_unlock_irqrestore(&rfp->rcu_fwd_lock, flags);
|
||||
kfree(rfcp);
|
||||
freed++;
|
||||
rcu_torture_fwd_prog_cond_resched(freed);
|
||||
|
@ -1774,7 +1784,8 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void)
|
|||
}
|
||||
|
||||
/* Carry out need_resched()/cond_resched() forward-progress testing. */
|
||||
static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
|
||||
static void rcu_torture_fwd_prog_nr(struct rcu_fwd *rfp,
|
||||
int *tested, int *tested_tries)
|
||||
{
|
||||
unsigned long cver;
|
||||
unsigned long dur;
|
||||
|
@ -1804,8 +1815,8 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
|
|||
sd = cur_ops->stall_dur() + 1;
|
||||
sd4 = (sd + fwd_progress_div - 1) / fwd_progress_div;
|
||||
dur = sd4 + torture_random(&trs) % (sd - sd4);
|
||||
WRITE_ONCE(rcu_fwd_startat, jiffies);
|
||||
stopat = rcu_fwd_startat + dur;
|
||||
WRITE_ONCE(rfp->rcu_fwd_startat, jiffies);
|
||||
stopat = rfp->rcu_fwd_startat + dur;
|
||||
while (time_before(jiffies, stopat) &&
|
||||
!shutdown_time_arrived() &&
|
||||
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
|
||||
|
@ -1840,7 +1851,7 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
|
|||
}
|
||||
|
||||
/* Carry out call_rcu() forward-progress testing. */
|
||||
static void rcu_torture_fwd_prog_cr(void)
|
||||
static void rcu_torture_fwd_prog_cr(struct rcu_fwd *rfp)
|
||||
{
|
||||
unsigned long cver;
|
||||
unsigned long flags;
|
||||
|
@ -1864,23 +1875,23 @@ static void rcu_torture_fwd_prog_cr(void)
|
|||
/* Loop continuously posting RCU callbacks. */
|
||||
WRITE_ONCE(rcu_fwd_cb_nodelay, true);
|
||||
cur_ops->sync(); /* Later readers see above write. */
|
||||
WRITE_ONCE(rcu_fwd_startat, jiffies);
|
||||
stopat = rcu_fwd_startat + MAX_FWD_CB_JIFFIES;
|
||||
WRITE_ONCE(rfp->rcu_fwd_startat, jiffies);
|
||||
stopat = rfp->rcu_fwd_startat + MAX_FWD_CB_JIFFIES;
|
||||
n_launders = 0;
|
||||
n_launders_cb = 0;
|
||||
rfp->n_launders_cb = 0; // Hoist initialization for multi-kthread
|
||||
n_launders_sa = 0;
|
||||
n_max_cbs = 0;
|
||||
n_max_gps = 0;
|
||||
for (i = 0; i < ARRAY_SIZE(n_launders_hist); i++)
|
||||
n_launders_hist[i].n_launders = 0;
|
||||
for (i = 0; i < ARRAY_SIZE(rfp->n_launders_hist); i++)
|
||||
rfp->n_launders_hist[i].n_launders = 0;
|
||||
cver = READ_ONCE(rcu_torture_current_version);
|
||||
gps = cur_ops->get_gp_seq();
|
||||
rcu_launder_gp_seq_start = gps;
|
||||
rfp->rcu_launder_gp_seq_start = gps;
|
||||
tick_dep_set_task(current, TICK_DEP_BIT_RCU);
|
||||
while (time_before(jiffies, stopat) &&
|
||||
!shutdown_time_arrived() &&
|
||||
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
|
||||
rfcp = READ_ONCE(rcu_fwd_cb_head);
|
||||
rfcp = READ_ONCE(rfp->rcu_fwd_cb_head);
|
||||
rfcpn = NULL;
|
||||
if (rfcp)
|
||||
rfcpn = READ_ONCE(rfcp->rfc_next);
|
||||
|
@ -1888,7 +1899,7 @@ static void rcu_torture_fwd_prog_cr(void)
|
|||
if (rfcp->rfc_gps >= MIN_FWD_CB_LAUNDERS &&
|
||||
++n_max_gps >= MIN_FWD_CBS_LAUNDERED)
|
||||
break;
|
||||
rcu_fwd_cb_head = rfcpn;
|
||||
rfp->rcu_fwd_cb_head = rfcpn;
|
||||
n_launders++;
|
||||
n_launders_sa++;
|
||||
} else {
|
||||
|
@ -1900,6 +1911,7 @@ static void rcu_torture_fwd_prog_cr(void)
|
|||
n_max_cbs++;
|
||||
n_launders_sa = 0;
|
||||
rfcp->rfc_gps = 0;
|
||||
rfcp->rfc_rfp = rfp;
|
||||
}
|
||||
cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr);
|
||||
rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs);
|
||||
|
@ -1910,22 +1922,22 @@ static void rcu_torture_fwd_prog_cr(void)
|
|||
}
|
||||
}
|
||||
stoppedat = jiffies;
|
||||
n_launders_cb_snap = READ_ONCE(n_launders_cb);
|
||||
n_launders_cb_snap = READ_ONCE(rfp->n_launders_cb);
|
||||
cver = READ_ONCE(rcu_torture_current_version) - cver;
|
||||
gps = rcutorture_seq_diff(cur_ops->get_gp_seq(), gps);
|
||||
cur_ops->cb_barrier(); /* Wait for callbacks to be invoked. */
|
||||
(void)rcu_torture_fwd_prog_cbfree();
|
||||
(void)rcu_torture_fwd_prog_cbfree(rfp);
|
||||
|
||||
if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop) &&
|
||||
!shutdown_time_arrived()) {
|
||||
WARN_ON(n_max_gps < MIN_FWD_CBS_LAUNDERED);
|
||||
pr_alert("%s Duration %lu barrier: %lu pending %ld n_launders: %ld n_launders_sa: %ld n_max_gps: %ld n_max_cbs: %ld cver %ld gps %ld\n",
|
||||
__func__,
|
||||
stoppedat - rcu_fwd_startat, jiffies - stoppedat,
|
||||
stoppedat - rfp->rcu_fwd_startat, jiffies - stoppedat,
|
||||
n_launders + n_max_cbs - n_launders_cb_snap,
|
||||
n_launders, n_launders_sa,
|
||||
n_max_gps, n_max_cbs, cver, gps);
|
||||
rcu_torture_fwd_cb_hist();
|
||||
rcu_torture_fwd_cb_hist(rfp);
|
||||
}
|
||||
schedule_timeout_uninterruptible(HZ); /* Let CBs drain. */
|
||||
tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
|
||||
|
@ -1940,20 +1952,22 @@ static void rcu_torture_fwd_prog_cr(void)
|
|||
static int rcutorture_oom_notify(struct notifier_block *self,
|
||||
unsigned long notused, void *nfreed)
|
||||
{
|
||||
struct rcu_fwd *rfp = rcu_fwds;
|
||||
|
||||
WARN(1, "%s invoked upon OOM during forward-progress testing.\n",
|
||||
__func__);
|
||||
rcu_torture_fwd_cb_hist();
|
||||
rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rcu_fwd_startat)) / 2);
|
||||
rcu_torture_fwd_cb_hist(rfp);
|
||||
rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rfp->rcu_fwd_startat)) / 2);
|
||||
WRITE_ONCE(rcu_fwd_emergency_stop, true);
|
||||
smp_mb(); /* Emergency stop before free and wait to avoid hangs. */
|
||||
pr_info("%s: Freed %lu RCU callbacks.\n",
|
||||
__func__, rcu_torture_fwd_prog_cbfree());
|
||||
__func__, rcu_torture_fwd_prog_cbfree(rfp));
|
||||
rcu_barrier();
|
||||
pr_info("%s: Freed %lu RCU callbacks.\n",
|
||||
__func__, rcu_torture_fwd_prog_cbfree());
|
||||
__func__, rcu_torture_fwd_prog_cbfree(rfp));
|
||||
rcu_barrier();
|
||||
pr_info("%s: Freed %lu RCU callbacks.\n",
|
||||
__func__, rcu_torture_fwd_prog_cbfree());
|
||||
__func__, rcu_torture_fwd_prog_cbfree(rfp));
|
||||
smp_mb(); /* Frees before return to avoid redoing OOM. */
|
||||
(*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */
|
||||
pr_info("%s returning after OOM processing.\n", __func__);
|
||||
|
@ -1967,6 +1981,7 @@ static struct notifier_block rcutorture_oom_nb = {
|
|||
/* Carry out grace-period forward-progress testing. */
|
||||
static int rcu_torture_fwd_prog(void *args)
|
||||
{
|
||||
struct rcu_fwd *rfp = args;
|
||||
int tested = 0;
|
||||
int tested_tries = 0;
|
||||
|
||||
|
@ -1978,8 +1993,8 @@ static int rcu_torture_fwd_prog(void *args)
|
|||
schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
|
||||
WRITE_ONCE(rcu_fwd_emergency_stop, false);
|
||||
register_oom_notifier(&rcutorture_oom_nb);
|
||||
rcu_torture_fwd_prog_nr(&tested, &tested_tries);
|
||||
rcu_torture_fwd_prog_cr();
|
||||
rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
|
||||
rcu_torture_fwd_prog_cr(rfp);
|
||||
unregister_oom_notifier(&rcutorture_oom_nb);
|
||||
|
||||
/* Avoid slow periods, better to test when busy. */
|
||||
|
@ -1995,6 +2010,8 @@ static int rcu_torture_fwd_prog(void *args)
|
|||
/* If forward-progress checking is requested and feasible, spawn the thread. */
|
||||
static int __init rcu_torture_fwd_prog_init(void)
|
||||
{
|
||||
struct rcu_fwd *rfp;
|
||||
|
||||
if (!fwd_progress)
|
||||
return 0; /* Not requested, so don't do it. */
|
||||
if (!cur_ops->stall_dur || cur_ops->stall_dur() <= 0 ||
|
||||
|
@ -2013,8 +2030,12 @@ static int __init rcu_torture_fwd_prog_init(void)
|
|||
fwd_progress_holdoff = 1;
|
||||
if (fwd_progress_div <= 0)
|
||||
fwd_progress_div = 4;
|
||||
return torture_create_kthread(rcu_torture_fwd_prog,
|
||||
NULL, fwd_prog_task);
|
||||
rfp = kzalloc(sizeof(*rfp), GFP_KERNEL);
|
||||
if (!rfp)
|
||||
return -ENOMEM;
|
||||
spin_lock_init(&rfp->rcu_fwd_lock);
|
||||
rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head;
|
||||
return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task);
|
||||
}
|
||||
|
||||
/* Callback function for RCU barrier testing. */
|
||||
|
|
|
@ -103,7 +103,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
|
|||
|
||||
/*
|
||||
* Workqueue handler to drive one grace period and invoke any callbacks
|
||||
* that become ready as a result. Single-CPU and !PREEMPT operation
|
||||
* that become ready as a result. Single-CPU and !PREEMPTION operation
|
||||
* means that we get away with murder on synchronization. ;-)
|
||||
*/
|
||||
void srcu_drive_gp(struct work_struct *wp)
|
||||
|
|
|
@ -530,7 +530,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
|
|||
idx = rcu_seq_state(ssp->srcu_gp_seq);
|
||||
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
|
||||
cbdelay = srcu_get_delay(ssp);
|
||||
ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
|
||||
WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
|
||||
rcu_seq_end(&ssp->srcu_gp_seq);
|
||||
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
|
||||
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
|
||||
|
@ -762,6 +762,7 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
|
|||
unsigned long flags;
|
||||
struct srcu_data *sdp;
|
||||
unsigned long t;
|
||||
unsigned long tlast;
|
||||
|
||||
/* If the local srcu_data structure has callbacks, not idle. */
|
||||
local_irq_save(flags);
|
||||
|
@ -780,9 +781,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
|
|||
|
||||
/* First, see if enough time has passed since the last GP. */
|
||||
t = ktime_get_mono_fast_ns();
|
||||
tlast = READ_ONCE(ssp->srcu_last_gp_end);
|
||||
if (exp_holdoff == 0 ||
|
||||
time_in_range_open(t, ssp->srcu_last_gp_end,
|
||||
ssp->srcu_last_gp_end + exp_holdoff))
|
||||
time_in_range_open(t, tlast, tlast + exp_holdoff))
|
||||
return false; /* Too soon after last GP. */
|
||||
|
||||
/* Next, check for probable idleness. */
|
||||
|
@ -853,7 +854,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
|
|||
local_irq_save(flags);
|
||||
sdp = this_cpu_ptr(ssp->sda);
|
||||
spin_lock_rcu_node(sdp);
|
||||
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false);
|
||||
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
|
||||
rcu_segcblist_advance(&sdp->srcu_cblist,
|
||||
rcu_seq_current(&ssp->srcu_gp_seq));
|
||||
s = rcu_seq_snap(&ssp->srcu_gp_seq);
|
||||
|
@ -1052,7 +1053,7 @@ void srcu_barrier(struct srcu_struct *ssp)
|
|||
sdp->srcu_barrier_head.func = srcu_barrier_cb;
|
||||
debug_rcu_head_queue(&sdp->srcu_barrier_head);
|
||||
if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
|
||||
&sdp->srcu_barrier_head, 0)) {
|
||||
&sdp->srcu_barrier_head)) {
|
||||
debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
|
||||
atomic_dec(&ssp->srcu_barrier_cpu_cnt);
|
||||
}
|
||||
|
|
|
@ -22,6 +22,7 @@
|
|||
#include <linux/time.h>
|
||||
#include <linux/cpu.h>
|
||||
#include <linux/prefetch.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
#include "rcu.h"
|
||||
|
||||
|
@ -73,6 +74,31 @@ void rcu_sched_clock_irq(int user)
|
|||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Reclaim the specified callback, either by invoking it for non-kfree cases or
|
||||
* freeing it directly (for kfree). Return true if kfreeing, false otherwise.
|
||||
*/
|
||||
static inline bool rcu_reclaim_tiny(struct rcu_head *head)
|
||||
{
|
||||
rcu_callback_t f;
|
||||
unsigned long offset = (unsigned long)head->func;
|
||||
|
||||
rcu_lock_acquire(&rcu_callback_map);
|
||||
if (__is_kfree_rcu_offset(offset)) {
|
||||
trace_rcu_invoke_kfree_callback("", head, offset);
|
||||
kfree((void *)head - offset);
|
||||
rcu_lock_release(&rcu_callback_map);
|
||||
return true;
|
||||
}
|
||||
|
||||
trace_rcu_invoke_callback("", head);
|
||||
f = head->func;
|
||||
WRITE_ONCE(head->func, (rcu_callback_t)0L);
|
||||
f(head);
|
||||
rcu_lock_release(&rcu_callback_map);
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Invoke the RCU callbacks whose grace period has elapsed. */
|
||||
static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
|
||||
{
|
||||
|
@ -100,7 +126,7 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
|
|||
prefetch(next);
|
||||
debug_rcu_head_unqueue(list);
|
||||
local_bh_disable();
|
||||
__rcu_reclaim("", list);
|
||||
rcu_reclaim_tiny(list);
|
||||
local_bh_enable();
|
||||
list = next;
|
||||
}
|
||||
|
|
|
@ -43,7 +43,6 @@
|
|||
#include <uapi/linux/sched/types.h>
|
||||
#include <linux/prefetch.h>
|
||||
#include <linux/delay.h>
|
||||
#include <linux/stop_machine.h>
|
||||
#include <linux/random.h>
|
||||
#include <linux/trace_events.h>
|
||||
#include <linux/suspend.h>
|
||||
|
@ -55,6 +54,7 @@
|
|||
#include <linux/oom.h>
|
||||
#include <linux/smpboot.h>
|
||||
#include <linux/jiffies.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/sched/isolation.h>
|
||||
#include <linux/sched/clock.h>
|
||||
#include "../time/tick-internal.h"
|
||||
|
@ -84,7 +84,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
|
|||
.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
|
||||
.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
|
||||
};
|
||||
struct rcu_state rcu_state = {
|
||||
static struct rcu_state rcu_state = {
|
||||
.level = { &rcu_state.node[0] },
|
||||
.gp_state = RCU_GP_IDLE,
|
||||
.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,
|
||||
|
@ -188,7 +188,7 @@ EXPORT_SYMBOL_GPL(rcu_get_gp_kthreads_prio);
|
|||
* held, but the bit corresponding to the current CPU will be stable
|
||||
* in most contexts.
|
||||
*/
|
||||
unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp)
|
||||
static unsigned long rcu_rnp_online_cpus(struct rcu_node *rnp)
|
||||
{
|
||||
return READ_ONCE(rnp->qsmaskinitnext);
|
||||
}
|
||||
|
@ -294,7 +294,7 @@ static void rcu_dynticks_eqs_online(void)
|
|||
*
|
||||
* No ordering, as we are sampling CPU-local information.
|
||||
*/
|
||||
bool rcu_dynticks_curr_cpu_in_eqs(void)
|
||||
static bool rcu_dynticks_curr_cpu_in_eqs(void)
|
||||
{
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
|
@ -305,7 +305,7 @@ bool rcu_dynticks_curr_cpu_in_eqs(void)
|
|||
* Snapshot the ->dynticks counter with full ordering so as to allow
|
||||
* stable comparison of this counter with past and future snapshots.
|
||||
*/
|
||||
int rcu_dynticks_snap(struct rcu_data *rdp)
|
||||
static int rcu_dynticks_snap(struct rcu_data *rdp)
|
||||
{
|
||||
int snap = atomic_add_return(0, &rdp->dynticks);
|
||||
|
||||
|
@ -528,16 +528,6 @@ static struct rcu_node *rcu_get_root(void)
|
|||
return &rcu_state.node[0];
|
||||
}
|
||||
|
||||
/*
|
||||
* Convert a ->gp_state value to a character string.
|
||||
*/
|
||||
static const char *gp_state_getname(short gs)
|
||||
{
|
||||
if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
|
||||
return "???";
|
||||
return gp_state_names[gs];
|
||||
}
|
||||
|
||||
/*
|
||||
* Send along grace-period-related data for rcutorture diagnostics.
|
||||
*/
|
||||
|
@ -577,7 +567,7 @@ static void rcu_eqs_enter(bool user)
|
|||
}
|
||||
|
||||
lockdep_assert_irqs_disabled();
|
||||
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, rdp->dynticks);
|
||||
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, atomic_read(&rdp->dynticks));
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
|
||||
rdp = this_cpu_ptr(&rcu_data);
|
||||
do_nocb_deferred_wakeup(rdp);
|
||||
|
@ -650,14 +640,15 @@ static __always_inline void rcu_nmi_exit_common(bool irq)
|
|||
* leave it in non-RCU-idle state.
|
||||
*/
|
||||
if (rdp->dynticks_nmi_nesting != 1) {
|
||||
trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2, rdp->dynticks);
|
||||
trace_rcu_dyntick(TPS("--="), rdp->dynticks_nmi_nesting, rdp->dynticks_nmi_nesting - 2,
|
||||
atomic_read(&rdp->dynticks));
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */
|
||||
rdp->dynticks_nmi_nesting - 2);
|
||||
return;
|
||||
}
|
||||
|
||||
/* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
|
||||
trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, rdp->dynticks);
|
||||
trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, atomic_read(&rdp->dynticks));
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
|
||||
|
||||
if (irq)
|
||||
|
@ -744,7 +735,7 @@ static void rcu_eqs_exit(bool user)
|
|||
rcu_dynticks_task_exit();
|
||||
rcu_dynticks_eqs_exit();
|
||||
rcu_cleanup_after_idle();
|
||||
trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, rdp->dynticks);
|
||||
trace_rcu_dyntick(TPS("End"), rdp->dynticks_nesting, 1, atomic_read(&rdp->dynticks));
|
||||
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current));
|
||||
WRITE_ONCE(rdp->dynticks_nesting, 1);
|
||||
WARN_ON_ONCE(rdp->dynticks_nmi_nesting);
|
||||
|
@ -800,8 +791,8 @@ void rcu_user_exit(void)
|
|||
*/
|
||||
static __always_inline void rcu_nmi_enter_common(bool irq)
|
||||
{
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
long incby = 2;
|
||||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
|
||||
/* Complain about underflow. */
|
||||
WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0);
|
||||
|
@ -828,12 +819,17 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
|
|||
} else if (tick_nohz_full_cpu(rdp->cpu) &&
|
||||
rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
|
||||
READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
|
||||
rdp->rcu_forced_tick = true;
|
||||
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
|
||||
raw_spin_lock_rcu_node(rdp->mynode);
|
||||
// Recheck under lock.
|
||||
if (rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
|
||||
rdp->rcu_forced_tick = true;
|
||||
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
|
||||
}
|
||||
raw_spin_unlock_rcu_node(rdp->mynode);
|
||||
}
|
||||
trace_rcu_dyntick(incby == 1 ? TPS("Endirq") : TPS("++="),
|
||||
rdp->dynticks_nmi_nesting,
|
||||
rdp->dynticks_nmi_nesting + incby, rdp->dynticks);
|
||||
rdp->dynticks_nmi_nesting + incby, atomic_read(&rdp->dynticks));
|
||||
WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */
|
||||
rdp->dynticks_nmi_nesting + incby);
|
||||
barrier();
|
||||
|
@ -898,6 +894,7 @@ void rcu_irq_enter_irqson(void)
|
|||
*/
|
||||
static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
|
||||
{
|
||||
raw_lockdep_assert_held_rcu_node(rdp->mynode);
|
||||
WRITE_ONCE(rdp->rcu_urgent_qs, false);
|
||||
WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
|
||||
if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
|
||||
|
@ -1934,7 +1931,7 @@ rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags)
|
|||
struct rcu_node *rnp_p;
|
||||
|
||||
raw_lockdep_assert_held_rcu_node(rnp);
|
||||
if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPTION)) ||
|
||||
if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT_RCU)) ||
|
||||
WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)) ||
|
||||
rnp->qsmask != 0) {
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
|
@ -2146,7 +2143,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
|
|||
/* If no callbacks are ready, just return. */
|
||||
if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
|
||||
trace_rcu_batch_start(rcu_state.name,
|
||||
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
|
||||
rcu_segcblist_n_cbs(&rdp->cblist), 0);
|
||||
trace_rcu_batch_end(rcu_state.name, 0,
|
||||
!rcu_segcblist_empty(&rdp->cblist),
|
||||
|
@ -2168,7 +2164,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
|
|||
if (unlikely(bl > 100))
|
||||
tlimit = local_clock() + rcu_resched_ns;
|
||||
trace_rcu_batch_start(rcu_state.name,
|
||||
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
|
||||
rcu_segcblist_n_cbs(&rdp->cblist), bl);
|
||||
rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
|
||||
if (offloaded)
|
||||
|
@ -2179,9 +2174,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
|
|||
tick_dep_set_task(current, TICK_DEP_BIT_RCU);
|
||||
rhp = rcu_cblist_dequeue(&rcl);
|
||||
for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
|
||||
rcu_callback_t f;
|
||||
|
||||
debug_rcu_head_unqueue(rhp);
|
||||
if (__rcu_reclaim(rcu_state.name, rhp))
|
||||
rcu_cblist_dequeued_lazy(&rcl);
|
||||
|
||||
rcu_lock_acquire(&rcu_callback_map);
|
||||
trace_rcu_invoke_callback(rcu_state.name, rhp);
|
||||
|
||||
f = rhp->func;
|
||||
WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
|
||||
f(rhp);
|
||||
|
||||
rcu_lock_release(&rcu_callback_map);
|
||||
|
||||
/*
|
||||
* Stop only if limit reached and CPU has something to do.
|
||||
* Note: The rcl structure counts down from zero.
|
||||
|
@ -2294,7 +2299,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
|
|||
mask = 0;
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
if (rnp->qsmask == 0) {
|
||||
if (!IS_ENABLED(CONFIG_PREEMPTION) ||
|
||||
if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
|
||||
rcu_preempt_blocked_readers_cgp(rnp)) {
|
||||
/*
|
||||
* No point in scanning bits because they
|
||||
|
@ -2308,14 +2313,11 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
|
|||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
continue;
|
||||
}
|
||||
for_each_leaf_node_possible_cpu(rnp, cpu) {
|
||||
unsigned long bit = leaf_node_cpu_bit(rnp, cpu);
|
||||
if ((rnp->qsmask & bit) != 0) {
|
||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
if (f(rdp)) {
|
||||
mask |= bit;
|
||||
rcu_disable_urgency_upon_qs(rdp);
|
||||
}
|
||||
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->qsmask) {
|
||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
if (f(rdp)) {
|
||||
mask |= rdp->grpmask;
|
||||
rcu_disable_urgency_upon_qs(rdp);
|
||||
}
|
||||
}
|
||||
if (mask != 0) {
|
||||
|
@ -2474,8 +2476,8 @@ static void rcu_cpu_kthread(unsigned int cpu)
|
|||
char work, *workp = this_cpu_ptr(&rcu_data.rcu_cpu_has_work);
|
||||
int spincnt;
|
||||
|
||||
trace_rcu_utilization(TPS("Start CPU kthread@rcu_run"));
|
||||
for (spincnt = 0; spincnt < 10; spincnt++) {
|
||||
trace_rcu_utilization(TPS("Start CPU kthread@rcu_wait"));
|
||||
local_bh_disable();
|
||||
*statusp = RCU_KTHREAD_RUNNING;
|
||||
local_irq_disable();
|
||||
|
@ -2583,7 +2585,7 @@ static void rcu_leak_callback(struct rcu_head *rhp)
|
|||
* is expected to specify a CPU.
|
||||
*/
|
||||
static void
|
||||
__call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy)
|
||||
__call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct rcu_data *rdp;
|
||||
|
@ -2618,18 +2620,17 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy)
|
|||
if (rcu_segcblist_empty(&rdp->cblist))
|
||||
rcu_segcblist_init(&rdp->cblist);
|
||||
}
|
||||
|
||||
if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
|
||||
return; // Enqueued onto ->nocb_bypass, so just leave.
|
||||
/* If we get here, rcu_nocb_try_bypass() acquired ->nocb_lock. */
|
||||
rcu_segcblist_enqueue(&rdp->cblist, head, lazy);
|
||||
rcu_segcblist_enqueue(&rdp->cblist, head);
|
||||
if (__is_kfree_rcu_offset((unsigned long)func))
|
||||
trace_rcu_kfree_callback(rcu_state.name, head,
|
||||
(unsigned long)func,
|
||||
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
|
||||
rcu_segcblist_n_cbs(&rdp->cblist));
|
||||
else
|
||||
trace_rcu_callback(rcu_state.name, head,
|
||||
rcu_segcblist_n_lazy_cbs(&rdp->cblist),
|
||||
rcu_segcblist_n_cbs(&rdp->cblist));
|
||||
|
||||
/* Go handle any RCU core processing required. */
|
||||
|
@ -2679,28 +2680,230 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func, bool lazy)
|
|||
*/
|
||||
void call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
{
|
||||
__call_rcu(head, func, 0);
|
||||
__call_rcu(head, func);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(call_rcu);
|
||||
|
||||
|
||||
/* Maximum number of jiffies to wait before draining a batch. */
|
||||
#define KFREE_DRAIN_JIFFIES (HZ / 50)
|
||||
#define KFREE_N_BATCHES 2
|
||||
|
||||
/**
|
||||
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
|
||||
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
|
||||
* @head_free: List of kfree_rcu() objects waiting for a grace period
|
||||
* @krcp: Pointer to @kfree_rcu_cpu structure
|
||||
*/
|
||||
|
||||
struct kfree_rcu_cpu_work {
|
||||
struct rcu_work rcu_work;
|
||||
struct rcu_head *head_free;
|
||||
struct kfree_rcu_cpu *krcp;
|
||||
};
|
||||
|
||||
/**
|
||||
* struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
|
||||
* @head: List of kfree_rcu() objects not yet waiting for a grace period
|
||||
* @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
|
||||
* @lock: Synchronize access to this structure
|
||||
* @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
|
||||
* @monitor_todo: Tracks whether a @monitor_work delayed work is pending
|
||||
* @initialized: The @lock and @rcu_work fields have been initialized
|
||||
*
|
||||
* This is a per-CPU structure. The reason that it is not included in
|
||||
* the rcu_data structure is to permit this code to be extracted from
|
||||
* the RCU files. Such extraction could allow further optimization of
|
||||
* the interactions with the slab allocators.
|
||||
*/
|
||||
struct kfree_rcu_cpu {
|
||||
struct rcu_head *head;
|
||||
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
|
||||
spinlock_t lock;
|
||||
struct delayed_work monitor_work;
|
||||
bool monitor_todo;
|
||||
bool initialized;
|
||||
};
|
||||
|
||||
static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
|
||||
|
||||
/*
|
||||
* Queue an RCU callback for lazy invocation after a grace period.
|
||||
* This will likely be later named something like "call_rcu_lazy()",
|
||||
* but this change will require some way of tagging the lazy RCU
|
||||
* callbacks in the list of pending callbacks. Until then, this
|
||||
* function may only be called from __kfree_rcu().
|
||||
* This function is invoked in workqueue context after a grace period.
|
||||
* It frees all the objects queued on ->head_free.
|
||||
*/
|
||||
static void kfree_rcu_work(struct work_struct *work)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct rcu_head *head, *next;
|
||||
struct kfree_rcu_cpu *krcp;
|
||||
struct kfree_rcu_cpu_work *krwp;
|
||||
|
||||
krwp = container_of(to_rcu_work(work),
|
||||
struct kfree_rcu_cpu_work, rcu_work);
|
||||
krcp = krwp->krcp;
|
||||
spin_lock_irqsave(&krcp->lock, flags);
|
||||
head = krwp->head_free;
|
||||
krwp->head_free = NULL;
|
||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
|
||||
// List "head" is now private, so traverse locklessly.
|
||||
for (; head; head = next) {
|
||||
unsigned long offset = (unsigned long)head->func;
|
||||
|
||||
next = head->next;
|
||||
// Potentially optimize with kfree_bulk in future.
|
||||
debug_rcu_head_unqueue(head);
|
||||
rcu_lock_acquire(&rcu_callback_map);
|
||||
trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
|
||||
|
||||
if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset))) {
|
||||
/* Could be optimized with kfree_bulk() in future. */
|
||||
kfree((void *)head - offset);
|
||||
}
|
||||
|
||||
rcu_lock_release(&rcu_callback_map);
|
||||
cond_resched_tasks_rcu_qs();
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Schedule the kfree batch RCU work to run in workqueue context after a GP.
|
||||
*
|
||||
* This function is invoked by kfree_rcu_monitor() when the KFREE_DRAIN_JIFFIES
|
||||
* timeout has been reached.
|
||||
*/
|
||||
static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
|
||||
{
|
||||
int i;
|
||||
struct kfree_rcu_cpu_work *krwp = NULL;
|
||||
|
||||
lockdep_assert_held(&krcp->lock);
|
||||
for (i = 0; i < KFREE_N_BATCHES; i++)
|
||||
if (!krcp->krw_arr[i].head_free) {
|
||||
krwp = &(krcp->krw_arr[i]);
|
||||
break;
|
||||
}
|
||||
|
||||
// If a previous RCU batch is in progress, we cannot immediately
|
||||
// queue another one, so return false to tell caller to retry.
|
||||
if (!krwp)
|
||||
return false;
|
||||
|
||||
krwp->head_free = krcp->head;
|
||||
krcp->head = NULL;
|
||||
INIT_RCU_WORK(&krwp->rcu_work, kfree_rcu_work);
|
||||
queue_rcu_work(system_wq, &krwp->rcu_work);
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
|
||||
unsigned long flags)
|
||||
{
|
||||
// Attempt to start a new batch.
|
||||
krcp->monitor_todo = false;
|
||||
if (queue_kfree_rcu_work(krcp)) {
|
||||
// Success! Our job is done here.
|
||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
return;
|
||||
}
|
||||
|
||||
// Previous RCU batch still in progress, try again later.
|
||||
krcp->monitor_todo = true;
|
||||
schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
|
||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
}
|
||||
|
||||
/*
|
||||
* This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
|
||||
* It invokes kfree_rcu_drain_unlock() to attempt to start another batch.
|
||||
*/
|
||||
static void kfree_rcu_monitor(struct work_struct *work)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
|
||||
monitor_work.work);
|
||||
|
||||
spin_lock_irqsave(&krcp->lock, flags);
|
||||
if (krcp->monitor_todo)
|
||||
kfree_rcu_drain_unlock(krcp, flags);
|
||||
else
|
||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
}
|
||||
|
||||
/*
|
||||
* Queue a request for lazy invocation of kfree() after a grace period.
|
||||
*
|
||||
* Each kfree_call_rcu() request is added to a batch. The batch will be drained
|
||||
* every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch
|
||||
* will be kfree'd in workqueue context. This allows us to:
|
||||
*
|
||||
* 1. Batch requests together to reduce the number of grace periods during
|
||||
* heavy kfree_rcu() load.
|
||||
*
|
||||
* 2. It makes it possible to use kfree_bulk() on a large number of
|
||||
* kfree_rcu() requests thus reducing cache misses and the per-object
|
||||
* overhead of kfree().
|
||||
*/
|
||||
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
|
||||
{
|
||||
__call_rcu(head, func, 1);
|
||||
unsigned long flags;
|
||||
struct kfree_rcu_cpu *krcp;
|
||||
|
||||
local_irq_save(flags); // For safely calling this_cpu_ptr().
|
||||
krcp = this_cpu_ptr(&krc);
|
||||
if (krcp->initialized)
|
||||
spin_lock(&krcp->lock);
|
||||
|
||||
// Queue the object but don't yet schedule the batch.
|
||||
if (debug_rcu_head_queue(head)) {
|
||||
// Probable double kfree_rcu(), just leak.
|
||||
WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
|
||||
__func__, head);
|
||||
goto unlock_return;
|
||||
}
|
||||
head->func = func;
|
||||
head->next = krcp->head;
|
||||
krcp->head = head;
|
||||
|
||||
// Set timer to drain after KFREE_DRAIN_JIFFIES.
|
||||
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
|
||||
!krcp->monitor_todo) {
|
||||
krcp->monitor_todo = true;
|
||||
schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
|
||||
}
|
||||
|
||||
unlock_return:
|
||||
if (krcp->initialized)
|
||||
spin_unlock(&krcp->lock);
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kfree_call_rcu);
|
||||
|
||||
void __init kfree_rcu_scheduler_running(void)
|
||||
{
|
||||
int cpu;
|
||||
unsigned long flags;
|
||||
|
||||
for_each_online_cpu(cpu) {
|
||||
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
|
||||
|
||||
spin_lock_irqsave(&krcp->lock, flags);
|
||||
if (!krcp->head || krcp->monitor_todo) {
|
||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
continue;
|
||||
}
|
||||
krcp->monitor_todo = true;
|
||||
schedule_delayed_work_on(cpu, &krcp->monitor_work,
|
||||
KFREE_DRAIN_JIFFIES);
|
||||
spin_unlock_irqrestore(&krcp->lock, flags);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* During early boot, any blocking grace-period wait automatically
|
||||
* implies a grace period. Later on, this is never the case for PREEMPT.
|
||||
* implies a grace period. Later on, this is never the case for PREEMPTION.
|
||||
*
|
||||
* Howevr, because a context switch is a grace period for !PREEMPT, any
|
||||
* Howevr, because a context switch is a grace period for !PREEMPTION, any
|
||||
* blocking grace-period wait automatically implies a grace period if
|
||||
* there is only one CPU online at any point time during execution of
|
||||
* either synchronize_rcu() or synchronize_rcu_expedited(). It is OK to
|
||||
|
@ -2896,7 +3099,7 @@ static void rcu_barrier_func(void *unused)
|
|||
debug_rcu_head_queue(&rdp->barrier_head);
|
||||
rcu_nocb_lock(rdp);
|
||||
WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
|
||||
if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) {
|
||||
if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
|
||||
atomic_inc(&rcu_state.barrier_cpu_count);
|
||||
} else {
|
||||
debug_rcu_head_unqueue(&rdp->barrier_head);
|
||||
|
@ -3557,12 +3760,29 @@ static void __init rcu_dump_rcu_node_tree(void)
|
|||
struct workqueue_struct *rcu_gp_wq;
|
||||
struct workqueue_struct *rcu_par_gp_wq;
|
||||
|
||||
static void __init kfree_rcu_batch_init(void)
|
||||
{
|
||||
int cpu;
|
||||
int i;
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
|
||||
|
||||
spin_lock_init(&krcp->lock);
|
||||
for (i = 0; i < KFREE_N_BATCHES; i++)
|
||||
krcp->krw_arr[i].krcp = krcp;
|
||||
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
|
||||
krcp->initialized = true;
|
||||
}
|
||||
}
|
||||
|
||||
void __init rcu_init(void)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
rcu_early_boot_tests();
|
||||
|
||||
kfree_rcu_batch_init();
|
||||
rcu_bootup_announce();
|
||||
rcu_init_geometry();
|
||||
rcu_init_one();
|
||||
|
|
|
@ -16,7 +16,6 @@
|
|||
#include <linux/cpumask.h>
|
||||
#include <linux/seqlock.h>
|
||||
#include <linux/swait.h>
|
||||
#include <linux/stop_machine.h>
|
||||
#include <linux/rcu_node_tree.h>
|
||||
|
||||
#include "rcu_segcblist.h"
|
||||
|
@ -182,8 +181,8 @@ struct rcu_data {
|
|||
bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */
|
||||
bool rcu_urgent_qs; /* GP old need light quiescent state. */
|
||||
bool rcu_forced_tick; /* Forced tick to provide QS. */
|
||||
bool rcu_forced_tick_exp; /* ... provide QS to expedited GP. */
|
||||
#ifdef CONFIG_RCU_FAST_NO_HZ
|
||||
bool all_lazy; /* All CPU's CBs lazy at idle start? */
|
||||
unsigned long last_accelerate; /* Last jiffy CBs were accelerated. */
|
||||
unsigned long last_advance_all; /* Last jiffy CBs were all advanced. */
|
||||
int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
|
||||
|
@ -368,18 +367,6 @@ struct rcu_state {
|
|||
#define RCU_GP_CLEANUP 7 /* Grace-period cleanup started. */
|
||||
#define RCU_GP_CLEANED 8 /* Grace-period cleanup complete. */
|
||||
|
||||
static const char * const gp_state_names[] = {
|
||||
"RCU_GP_IDLE",
|
||||
"RCU_GP_WAIT_GPS",
|
||||
"RCU_GP_DONE_GPS",
|
||||
"RCU_GP_ONOFF",
|
||||
"RCU_GP_INIT",
|
||||
"RCU_GP_WAIT_FQS",
|
||||
"RCU_GP_DOING_FQS",
|
||||
"RCU_GP_CLEANUP",
|
||||
"RCU_GP_CLEANED",
|
||||
};
|
||||
|
||||
/*
|
||||
* In order to export the rcu_state name to the tracing tools, it
|
||||
* needs to be added in the __tracepoint_string section.
|
||||
|
@ -403,8 +390,6 @@ static const char *tp_rcu_varname __used __tracepoint_string = rcu_name;
|
|||
#define RCU_NAME rcu_name
|
||||
#endif /* #else #ifdef CONFIG_TRACING */
|
||||
|
||||
int rcu_dynticks_snap(struct rcu_data *rdp);
|
||||
|
||||
/* Forward declarations for tree_plugin.h */
|
||||
static void rcu_bootup_announce(void);
|
||||
static void rcu_qs(void);
|
||||
|
@ -415,7 +400,6 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
|
|||
static int rcu_print_task_exp_stall(struct rcu_node *rnp);
|
||||
static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
|
||||
static void rcu_flavor_sched_clock_irq(int user);
|
||||
void call_rcu(struct rcu_head *head, rcu_callback_t func);
|
||||
static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
|
||||
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
|
||||
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
|
||||
|
|
|
@ -21,7 +21,7 @@ static void rcu_exp_gp_seq_start(void)
|
|||
}
|
||||
|
||||
/*
|
||||
* Return then value that expedited-grace-period counter will have
|
||||
* Return the value that the expedited-grace-period counter will have
|
||||
* at the end of the current grace period.
|
||||
*/
|
||||
static __maybe_unused unsigned long rcu_exp_gp_seq_endval(void)
|
||||
|
@ -39,7 +39,9 @@ static void rcu_exp_gp_seq_end(void)
|
|||
}
|
||||
|
||||
/*
|
||||
* Take a snapshot of the expedited-grace-period counter.
|
||||
* Take a snapshot of the expedited-grace-period counter, which is the
|
||||
* earliest value that will indicate that a full grace period has
|
||||
* elapsed since the current time.
|
||||
*/
|
||||
static unsigned long rcu_exp_gp_seq_snap(void)
|
||||
{
|
||||
|
@ -134,7 +136,7 @@ static void __maybe_unused sync_exp_reset_tree(void)
|
|||
rcu_for_each_node_breadth_first(rnp) {
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
WARN_ON_ONCE(rnp->expmask);
|
||||
rnp->expmask = rnp->expmaskinit;
|
||||
WRITE_ONCE(rnp->expmask, rnp->expmaskinit);
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
}
|
||||
}
|
||||
|
@ -143,31 +145,26 @@ static void __maybe_unused sync_exp_reset_tree(void)
|
|||
* Return non-zero if there is no RCU expedited grace period in progress
|
||||
* for the specified rcu_node structure, in other words, if all CPUs and
|
||||
* tasks covered by the specified rcu_node structure have done their bit
|
||||
* for the current expedited grace period. Works only for preemptible
|
||||
* RCU -- other RCU implementation use other means.
|
||||
*
|
||||
* Caller must hold the specificed rcu_node structure's ->lock
|
||||
* for the current expedited grace period.
|
||||
*/
|
||||
static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
|
||||
static bool sync_rcu_exp_done(struct rcu_node *rnp)
|
||||
{
|
||||
raw_lockdep_assert_held_rcu_node(rnp);
|
||||
|
||||
return rnp->exp_tasks == NULL &&
|
||||
READ_ONCE(rnp->expmask) == 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Like sync_rcu_preempt_exp_done(), but this function assumes the caller
|
||||
* doesn't hold the rcu_node's ->lock, and will acquire and release the lock
|
||||
* itself
|
||||
* Like sync_rcu_exp_done(), but where the caller does not hold the
|
||||
* rcu_node's ->lock.
|
||||
*/
|
||||
static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
|
||||
static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp)
|
||||
{
|
||||
unsigned long flags;
|
||||
bool ret;
|
||||
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
ret = sync_rcu_preempt_exp_done(rnp);
|
||||
ret = sync_rcu_exp_done(rnp);
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
|
||||
return ret;
|
||||
|
@ -181,8 +178,6 @@ static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
|
|||
* which the task was queued or to one of that rcu_node structure's ancestors,
|
||||
* recursively up the tree. (Calm down, calm down, we do the recursion
|
||||
* iteratively!)
|
||||
*
|
||||
* Caller must hold the specified rcu_node structure's ->lock.
|
||||
*/
|
||||
static void __rcu_report_exp_rnp(struct rcu_node *rnp,
|
||||
bool wake, unsigned long flags)
|
||||
|
@ -190,8 +185,9 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
|
|||
{
|
||||
unsigned long mask;
|
||||
|
||||
raw_lockdep_assert_held_rcu_node(rnp);
|
||||
for (;;) {
|
||||
if (!sync_rcu_preempt_exp_done(rnp)) {
|
||||
if (!sync_rcu_exp_done(rnp)) {
|
||||
if (!rnp->expmask)
|
||||
rcu_initiate_boost(rnp, flags);
|
||||
else
|
||||
|
@ -211,7 +207,7 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
|
|||
rnp = rnp->parent;
|
||||
raw_spin_lock_rcu_node(rnp); /* irqs already disabled */
|
||||
WARN_ON_ONCE(!(rnp->expmask & mask));
|
||||
rnp->expmask &= ~mask;
|
||||
WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -234,14 +230,23 @@ static void __maybe_unused rcu_report_exp_rnp(struct rcu_node *rnp, bool wake)
|
|||
static void rcu_report_exp_cpu_mult(struct rcu_node *rnp,
|
||||
unsigned long mask, bool wake)
|
||||
{
|
||||
int cpu;
|
||||
unsigned long flags;
|
||||
struct rcu_data *rdp;
|
||||
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
if (!(rnp->expmask & mask)) {
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
return;
|
||||
}
|
||||
rnp->expmask &= ~mask;
|
||||
WRITE_ONCE(rnp->expmask, rnp->expmask & ~mask);
|
||||
for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
|
||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
if (!IS_ENABLED(CONFIG_NO_HZ_FULL) || !rdp->rcu_forced_tick_exp)
|
||||
continue;
|
||||
rdp->rcu_forced_tick_exp = false;
|
||||
tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
|
||||
}
|
||||
__rcu_report_exp_rnp(rnp, wake, flags); /* Releases rnp->lock. */
|
||||
}
|
||||
|
||||
|
@ -345,8 +350,8 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
|
|||
/* Each pass checks a CPU for identity, offline, and idle. */
|
||||
mask_ofl_test = 0;
|
||||
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
|
||||
unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
unsigned long mask = rdp->grpmask;
|
||||
int snap;
|
||||
|
||||
if (raw_smp_processor_id() == cpu ||
|
||||
|
@ -372,12 +377,10 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
|
|||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
|
||||
/* IPI the remaining CPUs for expedited quiescent state. */
|
||||
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
|
||||
unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
|
||||
for_each_leaf_node_cpu_mask(rnp, cpu, mask_ofl_ipi) {
|
||||
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
unsigned long mask = rdp->grpmask;
|
||||
|
||||
if (!(mask_ofl_ipi & mask))
|
||||
continue;
|
||||
retry_ipi:
|
||||
if (rcu_dynticks_in_eqs_since(rdp, rdp->exp_dynticks_snap)) {
|
||||
mask_ofl_test |= mask;
|
||||
|
@ -389,10 +392,10 @@ retry_ipi:
|
|||
}
|
||||
ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
|
||||
put_cpu();
|
||||
if (!ret) {
|
||||
mask_ofl_ipi &= ~mask;
|
||||
/* The CPU will report the QS in response to the IPI. */
|
||||
if (!ret)
|
||||
continue;
|
||||
}
|
||||
|
||||
/* Failed, raced with CPU hotplug operation. */
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
if ((rnp->qsmaskinitnext & mask) &&
|
||||
|
@ -403,13 +406,12 @@ retry_ipi:
|
|||
schedule_timeout_uninterruptible(1);
|
||||
goto retry_ipi;
|
||||
}
|
||||
/* CPU really is offline, so we can ignore it. */
|
||||
if (!(rnp->expmask & mask))
|
||||
mask_ofl_ipi &= ~mask;
|
||||
/* CPU really is offline, so we must report its QS. */
|
||||
if (rnp->expmask & mask)
|
||||
mask_ofl_test |= mask;
|
||||
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
|
||||
}
|
||||
/* Report quiescent states for those that went offline. */
|
||||
mask_ofl_test |= mask_ofl_ipi;
|
||||
if (mask_ofl_test)
|
||||
rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
|
||||
}
|
||||
|
@ -456,29 +458,62 @@ static void sync_rcu_exp_select_cpus(void)
|
|||
flush_work(&rnp->rew.rew_work);
|
||||
}
|
||||
|
||||
static void synchronize_sched_expedited_wait(void)
|
||||
/*
|
||||
* Wait for the expedited grace period to elapse, within time limit.
|
||||
* If the time limit is exceeded without the grace period elapsing,
|
||||
* return false, otherwise return true.
|
||||
*/
|
||||
static bool synchronize_rcu_expedited_wait_once(long tlimit)
|
||||
{
|
||||
int t;
|
||||
struct rcu_node *rnp_root = rcu_get_root();
|
||||
|
||||
t = swait_event_timeout_exclusive(rcu_state.expedited_wq,
|
||||
sync_rcu_exp_done_unlocked(rnp_root),
|
||||
tlimit);
|
||||
// Workqueues should not be signaled.
|
||||
if (t > 0 || sync_rcu_exp_done_unlocked(rnp_root))
|
||||
return true;
|
||||
WARN_ON(t < 0); /* workqueues should not be signaled. */
|
||||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* Wait for the expedited grace period to elapse, issuing any needed
|
||||
* RCU CPU stall warnings along the way.
|
||||
*/
|
||||
static void synchronize_rcu_expedited_wait(void)
|
||||
{
|
||||
int cpu;
|
||||
unsigned long jiffies_stall;
|
||||
unsigned long jiffies_start;
|
||||
unsigned long mask;
|
||||
int ndetected;
|
||||
struct rcu_data *rdp;
|
||||
struct rcu_node *rnp;
|
||||
struct rcu_node *rnp_root = rcu_get_root();
|
||||
int ret;
|
||||
|
||||
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
|
||||
jiffies_stall = rcu_jiffies_till_stall_check();
|
||||
jiffies_start = jiffies;
|
||||
if (IS_ENABLED(CONFIG_NO_HZ_FULL)) {
|
||||
if (synchronize_rcu_expedited_wait_once(1))
|
||||
return;
|
||||
rcu_for_each_leaf_node(rnp) {
|
||||
for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
|
||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
if (rdp->rcu_forced_tick_exp)
|
||||
continue;
|
||||
rdp->rcu_forced_tick_exp = true;
|
||||
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
|
||||
}
|
||||
}
|
||||
WARN_ON_ONCE(1);
|
||||
}
|
||||
|
||||
for (;;) {
|
||||
ret = swait_event_timeout_exclusive(
|
||||
rcu_state.expedited_wq,
|
||||
sync_rcu_preempt_exp_done_unlocked(rnp_root),
|
||||
jiffies_stall);
|
||||
if (ret > 0 || sync_rcu_preempt_exp_done_unlocked(rnp_root))
|
||||
if (synchronize_rcu_expedited_wait_once(jiffies_stall))
|
||||
return;
|
||||
WARN_ON(ret < 0); /* workqueues should not be signaled. */
|
||||
if (rcu_cpu_stall_suppress)
|
||||
continue;
|
||||
panic_on_rcu_stall();
|
||||
|
@ -491,7 +526,7 @@ static void synchronize_sched_expedited_wait(void)
|
|||
struct rcu_data *rdp;
|
||||
|
||||
mask = leaf_node_cpu_bit(rnp, cpu);
|
||||
if (!(rnp->expmask & mask))
|
||||
if (!(READ_ONCE(rnp->expmask) & mask))
|
||||
continue;
|
||||
ndetected++;
|
||||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
|
@ -503,17 +538,18 @@ static void synchronize_sched_expedited_wait(void)
|
|||
}
|
||||
pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
|
||||
jiffies - jiffies_start, rcu_state.expedited_sequence,
|
||||
rnp_root->expmask, ".T"[!!rnp_root->exp_tasks]);
|
||||
READ_ONCE(rnp_root->expmask),
|
||||
".T"[!!rnp_root->exp_tasks]);
|
||||
if (ndetected) {
|
||||
pr_err("blocking rcu_node structures:");
|
||||
rcu_for_each_node_breadth_first(rnp) {
|
||||
if (rnp == rnp_root)
|
||||
continue; /* printed unconditionally */
|
||||
if (sync_rcu_preempt_exp_done_unlocked(rnp))
|
||||
if (sync_rcu_exp_done_unlocked(rnp))
|
||||
continue;
|
||||
pr_cont(" l=%u:%d-%d:%#lx/%c",
|
||||
rnp->level, rnp->grplo, rnp->grphi,
|
||||
rnp->expmask,
|
||||
READ_ONCE(rnp->expmask),
|
||||
".T"[!!rnp->exp_tasks]);
|
||||
}
|
||||
pr_cont("\n");
|
||||
|
@ -521,7 +557,7 @@ static void synchronize_sched_expedited_wait(void)
|
|||
rcu_for_each_leaf_node(rnp) {
|
||||
for_each_leaf_node_possible_cpu(rnp, cpu) {
|
||||
mask = leaf_node_cpu_bit(rnp, cpu);
|
||||
if (!(rnp->expmask & mask))
|
||||
if (!(READ_ONCE(rnp->expmask) & mask))
|
||||
continue;
|
||||
dump_cpu_task(cpu);
|
||||
}
|
||||
|
@ -540,16 +576,15 @@ static void rcu_exp_wait_wake(unsigned long s)
|
|||
{
|
||||
struct rcu_node *rnp;
|
||||
|
||||
synchronize_sched_expedited_wait();
|
||||
synchronize_rcu_expedited_wait();
|
||||
|
||||
// Switch over to wakeup mode, allowing the next GP to proceed.
|
||||
// End the previous grace period only after acquiring the mutex
|
||||
// to ensure that only one GP runs concurrently with wakeups.
|
||||
mutex_lock(&rcu_state.exp_wake_mutex);
|
||||
rcu_exp_gp_seq_end();
|
||||
trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));
|
||||
|
||||
/*
|
||||
* Switch over to wakeup mode, allowing the next GP, but -only- the
|
||||
* next GP, to proceed.
|
||||
*/
|
||||
mutex_lock(&rcu_state.exp_wake_mutex);
|
||||
|
||||
rcu_for_each_node_breadth_first(rnp) {
|
||||
if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
|
||||
spin_lock(&rnp->exp_lock);
|
||||
|
@ -559,7 +594,7 @@ static void rcu_exp_wait_wake(unsigned long s)
|
|||
spin_unlock(&rnp->exp_lock);
|
||||
}
|
||||
smp_mb(); /* All above changes before wakeup. */
|
||||
wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);
|
||||
wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
|
||||
}
|
||||
trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
|
||||
mutex_unlock(&rcu_state.exp_wake_mutex);
|
||||
|
@ -610,7 +645,7 @@ static void rcu_exp_handler(void *unused)
|
|||
* critical section. If also enabled or idle, immediately
|
||||
* report the quiescent state, otherwise defer.
|
||||
*/
|
||||
if (!t->rcu_read_lock_nesting) {
|
||||
if (!rcu_preempt_depth()) {
|
||||
if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
|
||||
rcu_dynticks_curr_cpu_in_eqs()) {
|
||||
rcu_report_exp_rdp(rdp);
|
||||
|
@ -634,7 +669,7 @@ static void rcu_exp_handler(void *unused)
|
|||
* can have caused this quiescent state to already have been
|
||||
* reported, so we really do need to check ->expmask.
|
||||
*/
|
||||
if (t->rcu_read_lock_nesting > 0) {
|
||||
if (rcu_preempt_depth() > 0) {
|
||||
raw_spin_lock_irqsave_rcu_node(rnp, flags);
|
||||
if (rnp->expmask & rdp->grpmask) {
|
||||
rdp->exp_deferred_qs = true;
|
||||
|
@ -670,7 +705,7 @@ static void rcu_exp_handler(void *unused)
|
|||
}
|
||||
}
|
||||
|
||||
/* PREEMPT=y, so no PREEMPT=n expedited grace period to clean up after. */
|
||||
/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
|
||||
static void sync_sched_exp_online_cleanup(int cpu)
|
||||
{
|
||||
}
|
||||
|
@ -785,7 +820,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
|
|||
* implementations, it is still unfriendly to real-time workloads, so is
|
||||
* thus not recommended for any sort of common-case code. In fact, if
|
||||
* you are using synchronize_rcu_expedited() in a loop, please restructure
|
||||
* your code to batch your updates, and then Use a single synchronize_rcu()
|
||||
* your code to batch your updates, and then use a single synchronize_rcu()
|
||||
* instead.
|
||||
*
|
||||
* This has the same semantics as (but is more brutal than) synchronize_rcu().
|
||||
|
|
|
@ -220,7 +220,7 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
|
|||
* blocked tasks.
|
||||
*/
|
||||
if (!rnp->gp_tasks && (blkd_state & RCU_GP_BLKD)) {
|
||||
rnp->gp_tasks = &t->rcu_node_entry;
|
||||
WRITE_ONCE(rnp->gp_tasks, &t->rcu_node_entry);
|
||||
WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq);
|
||||
}
|
||||
if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
|
||||
|
@ -290,8 +290,8 @@ void rcu_note_context_switch(bool preempt)
|
|||
|
||||
trace_rcu_utilization(TPS("Start context switch"));
|
||||
lockdep_assert_irqs_disabled();
|
||||
WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
|
||||
if (t->rcu_read_lock_nesting > 0 &&
|
||||
WARN_ON_ONCE(!preempt && rcu_preempt_depth() > 0);
|
||||
if (rcu_preempt_depth() > 0 &&
|
||||
!t->rcu_read_unlock_special.b.blocked) {
|
||||
|
||||
/* Possibly blocking in an RCU read-side critical section. */
|
||||
|
@ -340,7 +340,7 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
|
|||
*/
|
||||
static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
|
||||
{
|
||||
return rnp->gp_tasks != NULL;
|
||||
return READ_ONCE(rnp->gp_tasks) != NULL;
|
||||
}
|
||||
|
||||
/* Bias and limit values for ->rcu_read_lock_nesting. */
|
||||
|
@ -348,6 +348,21 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
|
|||
#define RCU_NEST_NMAX (-INT_MAX / 2)
|
||||
#define RCU_NEST_PMAX (INT_MAX / 2)
|
||||
|
||||
static void rcu_preempt_read_enter(void)
|
||||
{
|
||||
current->rcu_read_lock_nesting++;
|
||||
}
|
||||
|
||||
static void rcu_preempt_read_exit(void)
|
||||
{
|
||||
current->rcu_read_lock_nesting--;
|
||||
}
|
||||
|
||||
static void rcu_preempt_depth_set(int val)
|
||||
{
|
||||
current->rcu_read_lock_nesting = val;
|
||||
}
|
||||
|
||||
/*
|
||||
* Preemptible RCU implementation for rcu_read_lock().
|
||||
* Just increment ->rcu_read_lock_nesting, shared state will be updated
|
||||
|
@ -355,9 +370,9 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
|
|||
*/
|
||||
void __rcu_read_lock(void)
|
||||
{
|
||||
current->rcu_read_lock_nesting++;
|
||||
rcu_preempt_read_enter();
|
||||
if (IS_ENABLED(CONFIG_PROVE_LOCKING))
|
||||
WARN_ON_ONCE(current->rcu_read_lock_nesting > RCU_NEST_PMAX);
|
||||
WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX);
|
||||
barrier(); /* critical section after entry code. */
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__rcu_read_lock);
|
||||
|
@ -373,19 +388,19 @@ void __rcu_read_unlock(void)
|
|||
{
|
||||
struct task_struct *t = current;
|
||||
|
||||
if (t->rcu_read_lock_nesting != 1) {
|
||||
--t->rcu_read_lock_nesting;
|
||||
if (rcu_preempt_depth() != 1) {
|
||||
rcu_preempt_read_exit();
|
||||
} else {
|
||||
barrier(); /* critical section before exit code. */
|
||||
t->rcu_read_lock_nesting = -RCU_NEST_BIAS;
|
||||
rcu_preempt_depth_set(-RCU_NEST_BIAS);
|
||||
barrier(); /* assign before ->rcu_read_unlock_special load */
|
||||
if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
|
||||
rcu_read_unlock_special(t);
|
||||
barrier(); /* ->rcu_read_unlock_special load before assign */
|
||||
t->rcu_read_lock_nesting = 0;
|
||||
rcu_preempt_depth_set(0);
|
||||
}
|
||||
if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
|
||||
int rrln = t->rcu_read_lock_nesting;
|
||||
int rrln = rcu_preempt_depth();
|
||||
|
||||
WARN_ON_ONCE(rrln < 0 && rrln > RCU_NEST_NMAX);
|
||||
}
|
||||
|
@ -444,15 +459,9 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
|
|||
local_irq_restore(flags);
|
||||
return;
|
||||
}
|
||||
t->rcu_read_unlock_special.b.deferred_qs = false;
|
||||
if (special.b.need_qs) {
|
||||
t->rcu_read_unlock_special.s = 0;
|
||||
if (special.b.need_qs)
|
||||
rcu_qs();
|
||||
t->rcu_read_unlock_special.b.need_qs = false;
|
||||
if (!t->rcu_read_unlock_special.s && !rdp->exp_deferred_qs) {
|
||||
local_irq_restore(flags);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Respond to a request by an expedited grace period for a
|
||||
|
@ -460,17 +469,11 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
|
|||
* tasks are handled when removing the task from the
|
||||
* blocked-tasks list below.
|
||||
*/
|
||||
if (rdp->exp_deferred_qs) {
|
||||
if (rdp->exp_deferred_qs)
|
||||
rcu_report_exp_rdp(rdp);
|
||||
if (!t->rcu_read_unlock_special.s) {
|
||||
local_irq_restore(flags);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
/* Clean up if blocked during RCU read-side critical section. */
|
||||
if (special.b.blocked) {
|
||||
t->rcu_read_unlock_special.b.blocked = false;
|
||||
|
||||
/*
|
||||
* Remove this task from the list it blocked on. The task
|
||||
|
@ -485,7 +488,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
|
|||
empty_norm = !rcu_preempt_blocked_readers_cgp(rnp);
|
||||
WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq &&
|
||||
(!empty_norm || rnp->qsmask));
|
||||
empty_exp = sync_rcu_preempt_exp_done(rnp);
|
||||
empty_exp = sync_rcu_exp_done(rnp);
|
||||
smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */
|
||||
np = rcu_next_node_entry(t, rnp);
|
||||
list_del_init(&t->rcu_node_entry);
|
||||
|
@ -493,7 +496,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
|
|||
trace_rcu_unlock_preempted_task(TPS("rcu_preempt"),
|
||||
rnp->gp_seq, t->pid);
|
||||
if (&t->rcu_node_entry == rnp->gp_tasks)
|
||||
rnp->gp_tasks = np;
|
||||
WRITE_ONCE(rnp->gp_tasks, np);
|
||||
if (&t->rcu_node_entry == rnp->exp_tasks)
|
||||
rnp->exp_tasks = np;
|
||||
if (IS_ENABLED(CONFIG_RCU_BOOST)) {
|
||||
|
@ -509,7 +512,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
|
|||
* Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
|
||||
* so we must take a snapshot of the expedited state.
|
||||
*/
|
||||
empty_exp_now = sync_rcu_preempt_exp_done(rnp);
|
||||
empty_exp_now = sync_rcu_exp_done(rnp);
|
||||
if (!empty_norm && !rcu_preempt_blocked_readers_cgp(rnp)) {
|
||||
trace_rcu_quiescent_state_report(TPS("preempt_rcu"),
|
||||
rnp->gp_seq,
|
||||
|
@ -551,7 +554,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
|
|||
{
|
||||
return (__this_cpu_read(rcu_data.exp_deferred_qs) ||
|
||||
READ_ONCE(t->rcu_read_unlock_special.s)) &&
|
||||
t->rcu_read_lock_nesting <= 0;
|
||||
rcu_preempt_depth() <= 0;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -564,16 +567,16 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
|
|||
static void rcu_preempt_deferred_qs(struct task_struct *t)
|
||||
{
|
||||
unsigned long flags;
|
||||
bool couldrecurse = t->rcu_read_lock_nesting >= 0;
|
||||
bool couldrecurse = rcu_preempt_depth() >= 0;
|
||||
|
||||
if (!rcu_preempt_need_deferred_qs(t))
|
||||
return;
|
||||
if (couldrecurse)
|
||||
t->rcu_read_lock_nesting -= RCU_NEST_BIAS;
|
||||
rcu_preempt_depth_set(rcu_preempt_depth() - RCU_NEST_BIAS);
|
||||
local_irq_save(flags);
|
||||
rcu_preempt_deferred_qs_irqrestore(t, flags);
|
||||
if (couldrecurse)
|
||||
t->rcu_read_lock_nesting += RCU_NEST_BIAS;
|
||||
rcu_preempt_depth_set(rcu_preempt_depth() + RCU_NEST_BIAS);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -610,9 +613,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
|
|||
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
|
||||
struct rcu_node *rnp = rdp->mynode;
|
||||
|
||||
t->rcu_read_unlock_special.b.exp_hint = false;
|
||||
exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
|
||||
(rdp->grpmask & rnp->expmask) ||
|
||||
(rdp->grpmask & READ_ONCE(rnp->expmask)) ||
|
||||
tick_nohz_full_cpu(rdp->cpu);
|
||||
// Need to defer quiescent state until everything is enabled.
|
||||
if (irqs_were_disabled && use_softirq &&
|
||||
|
@ -640,7 +642,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
|
|||
local_irq_restore(flags);
|
||||
return;
|
||||
}
|
||||
WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
|
||||
rcu_preempt_deferred_qs_irqrestore(t, flags);
|
||||
}
|
||||
|
||||
|
@ -648,8 +649,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
|
|||
* Check that the list of blocked tasks for the newly completed grace
|
||||
* period is in fact empty. It is a serious bug to complete a grace
|
||||
* period that still has RCU readers blocked! This function must be
|
||||
* invoked -before- updating this rnp's ->gp_seq, and the rnp's ->lock
|
||||
* must be held by the caller.
|
||||
* invoked -before- updating this rnp's ->gp_seq.
|
||||
*
|
||||
* Also, if there are blocked tasks on the list, they automatically
|
||||
* block the newly created grace period, so set up ->gp_tasks accordingly.
|
||||
|
@ -659,11 +659,12 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
|
|||
struct task_struct *t;
|
||||
|
||||
RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() invoked with preemption enabled!!!\n");
|
||||
raw_lockdep_assert_held_rcu_node(rnp);
|
||||
if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
|
||||
dump_blkd_tasks(rnp, 10);
|
||||
if (rcu_preempt_has_tasks(rnp) &&
|
||||
(rnp->qsmaskinit || rnp->wait_blkd_tasks)) {
|
||||
rnp->gp_tasks = rnp->blkd_tasks.next;
|
||||
WRITE_ONCE(rnp->gp_tasks, rnp->blkd_tasks.next);
|
||||
t = container_of(rnp->gp_tasks, struct task_struct,
|
||||
rcu_node_entry);
|
||||
trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
|
||||
|
@ -686,7 +687,7 @@ static void rcu_flavor_sched_clock_irq(int user)
|
|||
if (user || rcu_is_cpu_rrupt_from_idle()) {
|
||||
rcu_note_voluntary_context_switch(current);
|
||||
}
|
||||
if (t->rcu_read_lock_nesting > 0 ||
|
||||
if (rcu_preempt_depth() > 0 ||
|
||||
(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
|
||||
/* No QS, force context switch if deferred. */
|
||||
if (rcu_preempt_need_deferred_qs(t)) {
|
||||
|
@ -696,13 +697,13 @@ static void rcu_flavor_sched_clock_irq(int user)
|
|||
} else if (rcu_preempt_need_deferred_qs(t)) {
|
||||
rcu_preempt_deferred_qs(t); /* Report deferred QS. */
|
||||
return;
|
||||
} else if (!t->rcu_read_lock_nesting) {
|
||||
} else if (!rcu_preempt_depth()) {
|
||||
rcu_qs(); /* Report immediate QS. */
|
||||
return;
|
||||
}
|
||||
|
||||
/* If GP is oldish, ask for help from rcu_read_unlock_special(). */
|
||||
if (t->rcu_read_lock_nesting > 0 &&
|
||||
if (rcu_preempt_depth() > 0 &&
|
||||
__this_cpu_read(rcu_data.core_needs_qs) &&
|
||||
__this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
|
||||
!t->rcu_read_unlock_special.b.need_qs &&
|
||||
|
@ -723,11 +724,11 @@ void exit_rcu(void)
|
|||
struct task_struct *t = current;
|
||||
|
||||
if (unlikely(!list_empty(¤t->rcu_node_entry))) {
|
||||
t->rcu_read_lock_nesting = 1;
|
||||
rcu_preempt_depth_set(1);
|
||||
barrier();
|
||||
WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true);
|
||||
} else if (unlikely(t->rcu_read_lock_nesting)) {
|
||||
t->rcu_read_lock_nesting = 1;
|
||||
} else if (unlikely(rcu_preempt_depth())) {
|
||||
rcu_preempt_depth_set(1);
|
||||
} else {
|
||||
return;
|
||||
}
|
||||
|
@ -757,7 +758,8 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
|
|||
pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
|
||||
__func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
|
||||
pr_info("%s: ->gp_tasks %p ->boost_tasks %p ->exp_tasks %p\n",
|
||||
__func__, rnp->gp_tasks, rnp->boost_tasks, rnp->exp_tasks);
|
||||
__func__, READ_ONCE(rnp->gp_tasks), rnp->boost_tasks,
|
||||
rnp->exp_tasks);
|
||||
pr_info("%s: ->blkd_tasks", __func__);
|
||||
i = 0;
|
||||
list_for_each(lhp, &rnp->blkd_tasks) {
|
||||
|
@ -788,7 +790,7 @@ static void __init rcu_bootup_announce(void)
|
|||
}
|
||||
|
||||
/*
|
||||
* Note a quiescent state for PREEMPT=n. Because we do not need to know
|
||||
* Note a quiescent state for PREEMPTION=n. Because we do not need to know
|
||||
* how many quiescent states passed, just if there was at least one since
|
||||
* the start of the grace period, this just sets a flag. The caller must
|
||||
* have disabled preemption.
|
||||
|
@ -838,7 +840,7 @@ void rcu_all_qs(void)
|
|||
EXPORT_SYMBOL_GPL(rcu_all_qs);
|
||||
|
||||
/*
|
||||
* Note a PREEMPT=n context switch. The caller must have disabled interrupts.
|
||||
* Note a PREEMPTION=n context switch. The caller must have disabled interrupts.
|
||||
*/
|
||||
void rcu_note_context_switch(bool preempt)
|
||||
{
|
||||
|
@ -1262,10 +1264,9 @@ static void rcu_prepare_for_idle(void)
|
|||
/*
|
||||
* This code is invoked when a CPU goes idle, at which point we want
|
||||
* to have the CPU do everything required for RCU so that it can enter
|
||||
* the energy-efficient dyntick-idle mode. This is handled by a
|
||||
* state machine implemented by rcu_prepare_for_idle() below.
|
||||
* the energy-efficient dyntick-idle mode.
|
||||
*
|
||||
* The following three proprocessor symbols control this state machine:
|
||||
* The following preprocessor symbol controls this:
|
||||
*
|
||||
* RCU_IDLE_GP_DELAY gives the number of jiffies that a CPU is permitted
|
||||
* to sleep in dyntick-idle mode with RCU callbacks pending. This
|
||||
|
@ -1274,21 +1275,15 @@ static void rcu_prepare_for_idle(void)
|
|||
* number, be warned: Setting RCU_IDLE_GP_DELAY too high can hang your
|
||||
* system. And if you are -that- concerned about energy efficiency,
|
||||
* just power the system down and be done with it!
|
||||
* RCU_IDLE_LAZY_GP_DELAY gives the number of jiffies that a CPU is
|
||||
* permitted to sleep in dyntick-idle mode with only lazy RCU
|
||||
* callbacks pending. Setting this too high can OOM your system.
|
||||
*
|
||||
* The values below work well in practice. If future workloads require
|
||||
* The value below works well in practice. If future workloads require
|
||||
* adjustment, they can be converted into kernel config parameters, though
|
||||
* making the state machine smarter might be a better option.
|
||||
*/
|
||||
#define RCU_IDLE_GP_DELAY 4 /* Roughly one grace period. */
|
||||
#define RCU_IDLE_LAZY_GP_DELAY (6 * HZ) /* Roughly six seconds. */
|
||||
|
||||
static int rcu_idle_gp_delay = RCU_IDLE_GP_DELAY;
|
||||
module_param(rcu_idle_gp_delay, int, 0644);
|
||||
static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
|
||||
module_param(rcu_idle_lazy_gp_delay, int, 0644);
|
||||
|
||||
/*
|
||||
* Try to advance callbacks on the current CPU, but only if it has been
|
||||
|
@ -1327,8 +1322,7 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void)
|
|||
/*
|
||||
* Allow the CPU to enter dyntick-idle mode unless it has callbacks ready
|
||||
* to invoke. If the CPU has callbacks, try to advance them. Tell the
|
||||
* caller to set the timeout based on whether or not there are non-lazy
|
||||
* callbacks.
|
||||
* caller about what to set the timeout.
|
||||
*
|
||||
* The caller must have disabled interrupts.
|
||||
*/
|
||||
|
@ -1354,25 +1348,18 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt)
|
|||
}
|
||||
rdp->last_accelerate = jiffies;
|
||||
|
||||
/* Request timer delay depending on laziness, and round. */
|
||||
rdp->all_lazy = !rcu_segcblist_n_nonlazy_cbs(&rdp->cblist);
|
||||
if (rdp->all_lazy) {
|
||||
dj = round_jiffies(rcu_idle_lazy_gp_delay + jiffies) - jiffies;
|
||||
} else {
|
||||
dj = round_up(rcu_idle_gp_delay + jiffies,
|
||||
rcu_idle_gp_delay) - jiffies;
|
||||
}
|
||||
/* Request timer and round. */
|
||||
dj = round_up(rcu_idle_gp_delay + jiffies, rcu_idle_gp_delay) - jiffies;
|
||||
|
||||
*nextevt = basemono + dj * TICK_NSEC;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Prepare a CPU for idle from an RCU perspective. The first major task
|
||||
* is to sense whether nohz mode has been enabled or disabled via sysfs.
|
||||
* The second major task is to check to see if a non-lazy callback has
|
||||
* arrived at a CPU that previously had only lazy callbacks. The third
|
||||
* major task is to accelerate (that is, assign grace-period numbers to)
|
||||
* any recently arrived callbacks.
|
||||
* Prepare a CPU for idle from an RCU perspective. The first major task is to
|
||||
* sense whether nohz mode has been enabled or disabled via sysfs. The second
|
||||
* major task is to accelerate (that is, assign grace-period numbers to) any
|
||||
* recently arrived callbacks.
|
||||
*
|
||||
* The caller must have disabled interrupts.
|
||||
*/
|
||||
|
@ -1398,17 +1385,6 @@ static void rcu_prepare_for_idle(void)
|
|||
if (!tne)
|
||||
return;
|
||||
|
||||
/*
|
||||
* If a non-lazy callback arrived at a CPU having only lazy
|
||||
* callbacks, invoke RCU core for the side-effect of recalculating
|
||||
* idle duration on re-entry to idle.
|
||||
*/
|
||||
if (rdp->all_lazy && rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)) {
|
||||
rdp->all_lazy = false;
|
||||
invoke_rcu_core();
|
||||
return;
|
||||
}
|
||||
|
||||
/*
|
||||
* If we have not yet accelerated this jiffy, accelerate all
|
||||
* callbacks on this CPU.
|
||||
|
@ -2321,6 +2297,8 @@ static void __init rcu_organize_nocb_kthreads(void)
|
|||
{
|
||||
int cpu;
|
||||
bool firsttime = true;
|
||||
bool gotnocbs = false;
|
||||
bool gotnocbscbs = true;
|
||||
int ls = rcu_nocb_gp_stride;
|
||||
int nl = 0; /* Next GP kthread. */
|
||||
struct rcu_data *rdp;
|
||||
|
@ -2343,21 +2321,31 @@ static void __init rcu_organize_nocb_kthreads(void)
|
|||
rdp = per_cpu_ptr(&rcu_data, cpu);
|
||||
if (rdp->cpu >= nl) {
|
||||
/* New GP kthread, set up for CBs & next GP. */
|
||||
gotnocbs = true;
|
||||
nl = DIV_ROUND_UP(rdp->cpu + 1, ls) * ls;
|
||||
rdp->nocb_gp_rdp = rdp;
|
||||
rdp_gp = rdp;
|
||||
if (!firsttime && dump_tree)
|
||||
pr_cont("\n");
|
||||
firsttime = false;
|
||||
pr_alert("%s: No-CB GP kthread CPU %d:", __func__, cpu);
|
||||
if (dump_tree) {
|
||||
if (!firsttime)
|
||||
pr_cont("%s\n", gotnocbscbs
|
||||
? "" : " (self only)");
|
||||
gotnocbscbs = false;
|
||||
firsttime = false;
|
||||
pr_alert("%s: No-CB GP kthread CPU %d:",
|
||||
__func__, cpu);
|
||||
}
|
||||
} else {
|
||||
/* Another CB kthread, link to previous GP kthread. */
|
||||
gotnocbscbs = true;
|
||||
rdp->nocb_gp_rdp = rdp_gp;
|
||||
rdp_prev->nocb_next_cb_rdp = rdp;
|
||||
pr_alert(" %d", cpu);
|
||||
if (dump_tree)
|
||||
pr_cont(" %d", cpu);
|
||||
}
|
||||
rdp_prev = rdp;
|
||||
}
|
||||
if (gotnocbs && dump_tree)
|
||||
pr_cont("%s\n", gotnocbscbs ? "" : " (self only)");
|
||||
}
|
||||
|
||||
/*
|
||||
|
|
|
@ -163,7 +163,7 @@ static void rcu_iw_handler(struct irq_work *iwp)
|
|||
//
|
||||
// Printing RCU CPU stall warnings
|
||||
|
||||
#ifdef CONFIG_PREEMPTION
|
||||
#ifdef CONFIG_PREEMPT_RCU
|
||||
|
||||
/*
|
||||
* Dump detailed information for all tasks blocking the current RCU
|
||||
|
@ -215,7 +215,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
|
|||
return ndetected;
|
||||
}
|
||||
|
||||
#else /* #ifdef CONFIG_PREEMPTION */
|
||||
#else /* #ifdef CONFIG_PREEMPT_RCU */
|
||||
|
||||
/*
|
||||
* Because preemptible RCU does not exist, we never have to check for
|
||||
|
@ -233,7 +233,7 @@ static int rcu_print_task_stall(struct rcu_node *rnp)
|
|||
{
|
||||
return 0;
|
||||
}
|
||||
#endif /* #else #ifdef CONFIG_PREEMPTION */
|
||||
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
|
||||
|
||||
/*
|
||||
* Dump stacks of all tasks running on stalled CPUs. First try using
|
||||
|
@ -263,11 +263,9 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
|
|||
{
|
||||
struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
|
||||
|
||||
sprintf(cp, "last_accelerate: %04lx/%04lx, Nonlazy posted: %c%c%c",
|
||||
sprintf(cp, "last_accelerate: %04lx/%04lx dyntick_enabled: %d",
|
||||
rdp->last_accelerate & 0xffff, jiffies & 0xffff,
|
||||
".l"[rdp->all_lazy],
|
||||
".L"[!rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)],
|
||||
".D"[!!rdp->tick_nohz_enabled_snap]);
|
||||
!!rdp->tick_nohz_enabled_snap);
|
||||
}
|
||||
|
||||
#else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
|
||||
|
@ -279,6 +277,28 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
|
|||
|
||||
#endif /* #else #ifdef CONFIG_RCU_FAST_NO_HZ */
|
||||
|
||||
static const char * const gp_state_names[] = {
|
||||
[RCU_GP_IDLE] = "RCU_GP_IDLE",
|
||||
[RCU_GP_WAIT_GPS] = "RCU_GP_WAIT_GPS",
|
||||
[RCU_GP_DONE_GPS] = "RCU_GP_DONE_GPS",
|
||||
[RCU_GP_ONOFF] = "RCU_GP_ONOFF",
|
||||
[RCU_GP_INIT] = "RCU_GP_INIT",
|
||||
[RCU_GP_WAIT_FQS] = "RCU_GP_WAIT_FQS",
|
||||
[RCU_GP_DOING_FQS] = "RCU_GP_DOING_FQS",
|
||||
[RCU_GP_CLEANUP] = "RCU_GP_CLEANUP",
|
||||
[RCU_GP_CLEANED] = "RCU_GP_CLEANED",
|
||||
};
|
||||
|
||||
/*
|
||||
* Convert a ->gp_state value to a character string.
|
||||
*/
|
||||
static const char *gp_state_getname(short gs)
|
||||
{
|
||||
if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
|
||||
return "???";
|
||||
return gp_state_names[gs];
|
||||
}
|
||||
|
||||
/*
|
||||
* Print out diagnostic information for the specified stalled CPU.
|
||||
*
|
||||
|
|
|
@ -40,6 +40,7 @@
|
|||
#include <linux/rcupdate_wait.h>
|
||||
#include <linux/sched/isolation.h>
|
||||
#include <linux/kprobes.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
#define CREATE_TRACE_POINTS
|
||||
|
||||
|
@ -51,9 +52,7 @@
|
|||
#define MODULE_PARAM_PREFIX "rcupdate."
|
||||
|
||||
#ifndef CONFIG_TINY_RCU
|
||||
extern int rcu_expedited; /* from sysctl */
|
||||
module_param(rcu_expedited, int, 0);
|
||||
extern int rcu_normal; /* from sysctl */
|
||||
module_param(rcu_normal, int, 0);
|
||||
static int rcu_normal_after_boot;
|
||||
module_param(rcu_normal_after_boot, int, 0);
|
||||
|
@ -218,6 +217,7 @@ static int __init rcu_set_runtime_mode(void)
|
|||
{
|
||||
rcu_test_sync_prims();
|
||||
rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
|
||||
kfree_rcu_scheduler_running();
|
||||
rcu_test_sync_prims();
|
||||
return 0;
|
||||
}
|
||||
|
@ -435,7 +435,7 @@ struct debug_obj_descr rcuhead_debug_descr = {
|
|||
EXPORT_SYMBOL_GPL(rcuhead_debug_descr);
|
||||
#endif /* #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD */
|
||||
|
||||
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU) || defined(CONFIG_RCU_TRACE)
|
||||
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_RCU_TRACE)
|
||||
void do_trace_rcu_torture_read(const char *rcutorturename, struct rcu_head *rhp,
|
||||
unsigned long secs,
|
||||
unsigned long c_old, unsigned long c)
|
||||
|
@ -853,14 +853,22 @@ static void test_callback(struct rcu_head *r)
|
|||
|
||||
DEFINE_STATIC_SRCU(early_srcu);
|
||||
|
||||
struct early_boot_kfree_rcu {
|
||||
struct rcu_head rh;
|
||||
};
|
||||
|
||||
static void early_boot_test_call_rcu(void)
|
||||
{
|
||||
static struct rcu_head head;
|
||||
static struct rcu_head shead;
|
||||
struct early_boot_kfree_rcu *rhp;
|
||||
|
||||
call_rcu(&head, test_callback);
|
||||
if (IS_ENABLED(CONFIG_SRCU))
|
||||
call_srcu(&early_srcu, &shead, test_callback);
|
||||
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
|
||||
if (!WARN_ON_ONCE(!rhp))
|
||||
kfree_rcu(rhp, rh);
|
||||
}
|
||||
|
||||
void rcu_early_boot_tests(void)
|
||||
|
|
|
@ -1268,7 +1268,7 @@ static struct ctl_table kern_table[] = {
|
|||
.proc_handler = proc_do_static_key,
|
||||
},
|
||||
#endif
|
||||
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
|
||||
#if defined(CONFIG_TREE_RCU)
|
||||
{
|
||||
.procname = "panic_on_rcu_stall",
|
||||
.data = &sysctl_panic_on_rcu_stall,
|
||||
|
|
|
@ -257,9 +257,6 @@ static char *tipc_key_change_dump(struct tipc_key old, struct tipc_key new,
|
|||
#define tipc_aead_rcu_ptr(rcu_ptr, lock) \
|
||||
rcu_dereference_protected((rcu_ptr), lockdep_is_held(lock))
|
||||
|
||||
#define tipc_aead_rcu_swap(rcu_ptr, ptr, lock) \
|
||||
rcu_swap_protected((rcu_ptr), (ptr), lockdep_is_held(lock))
|
||||
|
||||
#define tipc_aead_rcu_replace(rcu_ptr, ptr, lock) \
|
||||
do { \
|
||||
typeof(rcu_ptr) __tmp = rcu_dereference_protected((rcu_ptr), \
|
||||
|
@ -1189,7 +1186,7 @@ static bool tipc_crypto_key_try_align(struct tipc_crypto *rx, u8 new_pending)
|
|||
|
||||
/* Move passive key if any */
|
||||
if (key.passive) {
|
||||
tipc_aead_rcu_swap(rx->aead[key.passive], tmp2, &rx->lock);
|
||||
tmp2 = rcu_replace_pointer(rx->aead[key.passive], tmp2, lockdep_is_held(&rx->lock));
|
||||
x = (key.passive - key.pending + new_pending) % KEY_MAX;
|
||||
new_passive = (x <= 0) ? x + KEY_MAX : x;
|
||||
}
|
||||
|
|
|
@ -15,8 +15,15 @@ then
|
|||
exit 0
|
||||
fi
|
||||
ncpus=`grep '^processor' /proc/cpuinfo | wc -l`
|
||||
idlecpus=`mpstat | tail -1 | \
|
||||
awk -v ncpus=$ncpus '{ print ncpus * ($7 + $NF) / 100 }'`
|
||||
if mpstat -V > /dev/null 2>&1
|
||||
then
|
||||
idlecpus=`mpstat | tail -1 | \
|
||||
awk -v ncpus=$ncpus '{ print ncpus * ($7 + $NF) / 100 }'`
|
||||
else
|
||||
# No mpstat command, so use all available CPUs.
|
||||
echo The mpstat command is not available, so greedily using all CPUs.
|
||||
idlecpus=$ncpus
|
||||
fi
|
||||
awk -v ncpus=$ncpus -v idlecpus=$idlecpus < /dev/null '
|
||||
BEGIN {
|
||||
cpus2use = idlecpus;
|
||||
|
|
|
@ -23,25 +23,39 @@ spinmax=${4-1000}
|
|||
|
||||
n=1
|
||||
|
||||
starttime=`awk 'BEGIN { print systime(); }' < /dev/null`
|
||||
starttime=`gawk 'BEGIN { print systime(); }' < /dev/null`
|
||||
|
||||
nohotplugcpus=
|
||||
for i in /sys/devices/system/cpu/cpu[0-9]*
|
||||
do
|
||||
if test -f $i/online
|
||||
then
|
||||
:
|
||||
else
|
||||
curcpu=`echo $i | sed -e 's/^[^0-9]*//'`
|
||||
nohotplugcpus="$nohotplugcpus $curcpu"
|
||||
fi
|
||||
done
|
||||
|
||||
while :
|
||||
do
|
||||
# Check for done.
|
||||
t=`awk -v s=$starttime 'BEGIN { print systime() - s; }' < /dev/null`
|
||||
t=`gawk -v s=$starttime 'BEGIN { print systime() - s; }' < /dev/null`
|
||||
if test "$t" -gt "$duration"
|
||||
then
|
||||
exit 0;
|
||||
fi
|
||||
|
||||
# Set affinity to randomly selected online CPU
|
||||
cpus=`grep 1 /sys/devices/system/cpu/*/online |
|
||||
sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'`
|
||||
|
||||
# Do not leave out poor old cpu0 which may not be hot-pluggable
|
||||
if [ ! -f "/sys/devices/system/cpu/cpu0/online" ]; then
|
||||
cpus="0 $cpus"
|
||||
if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 |
|
||||
sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'`
|
||||
then
|
||||
:
|
||||
else
|
||||
cpus=
|
||||
fi
|
||||
# Do not leave out non-hot-pluggable CPUs
|
||||
cpus="$cpus $nohotplugcpus"
|
||||
|
||||
cpumask=`awk -v cpus="$cpus" -v me=$me -v n=$n 'BEGIN {
|
||||
srand(n + me + systime());
|
||||
|
|
|
@ -25,6 +25,7 @@ stopstate="`grep 'End-test grace-period state: g' $i/console.log 2> /dev/null |
|
|||
tail -1 | sed -e 's/^\[[ 0-9.]*] //' |
|
||||
awk '{ print \"[\" $1 \" \" $5 \" \" $6 \" \" $7 \"]\"; }' |
|
||||
tr -d '\012\015'`"
|
||||
fwdprog="`grep 'rcu_torture_fwd_prog_cr Duration' $i/console.log 2> /dev/null | sed -e 's/^\[[^]]*] //' | sort -k15nr | head -1 | awk '{ print $14 " " $15 }'`"
|
||||
if test -z "$ngps"
|
||||
then
|
||||
echo "$configfile ------- " $stopstate
|
||||
|
@ -39,7 +40,7 @@ else
|
|||
BEGIN { print ngps / dur }' < /dev/null`
|
||||
title="$title ($ngpsps/s)"
|
||||
fi
|
||||
echo $title $stopstate
|
||||
echo $title $stopstate $fwdprog
|
||||
nclosecalls=`grep --binary-files=text 'torture: Reader Batch' $i/console.log | tail -1 | awk '{for (i=NF-8;i<=NF;i++) sum+=$i; } END {print sum}'`
|
||||
if test -z "$nclosecalls"
|
||||
then
|
||||
|
|
|
@ -123,7 +123,7 @@ qemu_args=$5
|
|||
boot_args=$6
|
||||
|
||||
cd $KVM
|
||||
kstarttime=`awk 'BEGIN { print systime() }' < /dev/null`
|
||||
kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null`
|
||||
if test -z "$TORTURE_BUILDONLY"
|
||||
then
|
||||
echo ' ---' `date`: Starting kernel
|
||||
|
@ -133,11 +133,10 @@ fi
|
|||
qemu_args="-enable-kvm -nographic $qemu_args"
|
||||
cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment`
|
||||
cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"`
|
||||
vcpus=`identify_qemu_vcpus`
|
||||
if test $cpu_count -gt $vcpus
|
||||
if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS"
|
||||
then
|
||||
echo CPU count limited from $cpu_count to $vcpus | tee -a $resdir/Warnings
|
||||
cpu_count=$vcpus
|
||||
echo CPU count limited from $cpu_count to $TORTURE_ALLOTED_CPUS | tee -a $resdir/Warnings
|
||||
cpu_count=$TORTURE_ALLOTED_CPUS
|
||||
fi
|
||||
qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`"
|
||||
|
||||
|
@ -177,7 +176,7 @@ do
|
|||
then
|
||||
qemu_pid=`cat "$resdir/qemu_pid"`
|
||||
fi
|
||||
kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
|
||||
kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
|
||||
if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1
|
||||
then
|
||||
if test $kruntime -ge $seconds
|
||||
|
@ -213,7 +212,7 @@ then
|
|||
oldline="`tail $resdir/console.log`"
|
||||
while :
|
||||
do
|
||||
kruntime=`awk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
|
||||
kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
|
||||
if kill -0 $qemu_pid > /dev/null 2>&1
|
||||
then
|
||||
:
|
||||
|
|
|
@ -24,7 +24,9 @@ dur=$((30*60))
|
|||
dryrun=""
|
||||
KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM
|
||||
PATH=${KVM}/bin:$PATH; export PATH
|
||||
TORTURE_ALLOTED_CPUS=""
|
||||
. functions.sh
|
||||
|
||||
TORTURE_ALLOTED_CPUS="`identify_qemu_vcpus`"
|
||||
TORTURE_DEFCONFIG=defconfig
|
||||
TORTURE_BOOT_IMAGE=""
|
||||
TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD
|
||||
|
@ -40,8 +42,6 @@ cpus=0
|
|||
ds=`date +%Y.%m.%d-%H:%M:%S`
|
||||
jitter="-1"
|
||||
|
||||
. functions.sh
|
||||
|
||||
usage () {
|
||||
echo "Usage: $scriptname optional arguments:"
|
||||
echo " --bootargs kernel-boot-arguments"
|
||||
|
@ -93,6 +93,11 @@ do
|
|||
checkarg --cpus "(number)" "$#" "$2" '^[0-9]*$' '^--'
|
||||
cpus=$2
|
||||
TORTURE_ALLOTED_CPUS="$2"
|
||||
max_cpus="`identify_qemu_vcpus`"
|
||||
if test "$TORTURE_ALLOTED_CPUS" -gt "$max_cpus"
|
||||
then
|
||||
TORTURE_ALLOTED_CPUS=$max_cpus
|
||||
fi
|
||||
shift
|
||||
;;
|
||||
--datestamp)
|
||||
|
@ -198,9 +203,10 @@ fi
|
|||
|
||||
CONFIGFRAG=${KVM}/configs/${TORTURE_SUITE}; export CONFIGFRAG
|
||||
|
||||
defaultconfigs="`tr '\012' ' ' < $CONFIGFRAG/CFLIST`"
|
||||
if test -z "$configs"
|
||||
then
|
||||
configs="`cat $CONFIGFRAG/CFLIST`"
|
||||
configs=$defaultconfigs
|
||||
fi
|
||||
|
||||
if test -z "$resdir"
|
||||
|
@ -209,7 +215,7 @@ then
|
|||
fi
|
||||
|
||||
# Create a file of test-name/#cpus pairs, sorted by decreasing #cpus.
|
||||
touch $T/cfgcpu
|
||||
configs_derep=
|
||||
for CF in $configs
|
||||
do
|
||||
case $CF in
|
||||
|
@ -222,15 +228,21 @@ do
|
|||
CF1=$CF
|
||||
;;
|
||||
esac
|
||||
for ((cur_rep=0;cur_rep<$config_reps;cur_rep++))
|
||||
do
|
||||
configs_derep="$configs_derep $CF1"
|
||||
done
|
||||
done
|
||||
touch $T/cfgcpu
|
||||
configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`"
|
||||
for CF1 in $configs_derep
|
||||
do
|
||||
if test -f "$CONFIGFRAG/$CF1"
|
||||
then
|
||||
cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$CF1`
|
||||
cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"`
|
||||
cpu_count=`configfrag_boot_maxcpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"`
|
||||
for ((cur_rep=0;cur_rep<$config_reps;cur_rep++))
|
||||
do
|
||||
echo $CF1 $cpu_count >> $T/cfgcpu
|
||||
done
|
||||
echo $CF1 $cpu_count >> $T/cfgcpu
|
||||
else
|
||||
echo "The --configs file $CF1 does not exist, terminating."
|
||||
exit 1
|
||||
|
|
|
@ -20,58 +20,9 @@ if [ -s "$D/initrd/init" ]; then
|
|||
exit 0
|
||||
fi
|
||||
|
||||
T=${TMPDIR-/tmp}/mkinitrd.sh.$$
|
||||
trap 'rm -rf $T' 0 2
|
||||
mkdir $T
|
||||
|
||||
cat > $T/init << '__EOF___'
|
||||
#!/bin/sh
|
||||
# Run in userspace a few milliseconds every second. This helps to
|
||||
# exercise the NO_HZ_FULL portions of RCU. The 192 instances of "a" was
|
||||
# empirically shown to give a nice multi-millisecond burst of user-mode
|
||||
# execution on a 2GHz CPU, as desired. Modern CPUs will vary from a
|
||||
# couple of milliseconds up to perhaps 100 milliseconds, which is an
|
||||
# acceptable range.
|
||||
#
|
||||
# Why not calibrate an exact delay? Because within this initrd, we
|
||||
# are restricted to Bourne-shell builtins, which as far as I know do not
|
||||
# provide any means of obtaining a fine-grained timestamp.
|
||||
|
||||
a4="a a a a"
|
||||
a16="$a4 $a4 $a4 $a4"
|
||||
a64="$a16 $a16 $a16 $a16"
|
||||
a192="$a64 $a64 $a64"
|
||||
while :
|
||||
do
|
||||
q=
|
||||
for i in $a192
|
||||
do
|
||||
q="$q $i"
|
||||
done
|
||||
sleep 1
|
||||
done
|
||||
__EOF___
|
||||
|
||||
# Try using dracut to create initrd
|
||||
if command -v dracut >/dev/null 2>&1
|
||||
then
|
||||
echo Creating $D/initrd using dracut.
|
||||
# Filesystem creation
|
||||
dracut --force --no-hostonly --no-hostonly-cmdline --module "base" $T/initramfs.img
|
||||
cd $D
|
||||
mkdir -p initrd
|
||||
cd initrd
|
||||
zcat $T/initramfs.img | cpio -id
|
||||
cp $T/init init
|
||||
chmod +x init
|
||||
echo Done creating $D/initrd using dracut
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# No dracut, so create a C-language initrd/init program and statically
|
||||
# link it. This results in a very small initrd, but might be a bit less
|
||||
# future-proof than dracut.
|
||||
echo "Could not find dracut, attempting C initrd"
|
||||
# Create a C-language initrd/init infinite-loop program and statically
|
||||
# link it. This results in a very small initrd.
|
||||
echo "Creating a statically linked C-language initrd"
|
||||
cd $D
|
||||
mkdir -p initrd
|
||||
cd initrd
|
||||
|
|
Loading…
Reference in New Issue