doc: Update RCU data-structure documentation for rcu_segcblist
The rcu_segcblist data structure, which contains segmented lists of RCU callbacks, was recently added. This commit updates the documentation accordingly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
parent
8e2a439753
commit
aa123a748e
|
@@ -19,6 +19,8 @@ to each other.
|
||||||
The <tt>rcu_state</tt> Structure</a>
|
The <tt>rcu_state</tt> Structure</a>
|
||||||
<li> <a href="#The rcu_node Structure">
|
<li> <a href="#The rcu_node Structure">
|
||||||
The <tt>rcu_node</tt> Structure</a>
|
The <tt>rcu_node</tt> Structure</a>
|
||||||
|
<li> <a href="#The rcu_segcblist Structure">
|
||||||
|
The <tt>rcu_segcblist</tt> Structure</a>
|
||||||
<li> <a href="#The rcu_data Structure">
|
<li> <a href="#The rcu_data Structure">
|
||||||
The <tt>rcu_data</tt> Structure</a>
|
The <tt>rcu_data</tt> Structure</a>
|
||||||
<li> <a href="#The rcu_dynticks Structure">
|
<li> <a href="#The rcu_dynticks Structure">
|
||||||
|
@@ -841,6 +843,134 @@ for lockdep lock-class names.
|
||||||
Finally, lines 64-66 produce an error if the maximum number of
|
Finally, lines 64-66 produce an error if the maximum number of
|
||||||
CPUs is too large for the specified fanout.
|
CPUs is too large for the specified fanout.
|
||||||
|
|
||||||
|
<h3><a name="The rcu_segcblist Structure">
|
||||||
|
The <tt>rcu_segcblist</tt> Structure</a></h3>
|
||||||
|
|
||||||
|
The <tt>rcu_segcblist</tt> structure maintains a segmented list of
|
||||||
|
callbacks as follows:
|
||||||
|
|
||||||
|
<pre>
|
||||||
|
1 #define RCU_DONE_TAIL 0
|
||||||
|
2 #define RCU_WAIT_TAIL 1
|
||||||
|
3 #define RCU_NEXT_READY_TAIL 2
|
||||||
|
4 #define RCU_NEXT_TAIL 3
|
||||||
|
5 #define RCU_CBLIST_NSEGS 4
|
||||||
|
6
|
||||||
|
7 struct rcu_segcblist {
|
||||||
|
8 struct rcu_head *head;
|
||||||
|
9 struct rcu_head **tails[RCU_CBLIST_NSEGS];
|
||||||
|
10 unsigned long gp_seq[RCU_CBLIST_NSEGS];
|
||||||
|
11 long len;
|
||||||
|
12 long len_lazy;
|
||||||
|
13 };
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The segments are as follows:
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li> <tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed.
|
||||||
|
These callbacks are ready to be invoked.
|
||||||
|
<li> <tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the
|
||||||
|
current grace period.
|
||||||
|
Note that different CPUs can have different ideas about which
|
||||||
|
grace period is current, hence the <tt>->gp_seq</tt> field.
|
||||||
|
<li> <tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next
|
||||||
|
grace period to start.
|
||||||
|
<li> <tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been
|
||||||
|
associated with a grace period.
|
||||||
|
</ol>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The <tt>->head</tt> pointer references the first callback or
|
||||||
|
is <tt>NULL</tt> if the list contains no callbacks (which is
|
||||||
|
<i>not</i> the same as the structure having no associated callbacks; see the important note on <tt>->len</tt>).
|
||||||
|
Each element of the <tt>->tails[]</tt> array references the
|
||||||
|
<tt>->next</tt> pointer of the last callback in the corresponding
|
||||||
|
segment of the list, or the list's <tt>->head</tt> pointer if
|
||||||
|
that segment and all previous segments are empty.
|
||||||
|
If the corresponding segment is empty but some previous segment is
|
||||||
|
not empty, then the array element is identical to its predecessor.
|
||||||
|
Older callbacks are closer to the head of the list, and new callbacks
|
||||||
|
are added at the tail.
|
||||||
|
This relationship between the <tt>->head</tt> pointer, the
|
||||||
|
<tt>->tails[]</tt> array, and the callbacks is shown in this
|
||||||
|
diagram:
|
||||||
|
|
||||||
|
</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
|
||||||
|
|
||||||
|
</p><p>In this figure, the <tt>->head</tt> pointer references the
|
||||||
|
first
|
||||||
|
RCU callback in the list.
|
||||||
|
The <tt>->tails[RCU_DONE_TAIL]</tt> array element references
|
||||||
|
the <tt>->head</tt> pointer itself, indicating that none
|
||||||
|
of the callbacks is ready to invoke.
|
||||||
|
The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback
|
||||||
|
CB 2's <tt>->next</tt> pointer, which indicates that
|
||||||
|
CB 1 and CB 2 are both waiting on the current grace period,
|
||||||
|
give or take possible disagreements about exactly which grace period
|
||||||
|
is the current one.
|
||||||
|
The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element
|
||||||
|
references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt>
|
||||||
|
does, which indicates that there are no callbacks waiting on the next
|
||||||
|
RCU grace period.
|
||||||
|
The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references
|
||||||
|
CB 4's <tt>->next</tt> pointer, indicating that all the
|
||||||
|
remaining RCU callbacks have not yet been assigned to an RCU grace
|
||||||
|
period.
|
||||||
|
Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element
|
||||||
|
always references the last RCU callback's <tt>->next</tt> pointer
|
||||||
|
unless the callback list is empty, in which case it references
|
||||||
|
the <tt>->head</tt> pointer.
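The layout in the figure can be reproduced with a small userspace model. This is only a sketch under stated assumptions: the `struct rcu_head` stand-in and the `segcblist_*()` helper names are hypothetical and exist only for illustration, and no locking or memory ordering is modeled. It builds the four-callback list from the figure and counts the callbacks in each segment by walking from one `->tails[]` element to the next.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_WAIT_TAIL		1
#define RCU_NEXT_READY_TAIL	2
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
};

struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	unsigned long gp_seq[RCU_CBLIST_NSEGS];
	long len;
	long len_lazy;
};

/* Hypothetical helper: empty list, so every tail references ->head. */
static void segcblist_init(struct rcu_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
		l->tails[i] = &l->head;
		l->gp_seq[i] = 0;
	}
	l->len = 0;
	l->len_lazy = 0;
}

/* Hypothetical helper: append a new callback to the RCU_NEXT_TAIL segment. */
static void segcblist_enqueue(struct rcu_segcblist *l, struct rcu_head *cb)
{
	assert(l->tails[RCU_NEXT_TAIL] != NULL);
	cb->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = cb;
	l->tails[RCU_NEXT_TAIL] = &cb->next;
	l->len++;
}

/* Hypothetical helper: count the callbacks in one segment. */
static long segcblist_seg_len(struct rcu_segcblist *l, int seg)
{
	/* A segment starts where the previous segment's tail points. */
	struct rcu_head **pp = (seg == RCU_DONE_TAIL) ? &l->head : l->tails[seg - 1];
	long n = 0;

	for (; pp != l->tails[seg]; pp = &(*pp)->next)
		n++;
	return n;
}

/* Build the figure's layout: CB 1 and CB 2 waiting, CB 3 and CB 4 unassigned. */
static void segcblist_build_figure(struct rcu_segcblist *l, struct rcu_head cbs[4])
{
	int i;

	segcblist_init(l);
	for (i = 0; i < 4; i++)
		segcblist_enqueue(l, &cbs[i]);
	l->tails[RCU_WAIT_TAIL] = &cbs[1].next;		/* CB 2's ->next */
	l->tails[RCU_NEXT_READY_TAIL] = &cbs[1].next;	/* empty segment */
}
```

Calling `segcblist_build_figure()` and then `segcblist_seg_len()` for each index should report zero, two, zero, and two callbacks for the done, wait, next-ready, and next segments, matching the figure. Note that an empty segment is represented purely by two adjacent `->tails[]` elements holding the same pointer.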
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There is one additional important special case for the
|
||||||
|
<tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt>
|
||||||
|
when this list is <i>disabled</i>.
|
||||||
|
Lists are disabled when the corresponding CPU is offline or when
|
||||||
|
the corresponding CPU's callbacks are offloaded to a kthread,
|
||||||
|
both of which are described elsewhere.
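The disabled-list convention can be sketched as follows. This is an illustrative userspace model, not the kernel's code: the helper names are hypothetical, and the sketch assumes that only a list with no callbacks is ever disabled. The only point it demonstrates is that a <tt>NULL</tt> value in `->tails[RCU_NEXT_TAIL]` is enough to tell enqueuers that the list must not be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_WAIT_TAIL		1
#define RCU_NEXT_READY_TAIL	2
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
};

struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	unsigned long gp_seq[RCU_CBLIST_NSEGS];
	long len;
	long len_lazy;
};

/* Hypothetical helper: (re)initialize to an enabled, empty list. */
static void segcblist_init(struct rcu_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
		l->tails[i] = &l->head;
		l->gp_seq[i] = 0;
	}
	l->len = 0;
	l->len_lazy = 0;
}

/* Hypothetical helper: a NULL RCU_NEXT_TAIL element marks a disabled list. */
static bool segcblist_is_enabled(const struct rcu_segcblist *l)
{
	return l->tails[RCU_NEXT_TAIL] != NULL;
}

/* Hypothetical helper: this sketch assumes only empty lists are disabled. */
static void segcblist_disable(struct rcu_segcblist *l)
{
	assert(l->head == NULL && l->len == 0);
	l->tails[RCU_NEXT_TAIL] = NULL;
}

/* Hypothetical helper: refuse to queue onto a disabled list. */
static bool segcblist_try_enqueue(struct rcu_segcblist *l, struct rcu_head *cb)
{
	if (!segcblist_is_enabled(l))
		return false;	/* Caller must queue the callback elsewhere. */
	cb->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = cb;
	l->tails[RCU_NEXT_TAIL] = &cb->next;
	l->len++;
	return true;
}
```

Encoding the disabled state in an existing pointer avoids adding a separate flag to the structure, at the cost of requiring every enqueue path to check for <tt>NULL</tt> before dereferencing the tail.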
|
||||||
|
|
||||||
|
</p><p>CPUs advance their callbacks from the
|
||||||
|
<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
|
||||||
|
<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
|
||||||
|
as grace periods advance.
|
||||||
|
|
||||||
|
</p><p>The <tt>->gp_seq[]</tt> array records grace-period
|
||||||
|
numbers corresponding to the list segments.
|
||||||
|
This is what allows different CPUs to have different ideas as to
|
||||||
|
which is the current grace period while still avoiding premature
|
||||||
|
invocation of their callbacks.
|
||||||
|
In particular, this allows CPUs that go idle for extended periods
|
||||||
|
to determine which of their callbacks are ready to be invoked after
|
||||||
|
reawakening.
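The way `->gp_seq[]` drives segment advancement can be sketched in a userspace model. The code below is an assumption-laden illustration rather than the kernel's implementation: the helper names are hypothetical, the grace-period comparison is a plain `<` that ignores counter wraparound, and no locking is modeled. Given the number of the most recently completed grace period, it folds every waiting segment whose recorded number has been reached into the done segment, then slides the remaining segments toward the head.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_WAIT_TAIL		1
#define RCU_NEXT_READY_TAIL	2
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
};

struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	unsigned long gp_seq[RCU_CBLIST_NSEGS];
	long len;
	long len_lazy;
};

static void segcblist_init(struct rcu_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
		l->tails[i] = &l->head;
		l->gp_seq[i] = 0;
	}
	l->len = 0;
	l->len_lazy = 0;
}

static void segcblist_enqueue(struct rcu_segcblist *l, struct rcu_head *cb)
{
	assert(l->tails[RCU_NEXT_TAIL] != NULL);
	cb->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = cb;
	l->tails[RCU_NEXT_TAIL] = &cb->next;
	l->len++;
}

static long segcblist_seg_len(struct rcu_segcblist *l, int seg)
{
	struct rcu_head **pp = (seg == RCU_DONE_TAIL) ? &l->head : l->tails[seg - 1];
	long n = 0;

	for (; pp != l->tails[seg]; pp = &(*pp)->next)
		n++;
	return n;
}

/* Hypothetical helper: grace period number "seq" has completed. */
static void segcblist_advance(struct rcu_segcblist *l, unsigned long seq)
{
	int i, j;

	/* Fold each segment whose grace period has completed into DONE. */
	for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) {
		if (seq < l->gp_seq[i])	/* Simplification: ignores wraparound. */
			break;
		l->tails[RCU_DONE_TAIL] = l->tails[i];
	}
	if (i == RCU_WAIT_TAIL)
		return;			/* Nothing became ready. */

	/* The folded segments are now empty, so their tails match DONE's. */
	for (j = RCU_WAIT_TAIL; j < i; j++)
		l->tails[j] = l->tails[RCU_DONE_TAIL];

	/* Slide the still-waiting segments toward the head of the list. */
	for (j = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++, j++) {
		if (l->tails[j] == l->tails[RCU_NEXT_TAIL])
			break;		/* No callbacks left to slide. */
		l->tails[j] = l->tails[i];
		l->gp_seq[j] = l->gp_seq[i];
	}
}
```

Because each segment carries its own grace-period number, a CPU that was idle across several grace periods can call the advance helper once with the latest completed number and have every eligible segment move to the done segment in a single pass.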
|
||||||
|
|
||||||
|
</p><p>The <tt>->len</tt> counter contains the number of
|
||||||
|
callbacks in <tt>->head</tt>, and the
|
||||||
|
<tt>->len_lazy</tt> counter contains the number of those callbacks that
|
||||||
|
are known to only free memory, and whose invocation can therefore
|
||||||
|
be safely deferred.
|
||||||
|
|
||||||
|
<p><b>Important note</b>: It is the <tt>->len</tt> field that
|
||||||
|
determines whether or not there are callbacks associated with
|
||||||
|
this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt>
|
||||||
|
pointer.
|
||||||
|
The reason for this is that all the ready-to-invoke callbacks
|
||||||
|
(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
|
||||||
|
all at once at callback-invocation time.
|
||||||
|
If callback invocation must be postponed, for example, because a
|
||||||
|
high-priority process just woke up on this CPU, then the remaining
|
||||||
|
callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
|
||||||
|
Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
|
||||||
|
are adjusted after the corresponding callbacks have been invoked, and so
|
||||||
|
again it is the <tt>->len</tt> count that accurately reflects whether
|
||||||
|
or not there are callbacks associated with this <tt>rcu_segcblist</tt>
|
||||||
|
structure.
|
||||||
|
Of course, off-CPU sampling of the <tt>->len</tt> count requires
|
||||||
|
the use of appropriate synchronization, for example, memory barriers.
|
||||||
|
This synchronization can be a bit subtle, particularly in the case
|
||||||
|
of <tt>rcu_barrier()</tt>.
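The reason <tt>->len</tt> rather than <tt>->head</tt> must be consulted can be illustrated with a userspace sketch. Everything here is a simplifying assumption: the helper names are hypothetical, `struct rcu_head` is a stand-in carrying only a `next` pointer and a `func` pointer, and no synchronization is modeled. The sketch detaches the whole done segment, which can leave `->head` equal to <tt>NULL</tt>, and only decrements `->len` once the detached callbacks have actually been invoked.

```c
#include <assert.h>
#include <stddef.h>

#define RCU_DONE_TAIL		0
#define RCU_WAIT_TAIL		1
#define RCU_NEXT_READY_TAIL	2
#define RCU_NEXT_TAIL		3
#define RCU_CBLIST_NSEGS	4

/* Userspace stand-in for the kernel's struct rcu_head (assumption). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct rcu_segcblist {
	struct rcu_head *head;
	struct rcu_head **tails[RCU_CBLIST_NSEGS];
	unsigned long gp_seq[RCU_CBLIST_NSEGS];
	long len;
	long len_lazy;
};

static long cbs_invoked;	/* Demo-only counter bumped by count_cb(). */

static void count_cb(struct rcu_head *head)
{
	(void)head;
	cbs_invoked++;
}

static void segcblist_init(struct rcu_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
		l->tails[i] = &l->head;
		l->gp_seq[i] = 0;
	}
	l->len = 0;
	l->len_lazy = 0;
}

static void segcblist_enqueue(struct rcu_segcblist *l, struct rcu_head *cb,
			      void (*func)(struct rcu_head *head))
{
	assert(l->tails[RCU_NEXT_TAIL] != NULL);
	cb->func = func;
	cb->next = NULL;
	*l->tails[RCU_NEXT_TAIL] = cb;
	l->tails[RCU_NEXT_TAIL] = &cb->next;
	l->len++;
}

/* Hypothetical helper: detach the done segment; ->len is left untouched. */
static struct rcu_head *segcblist_extract_done(struct rcu_segcblist *l)
{
	struct rcu_head *done = l->head;
	int i;

	if (l->tails[RCU_DONE_TAIL] == &l->head)
		return NULL;		/* No ready-to-invoke callbacks. */
	l->head = *l->tails[RCU_DONE_TAIL];
	*l->tails[RCU_DONE_TAIL] = NULL;	/* Terminate the detached list. */
	/* Tails that referenced the detached segment's end now reference ->head. */
	for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--)
		if (l->tails[i] == l->tails[RCU_DONE_TAIL])
			l->tails[i] = &l->head;
	return done;
}

/* Hypothetical helper: invoke detached callbacks, then adjust ->len. */
static long segcblist_invoke(struct rcu_segcblist *l, struct rcu_head *done)
{
	long n = 0;

	while (done) {
		struct rcu_head *next = done->next;

		done->func(done);
		done = next;
		n++;
	}
	l->len -= n;
	return n;
}
```

Between the extract and invoke steps in this model, `->head` is <tt>NULL</tt> even though `->len` is still nonzero, which is exactly the window in which testing `->head` would wrongly report that no callbacks are associated with the structure.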
|
||||||
|
|
||||||
<h3><a name="The rcu_data Structure">
|
<h3><a name="The rcu_data Structure">
|
||||||
The <tt>rcu_data</tt> Structure</a></h3>
|
The <tt>rcu_data</tt> Structure</a></h3>
|
||||||
|
|
||||||
|
@@ -983,62 +1113,18 @@ choice.
|
||||||
as follows:
|
as follows:
|
||||||
|
|
||||||
<pre>
|
<pre>
|
||||||
1 struct rcu_head *nxtlist;
|
1 struct rcu_segcblist cblist;
|
||||||
2 struct rcu_head **nxttail[RCU_NEXT_SIZE];
|
2 long qlen_last_fqs_check;
|
||||||
3 unsigned long nxtcompleted[RCU_NEXT_SIZE];
|
3 unsigned long n_cbs_invoked;
|
||||||
4 long qlen_lazy;
|
4 unsigned long n_nocbs_invoked;
|
||||||
5 long qlen;
|
5 unsigned long n_cbs_orphaned;
|
||||||
6 long qlen_last_fqs_check;
|
6 unsigned long n_cbs_adopted;
|
||||||
7 unsigned long n_force_qs_snap;
|
7 unsigned long n_force_qs_snap;
|
||||||
8 unsigned long n_cbs_invoked;
|
8 long blimit;
|
||||||
9 unsigned long n_cbs_orphaned;
|
|
||||||
10 unsigned long n_cbs_adopted;
|
|
||||||
11 long blimit;
|
|
||||||
</pre>
|
</pre>
|
||||||
|
|
||||||
<p>The <tt>->nxtlist</tt> pointer and the
|
<p>The <tt>->cblist</tt> structure is the segmented callback list
|
||||||
<tt>->nxttail[]</tt> array form a four-segment list with
|
described earlier.
|
||||||
older callbacks near the head and newer ones near the tail.
|
|
||||||
Each segment contains callbacks with the corresponding relationship
|
|
||||||
to the current grace period.
|
|
||||||
The pointer out of the end of each of the four segments is referenced
|
|
||||||
by the element of the <tt>->nxttail[]</tt> array indexed by
|
|
||||||
<tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period),
|
|
||||||
<tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period),
|
|
||||||
<tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next
|
|
||||||
grace period), and
|
|
||||||
<tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated
|
|
||||||
with a specific grace period)
|
|
||||||
respectively, as shown in the following figure.
|
|
||||||
|
|
||||||
</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
|
|
||||||
|
|
||||||
</p><p>In this figure, the <tt>->nxtlist</tt> pointer references the
|
|
||||||
first
|
|
||||||
RCU callback in the list.
|
|
||||||
The <tt>->nxttail[RCU_DONE_TAIL]</tt> array element references
|
|
||||||
the <tt>->nxtlist</tt> pointer itself, indicating that none
|
|
||||||
of the callbacks is ready to invoke.
|
|
||||||
The <tt>->nxttail[RCU_WAIT_TAIL]</tt> array element references callback
|
|
||||||
CB 2's <tt>->next</tt> pointer, which indicates that
|
|
||||||
CB 1 and CB 2 are both waiting on the current grace period.
|
|
||||||
The <tt>->nxttail[RCU_NEXT_READY_TAIL]</tt> array element
|
|
||||||
references the same RCU callback that <tt>->nxttail[RCU_WAIT_TAIL]</tt>
|
|
||||||
does, which indicates that there are no callbacks waiting on the next
|
|
||||||
RCU grace period.
|
|
||||||
The <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element references
|
|
||||||
CB 4's <tt>->next</tt> pointer, indicating that all the
|
|
||||||
remaining RCU callbacks have not yet been assigned to an RCU grace
|
|
||||||
period.
|
|
||||||
Note that the <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element
|
|
||||||
always references the last RCU callback's <tt>->next</tt> pointer
|
|
||||||
unless the callback list is empty, in which case it references
|
|
||||||
the <tt>->nxtlist</tt> pointer.
|
|
||||||
|
|
||||||
</p><p>CPUs advance their callbacks from the
|
|
||||||
<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
|
|
||||||
<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
|
|
||||||
as grace periods advance.
|
|
||||||
The CPU advances the callbacks in its <tt>rcu_data</tt> structure
|
The CPU advances the callbacks in its <tt>rcu_data</tt> structure
|
||||||
whenever it notices that another RCU grace period has completed.
|
whenever it notices that another RCU grace period has completed.
|
||||||
The CPU detects the completion of an RCU grace period by noticing
|
The CPU detects the completion of an RCU grace period by noticing
|
||||||
|
@@ -1049,16 +1135,7 @@ Recall that each <tt>rcu_node</tt> structure's
|
||||||
<tt>->completed</tt> field is updated at the end of each
|
<tt>->completed</tt> field is updated at the end of each
|
||||||
grace period.
|
grace period.
|
||||||
|
|
||||||
</p><p>The <tt>->nxtcompleted[]</tt> array records grace-period
|
<p>
|
||||||
numbers corresponding to the list segments.
|
|
||||||
This allows CPUs that go idle for extended periods to determine
|
|
||||||
which of their callbacks are ready to be invoked after reawakening.
|
|
||||||
|
|
||||||
</p><p>The <tt>->qlen</tt> counter contains the number of
|
|
||||||
callbacks in <tt>->nxtlist</tt>, and the
|
|
||||||
<tt>->qlen_lazy</tt> contains the number of those callbacks that
|
|
||||||
are known to only free memory, and whose invocation can therefore
|
|
||||||
be safely deferred.
|
|
||||||
The <tt>->qlen_last_fqs_check</tt> and
|
The <tt>->qlen_last_fqs_check</tt> and
|
||||||
<tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent
|
<tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent
|
||||||
states from <tt>call_rcu()</tt> and friends when callback
|
states from <tt>call_rcu()</tt> and friends when callback
|
||||||
|
@@ -1069,6 +1146,10 @@ lists grow excessively long.
|
||||||
fields count the number of callbacks invoked,
|
fields count the number of callbacks invoked,
|
||||||
sent to other CPUs when this CPU goes offline,
|
sent to other CPUs when this CPU goes offline,
|
||||||
and received from other CPUs when those other CPUs go offline.
|
and received from other CPUs when those other CPUs go offline.
|
||||||
|
The <tt>->n_nocbs_invoked</tt> field is used when the CPU's callbacks
|
||||||
|
are offloaded to a kthread.
|
||||||
|
|
||||||
|
<p>
|
||||||
Finally, the <tt>->blimit</tt> counter is the maximum number of
|
Finally, the <tt>->blimit</tt> counter is the maximum number of
|
||||||
RCU callbacks that may be invoked at a given time.
|
RCU callbacks that may be invoked at a given time.
|
||||||
|
|
||||||
|
|
|
@@ -19,7 +19,7 @@
|
||||||
id="svg2"
|
id="svg2"
|
||||||
version="1.1"
|
version="1.1"
|
||||||
inkscape:version="0.48.4 r9939"
|
inkscape:version="0.48.4 r9939"
|
||||||
sodipodi:docname="nxtlist.fig">
|
sodipodi:docname="segcblist.svg">
|
||||||
<metadata
|
<metadata
|
||||||
id="metadata94">
|
id="metadata94">
|
||||||
<rdf:RDF>
|
<rdf:RDF>
|
||||||
|
@@ -28,7 +28,7 @@
|
||||||
<dc:format>image/svg+xml</dc:format>
|
<dc:format>image/svg+xml</dc:format>
|
||||||
<dc:type
|
<dc:type
|
||||||
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
|
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
|
||||||
<dc:title></dc:title>
|
<dc:title />
|
||||||
</cc:Work>
|
</cc:Work>
|
||||||
</rdf:RDF>
|
</rdf:RDF>
|
||||||
</metadata>
|
</metadata>
|
||||||
|
@@ -241,61 +241,51 @@
|
||||||
xml:space="preserve"
|
xml:space="preserve"
|
||||||
x="225"
|
x="225"
|
||||||
y="675"
|
y="675"
|
||||||
fill="#000000"
|
|
||||||
font-family="Courier"
|
|
||||||
font-style="normal"
|
font-style="normal"
|
||||||
font-weight="bold"
|
font-weight="bold"
|
||||||
font-size="324"
|
font-size="324"
|
||||||
text-anchor="start"
|
id="text64"
|
||||||
id="text64">nxtlist</text>
|
style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->head</text>
|
||||||
<!-- Text -->
|
<!-- Text -->
|
||||||
<text
|
<text
|
||||||
xml:space="preserve"
|
xml:space="preserve"
|
||||||
x="225"
|
x="225"
|
||||||
y="1800"
|
y="1800"
|
||||||
fill="#000000"
|
|
||||||
font-family="Courier"
|
|
||||||
font-style="normal"
|
font-style="normal"
|
||||||
font-weight="bold"
|
font-weight="bold"
|
||||||
font-size="324"
|
font-size="324"
|
||||||
text-anchor="start"
|
id="text66"
|
||||||
id="text66">nxttail[RCU_DONE_TAIL]</text>
|
style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_DONE_TAIL]</text>
|
||||||
<!-- Text -->
|
<!-- Text -->
|
||||||
<text
|
<text
|
||||||
xml:space="preserve"
|
xml:space="preserve"
|
||||||
x="225"
|
x="225"
|
||||||
y="2925"
|
y="2925"
|
||||||
fill="#000000"
|
|
||||||
font-family="Courier"
|
|
||||||
font-style="normal"
|
font-style="normal"
|
||||||
font-weight="bold"
|
font-weight="bold"
|
||||||
font-size="324"
|
font-size="324"
|
||||||
text-anchor="start"
|
id="text68"
|
||||||
id="text68">nxttail[RCU_WAIT_TAIL]</text>
|
style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_WAIT_TAIL]</text>
|
||||||
<!-- Text -->
|
<!-- Text -->
|
||||||
<text
|
<text
|
||||||
xml:space="preserve"
|
xml:space="preserve"
|
||||||
x="225"
|
x="225"
|
||||||
y="4050"
|
y="4050"
|
||||||
fill="#000000"
|
|
||||||
font-family="Courier"
|
|
||||||
font-style="normal"
|
font-style="normal"
|
||||||
font-weight="bold"
|
font-weight="bold"
|
||||||
font-size="324"
|
font-size="324"
|
||||||
text-anchor="start"
|
id="text70"
|
||||||
id="text70">nxttail[RCU_NEXT_READY_TAIL]</text>
|
style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_READY_TAIL]</text>
|
||||||
<!-- Text -->
|
<!-- Text -->
|
||||||
<text
|
<text
|
||||||
xml:space="preserve"
|
xml:space="preserve"
|
||||||
x="225"
|
x="225"
|
||||||
y="5175"
|
y="5175"
|
||||||
fill="#000000"
|
|
||||||
font-family="Courier"
|
|
||||||
font-style="normal"
|
font-style="normal"
|
||||||
font-weight="bold"
|
font-weight="bold"
|
||||||
font-size="324"
|
font-size="324"
|
||||||
text-anchor="start"
|
id="text72"
|
||||||
id="text72">nxttail[RCU_NEXT_TAIL]</text>
|
style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_TAIL]</text>
|
||||||
<!-- Text -->
|
<!-- Text -->
|
||||||
<text
|
<text
|
||||||
xml:space="preserve"
|
xml:space="preserve"
|
||||||
|
|