tracing: Add inter-event hist trigger Documentation
Add background and details on inter-event hist triggers, including hist variables, synthetic events, and actions. Link: http://lkml.kernel.org/r/b0414efb66535aa52aa7411f58c3d56724027fce.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This commit is contained in:
parent
f404da6e1d
commit
033cbceefa
|
@ -1603,3 +1603,384 @@
|
|||
Hits: 489
|
||||
Entries: 7
|
||||
Dropped: 0
|
||||
|
||||
|
||||
2.2 Inter-event hist triggers
|
||||
-----------------------------
|
||||
|
||||
Inter-event hist triggers are hist triggers that combine values from
|
||||
one or more other events and create a histogram using that data. Data
|
||||
from an inter-event histogram can in turn become the source for
|
||||
further combined histograms, thus providing a chain of related
|
||||
histograms, which is important for some applications.
|
||||
|
||||
The most important example of an inter-event quantity that can be used
|
||||
in this manner is latency, which is simply a difference in timestamps
|
||||
between two events. Although latency is the most important
|
||||
inter-event quantity, note that because the support is completely
|
||||
general across the trace event subsystem, any event field can be used
|
||||
in an inter-event quantity.
|
||||
|
||||
An example of a histogram that combines data from other histograms
|
||||
into a useful chain would be a 'wakeupswitch latency' histogram that
|
||||
combines a 'wakeup latency' histogram and a 'switch latency'
|
||||
histogram.
|
||||
|
||||
Normally, a hist trigger specification consists of a (possibly
|
||||
compound) key along with one or more numeric values, which are
|
||||
continually updated sums associated with that key. A histogram
|
||||
specification in this case consists of individual key and value
|
||||
specifications that refer to trace event fields associated with a
|
||||
single event type.
|
||||
|
||||
The inter-event hist trigger extension allows fields from multiple
|
||||
events to be referenced and combined into a multi-event histogram
|
||||
specification. In support of this overall goal, a few enabling
|
||||
features have been added to the hist trigger support:
|
||||
|
||||
- In order to compute an inter-event quantity, a value from one
|
||||
event needs to saved and then referenced from another event. This
|
||||
requires the introduction of support for histogram 'variables'.
|
||||
|
||||
- The computation of inter-event quantities and their combination
|
||||
require some minimal amount of support for applying simple
|
||||
expressions to variables (+ and -).
|
||||
|
||||
- A histogram consisting of inter-event quantities isn't logically a
|
||||
histogram on either event (so having the 'hist' file for either
|
||||
event host the histogram output doesn't really make sense). To
|
||||
address the idea that the histogram is associated with a
|
||||
combination of events, support is added allowing the creation of
|
||||
'synthetic' events that are events derived from other events.
|
||||
These synthetic events are full-fledged events just like any other
|
||||
and can be used as such, as for instance to create the
|
||||
'combination' histograms mentioned previously.
|
||||
|
||||
- A set of 'actions' can be associated with histogram entries -
|
||||
these can be used to generate the previously mentioned synthetic
|
||||
events, but can also be used for other purposes, such as for
|
||||
example saving context when a 'max' latency has been hit.
|
||||
|
||||
- Trace events don't have a 'timestamp' associated with them, but
|
||||
there is an implicit timestamp saved along with an event in the
|
||||
underlying ftrace ring buffer. This timestamp is now exposed as a
|
||||
a synthetic field named 'common_timestamp' which can be used in
|
||||
histograms as if it were any other event field; it isn't an actual
|
||||
field in the trace format but rather is a synthesized value that
|
||||
nonetheless can be used as if it were an actual field. By default
|
||||
it is in units of nanoseconds; appending '.usecs' to a
|
||||
common_timestamp field changes the units to microseconds.
|
||||
|
||||
These features are decribed in more detail in the following sections.
|
||||
|
||||
2.2.1 Histogram Variables
|
||||
-------------------------
|
||||
|
||||
Variables are simply named locations used for saving and retrieving
|
||||
values between matching events. A 'matching' event is defined as an
|
||||
event that has a matching key - if a variable is saved for a histogram
|
||||
entry corresponding to that key, any subsequent event with a matching
|
||||
key can access that variable.
|
||||
|
||||
A variable's value is normally available to any subsequent event until
|
||||
it is set to something else by a subsequent event. The one exception
|
||||
to that rule is that any variable used in an expression is essentially
|
||||
'read-once' - once it's used by an expression in a subsequent event,
|
||||
it's reset to its 'unset' state, which means it can't be used again
|
||||
unless it's set again. This ensures not only that an event doesn't
|
||||
use an uninitialized variable in a calculation, but that that variable
|
||||
is used only once and not for any unrelated subsequent match.
|
||||
|
||||
The basic syntax for saving a variable is to simply prefix a unique
|
||||
variable name not corresponding to any keyword along with an '=' sign
|
||||
to any event field.
|
||||
|
||||
Either keys or values can be saved and retrieved in this way. This
|
||||
creates a variable named 'ts0' for a histogram entry with the key
|
||||
'next_pid':
|
||||
|
||||
# echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ... >> \
|
||||
event/trigger
|
||||
|
||||
The ts0 variable can be accessed by any subsequent event having the
|
||||
same pid as 'next_pid'.
|
||||
|
||||
Variable references are formed by prepending the variable name with
|
||||
the '$' sign. Thus for example, the ts0 variable above would be
|
||||
referenced as '$ts0' in expressions.
|
||||
|
||||
Because 'vals=' is used, the common_timestamp variable value above
|
||||
will also be summed as a normal histogram value would (though for a
|
||||
timestamp it makes little sense).
|
||||
|
||||
The below shows that a key value can also be saved in the same way:
|
||||
|
||||
# echo 'hist:timer_pid=common_pid:key=timer_pid ...' >> event/trigger
|
||||
|
||||
If a variable isn't a key variable or prefixed with 'vals=', the
|
||||
associated event field will be saved in a variable but won't be summed
|
||||
as a value:
|
||||
|
||||
# echo 'hist:keys=next_pid:ts1=common_timestamp ... >> event/trigger
|
||||
|
||||
Multiple variables can be assigned at the same time. The below would
|
||||
result in both ts0 and b being created as variables, with both
|
||||
common_timestamp and field1 additionally being summed as values:
|
||||
|
||||
# echo 'hist:keys=pid:vals=$ts0,$b:ts0=common_timestamp,b=field1 ... >> \
|
||||
event/trigger
|
||||
|
||||
Note that variable assignments can appear either preceding or
|
||||
following their use. The command below behaves identically to the
|
||||
command above:
|
||||
|
||||
# echo 'hist:keys=pid:ts0=common_timestamp,b=field1:vals=$ts0,$b ... >> \
|
||||
event/trigger
|
||||
|
||||
Any number of variables not bound to a 'vals=' prefix can also be
|
||||
assigned by simply separating them with colons. Below is the same
|
||||
thing but without the values being summed in the histogram:
|
||||
|
||||
# echo 'hist:keys=pid:ts0=common_timestamp:b=field1 ... >> event/trigger
|
||||
|
||||
Variables set as above can be referenced and used in expressions on
|
||||
another event.
|
||||
|
||||
For example, here's how a latency can be calculated:
|
||||
|
||||
# echo 'hist:keys=pid,prio:ts0=common_timestamp ... >> event1/trigger
|
||||
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ... >> event2/trigger
|
||||
|
||||
In the first line above, the event's timetamp is saved into the
|
||||
variable ts0. In the next line, ts0 is subtracted from the second
|
||||
event's timestamp to produce the latency, which is then assigned into
|
||||
yet another variable, 'wakeup_lat'. The hist trigger below in turn
|
||||
makes use of the wakeup_lat variable to compute a combined latency
|
||||
using the same key and variable from yet another event:
|
||||
|
||||
# echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ... >> event3/trigger
|
||||
|
||||
2.2.2 Synthetic Events
|
||||
----------------------
|
||||
|
||||
Synthetic events are user-defined events generated from hist trigger
|
||||
variables or fields associated with one or more other events. Their
|
||||
purpose is to provide a mechanism for displaying data spanning
|
||||
multiple events consistent with the existing and already familiar
|
||||
usage for normal events.
|
||||
|
||||
To define a synthetic event, the user writes a simple specification
|
||||
consisting of the name of the new event along with one or more
|
||||
variables and their types, which can be any valid field type,
|
||||
separated by semicolons, to the tracing/synthetic_events file.
|
||||
|
||||
For instance, the following creates a new event named 'wakeup_latency'
|
||||
with 3 fields: lat, pid, and prio. Each of those fields is simply a
|
||||
variable reference to a variable on another event:
|
||||
|
||||
# echo 'wakeup_latency \
|
||||
u64 lat; \
|
||||
pid_t pid; \
|
||||
int prio' >> \
|
||||
/sys/kernel/debug/tracing/synthetic_events
|
||||
|
||||
Reading the tracing/synthetic_events file lists all the currently
|
||||
defined synthetic events, in this case the event defined above:
|
||||
|
||||
# cat /sys/kernel/debug/tracing/synthetic_events
|
||||
wakeup_latency u64 lat; pid_t pid; int prio
|
||||
|
||||
An existing synthetic event definition can be removed by prepending
|
||||
the command that defined it with a '!':
|
||||
|
||||
# echo '!wakeup_latency u64 lat pid_t pid int prio' >> \
|
||||
/sys/kernel/debug/tracing/synthetic_events
|
||||
|
||||
At this point, there isn't yet an actual 'wakeup_latency' event
|
||||
instantiated in the event subsytem - for this to happen, a 'hist
|
||||
trigger action' needs to be instantiated and bound to actual fields
|
||||
and variables defined on other events (see Section 6.3.3 below).
|
||||
|
||||
Once that is done, an event instance is created, and a histogram can
|
||||
be defined using it:
|
||||
|
||||
# echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
|
||||
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
|
||||
|
||||
The new event is created under the tracing/events/synthetic/ directory
|
||||
and looks and behaves just like any other event:
|
||||
|
||||
# ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
|
||||
enable filter format hist id trigger
|
||||
|
||||
Like any other event, once a histogram is enabled for the event, the
|
||||
output can be displayed by reading the event's 'hist' file.
|
||||
|
||||
2.2.3 Hist trigger 'actions'
|
||||
----------------------------
|
||||
|
||||
A hist trigger 'action' is a function that's executed whenever a
|
||||
histogram entry is added or updated.
|
||||
|
||||
The default 'action' if no special function is explicity specified is
|
||||
as it always has been, to simply update the set of values associated
|
||||
with an entry. Some applications, however, may want to perform
|
||||
additional actions at that point, such as generate another event, or
|
||||
compare and save a maximum.
|
||||
|
||||
The following additional actions are available. To specify an action
|
||||
for a given event, simply specify the action between colons in the
|
||||
hist trigger specification.
|
||||
|
||||
- onmatch(matching.event).<synthetic_event_name>(param list)
|
||||
|
||||
The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
|
||||
trigger action is invoked whenever an event matches and the
|
||||
histogram entry would be added or updated. It causes the named
|
||||
synthetic event to be generated with the values given in the
|
||||
'param list'. The result is the generation of a synthetic event
|
||||
that consists of the values contained in those variables at the
|
||||
time the invoking event was hit.
|
||||
|
||||
The 'param list' consists of one or more parameters which may be
|
||||
either variables or fields defined on either the 'matching.event'
|
||||
or the target event. The variables or fields specified in the
|
||||
param list may be either fully-qualified or unqualified. If a
|
||||
variable is specified as unqualified, it must be unique between
|
||||
the two events. A field name used as a param can be unqualified
|
||||
if it refers to the target event, but must be fully qualified if
|
||||
it refers to the matching event. A fully-qualified name is of the
|
||||
form 'system.event_name.$var_name' or 'system.event_name.field'.
|
||||
|
||||
The 'matching.event' specification is simply the fully qualified
|
||||
event name of the event that matches the target event for the
|
||||
onmatch() functionality, in the form 'system.event_name'.
|
||||
|
||||
Finally, the number and type of variables/fields in the 'param
|
||||
list' must match the number and types of the fields in the
|
||||
synthetic event being generated.
|
||||
|
||||
As an example the below defines a simple synthetic event and uses
|
||||
a variable defined on the sched_wakeup_new event as a parameter
|
||||
when invoking the synthetic event. Here we define the synthetic
|
||||
event:
|
||||
|
||||
# echo 'wakeup_new_test pid_t pid' >> \
|
||||
/sys/kernel/debug/tracing/synthetic_events
|
||||
|
||||
# cat /sys/kernel/debug/tracing/synthetic_events
|
||||
wakeup_new_test pid_t pid
|
||||
|
||||
The following hist trigger both defines the missing testpid
|
||||
variable and specifies an onmatch() action that generates a
|
||||
wakeup_new_test synthetic event whenever a sched_wakeup_new event
|
||||
occurs, which because of the 'if comm == "cyclictest"' filter only
|
||||
happens when the executable is cyclictest:
|
||||
|
||||
# echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
|
||||
wakeup_new_test($testpid) if comm=="cyclictest"' >> \
|
||||
/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
|
||||
|
||||
Creating and displaying a histogram based on those events is now
|
||||
just a matter of using the fields and new synthetic event in the
|
||||
tracing/events/synthetic directory, as usual:
|
||||
|
||||
# echo 'hist:keys=pid:sort=pid' >> \
|
||||
/sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
|
||||
|
||||
Running 'cyclictest' should cause wakeup_new events to generate
|
||||
wakeup_new_test synthetic events which should result in histogram
|
||||
output in the wakeup_new_test event's hist file:
|
||||
|
||||
# cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist
|
||||
|
||||
A more typical usage would be to use two events to calculate a
|
||||
latency. The following example uses a set of hist triggers to
|
||||
produce a 'wakeup_latency' histogram:
|
||||
|
||||
First, we define a 'wakeup_latency' synthetic event:
|
||||
|
||||
# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
|
||||
/sys/kernel/debug/tracing/synthetic_events
|
||||
|
||||
Next, we specify that whenever we see a sched_waking event for a
|
||||
cyclictest thread, save the timestamp in a 'ts0' variable:
|
||||
|
||||
# echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=common_timestamp.usecs \
|
||||
if comm=="cyclictest"' >> \
|
||||
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger
|
||||
|
||||
Then, when the corresponding thread is actually scheduled onto the
|
||||
CPU by a sched_switch event, calculate the latency and use that
|
||||
along with another variable and an event field to generate a
|
||||
wakeup_latency synthetic event:
|
||||
|
||||
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:\
|
||||
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
|
||||
$saved_pid,next_prio) if next_comm=="cyclictest"' >> \
|
||||
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
|
||||
|
||||
We also need to create a histogram on the wakeup_latency synthetic
|
||||
event in order to aggregate the generated synthetic event data:
|
||||
|
||||
# echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
|
||||
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
|
||||
|
||||
Finally, once we've run cyclictest to actually generate some
|
||||
events, we can see the output by looking at the wakeup_latency
|
||||
synthetic event's hist file:
|
||||
|
||||
# cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist
|
||||
|
||||
- onmax(var).save(field,.. .)
|
||||
|
||||
The 'onmax(var).save(field,...)' hist trigger action is invoked
|
||||
whenever the value of 'var' associated with a histogram entry
|
||||
exceeds the current maximum contained in that variable.
|
||||
|
||||
The end result is that the trace event fields specified as the
|
||||
onmax.save() params will be saved if 'var' exceeds the current
|
||||
maximum for that hist trigger entry. This allows context from the
|
||||
event that exhibited the new maximum to be saved for later
|
||||
reference. When the histogram is displayed, additional fields
|
||||
displaying the saved values will be printed.
|
||||
|
||||
As an example the below defines a couple of hist triggers, one for
|
||||
sched_waking and another for sched_switch, keyed on pid. Whenever
|
||||
a sched_waking occurs, the timestamp is saved in the entry
|
||||
corresponding to the current pid, and when the scheduler switches
|
||||
back to that pid, the timestamp difference is calculated. If the
|
||||
resulting latency, stored in wakeup_lat, exceeds the current
|
||||
maximum latency, the values specified in the save() fields are
|
||||
recoreded:
|
||||
|
||||
# echo 'hist:keys=pid:ts0=common_timestamp.usecs \
|
||||
if comm=="cyclictest"' >> \
|
||||
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger
|
||||
|
||||
# echo 'hist:keys=next_pid:\
|
||||
wakeup_lat=common_timestamp.usecs-$ts0:\
|
||||
onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
|
||||
if next_comm=="cyclictest"' >> \
|
||||
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger
|
||||
|
||||
When the histogram is displayed, the max value and the saved
|
||||
values corresponding to the max are displayed following the rest
|
||||
of the fields:
|
||||
|
||||
# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
|
||||
{ next_pid: 2255 } hitcount: 239
|
||||
common_timestamp-ts0: 0
|
||||
max: 27
|
||||
next_comm: cyclictest
|
||||
prev_pid: 0 prev_prio: 120 prev_comm: swapper/1
|
||||
|
||||
{ next_pid: 2256 } hitcount: 2355
|
||||
common_timestamp-ts0: 0
|
||||
max: 49 next_comm: cyclictest
|
||||
prev_pid: 0 prev_prio: 120 prev_comm: swapper/0
|
||||
|
||||
Totals:
|
||||
Hits: 12970
|
||||
Entries: 2
|
||||
Dropped: 0
|
||||
|
|
Loading…
Reference in New Issue