License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was under a */uapi/* path, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally, Philippe spent 10 hours this week on a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version from early this week, consisting of:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
tracing: Add and use generic set_trigger_filter() implementation
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.
There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.
There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple of external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.
Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 21:59:29 +08:00
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
#ifndef _LINUX_TRACE_EVENT_H
|
|
|
|
#define _LINUX_TRACE_EVENT_H
|
2009-04-13 23:20:49 +08:00
|
|
|
|
|
|
|
#include <linux/ring_buffer.h>
|
2009-09-13 07:04:54 +08:00
|
|
|
#include <linux/trace_seq.h>
|
2009-05-27 02:25:22 +08:00
|
|
|
#include <linux/percpu.h>
|
2009-09-18 12:10:28 +08:00
|
|
|
#include <linux/hardirq.h>
|
2010-01-28 09:32:29 +08:00
|
|
|
#include <linux/perf_event.h>
|
2014-04-09 05:26:21 +08:00
|
|
|
#include <linux/tracepoint.h>
|
2009-04-13 23:20:49 +08:00
|
|
|
|
|
|
|
struct trace_array;
|
tracing: Consolidate max_tr into main trace_array structure
Currently, the way the latency tracers and snapshot feature work
is to have a separate trace_array called "max_tr" that holds the
snapshot buffer. For latency tracers, this snapshot buffer is used
to swap the running buffer with this buffer to save the current max
latency.
The only items needed for the max_tr are really just a copy of the buffer
itself, the per_cpu data pointers, the time_start timestamp that states
when the max latency was triggered, and the cpu that the max latency
was triggered on. All other fields in trace_array are unused by the
max_tr, making the max_tr mostly bloat.
This change removes the max_tr completely, and adds a new structure
called trace_buffer, that holds the buffer pointer, the per_cpu data
pointers, the time_start timestamp, and the cpu where the latency occurred.
The trace_array now has two trace_buffers, one for the normal trace and
one for the max trace or snapshot. By doing this, not only do we remove
the bloat from the max_trace but the instances of traces can now use
their own snapshot feature and not have just the top level global_trace have
the snapshot feature and latency tracers for itself.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-05 22:24:35 +08:00
|
|
|
struct trace_buffer;
|
2009-04-13 23:20:49 +08:00
|
|
|
struct tracer;
|
2009-04-11 02:53:50 +08:00
|
|
|
struct dentry;
|
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
struct perf_event_attr attr = {
.type = PERF_TYPE_TRACEPOINT,
.config = event_id,
...
};
event_fd = perf_event_open(&attr,...);
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with a previously loaded
BPF program.
'event_id' is the ID of the created kprobe.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wrapper of probe_kernel_read() used to access any
kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.
Note, kprobes are fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and the user must supply the correct LINUX_VERSION_CODE
in attr.kern_version during the bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-26 03:49:20 +08:00
|
|
|
struct bpf_prog;
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
|
|
|
|
unsigned long flags,
|
|
|
|
const struct trace_print_flags *flag_array);
|
2009-05-27 02:25:22 +08:00
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
|
|
|
|
const struct trace_print_flags *symbol_array);
|
2009-05-21 07:21:47 +08:00
|
|
|
|
2011-04-19 09:35:28 +08:00
|
|
|
#if BITS_PER_LONG == 32
|
2017-02-23 07:39:47 +08:00
|
|
|
const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
|
|
|
|
unsigned long long flags,
|
|
|
|
const struct trace_print_flags_u64 *flag_array);
|
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
const char *trace_print_symbols_seq_u64(struct trace_seq *p,
|
|
|
|
unsigned long long val,
|
|
|
|
const struct trace_print_flags_u64
|
2011-04-19 09:35:28 +08:00
|
|
|
*symbol_array);
|
|
|
|
#endif
|
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
const char *trace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
|
|
|
|
unsigned int bitmask_size);
|
tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks
Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.
The following macros were added for use with the TRACE_EVENT() macro:
__bitmask()
__assign_bitmask()
__get_bitmask()
To test this, I added this to the sched_migrate_task event, which
looked like this:
TRACE_EVENT(sched_migrate_task,
TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
TP_ARGS(p, dest_cpu, cpus),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
__field( pid_t, pid )
__field( int, prio )
__field( int, orig_cpu )
__field( int, dest_cpu )
__bitmask( cpumask, num_possible_cpus() )
),
TP_fast_assign(
memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->orig_cpu = task_cpu(p);
__entry->dest_cpu = dest_cpu;
__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
),
TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
__entry->comm, __entry->pid, __entry->prio,
__entry->orig_cpu, __entry->dest_cpu,
__get_bitmask(cpumask))
);
With the output of:
ksmtuned-3613 [003] d..2 485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
migration/1-13 [001] d..5 485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
awk-3615 [002] d.H5 485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
migration/2-18 [002] d..5 485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-07 01:10:24 +08:00
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
const char *trace_print_hex_seq(struct trace_seq *p,
|
2017-01-25 09:28:16 +08:00
|
|
|
const unsigned char *buf, int len,
|
2017-02-03 00:09:54 +08:00
|
|
|
bool concatenate);
|
2010-04-01 19:40:58 +08:00
|
|
|
|
2015-05-05 06:12:44 +08:00
|
|
|
const char *trace_print_array_seq(struct trace_seq *p,
|
2015-04-29 23:18:46 +08:00
|
|
|
const void *buf, int count,
|
2015-01-28 20:48:53 +08:00
|
|
|
size_t el_size);
|
|
|
|
|
2013-02-21 10:32:38 +08:00
|
|
|
struct trace_iterator;
|
|
|
|
struct trace_event;
|
|
|
|
|
2015-05-06 02:18:11 +08:00
|
|
|
int trace_raw_output_prep(struct trace_iterator *iter,
|
|
|
|
struct trace_event *event);
|
2013-02-21 10:32:38 +08:00
|
|
|
|
2009-04-13 23:20:49 +08:00
|
|
|
/*
|
|
|
|
* The trace entry - the most basic unit of tracing. This is what
|
|
|
|
* is printed in the end as a single line in the trace output, such as:
|
|
|
|
*
|
|
|
|
* bash-15816 [01] 235.197585: idle_cpu <- irq_enter
|
|
|
|
*/
|
|
|
|
struct trace_entry {
|
2009-03-26 23:03:29 +08:00
|
|
|
unsigned short type;
|
2009-04-13 23:20:49 +08:00
|
|
|
unsigned char flags;
|
|
|
|
unsigned char preempt_count;
|
|
|
|
int pid;
|
|
|
|
};
|
|
|
|
|
2015-05-14 01:44:36 +08:00
|
|
|
#define TRACE_EVENT_TYPE_MAX \
|
2009-03-26 23:03:29 +08:00
|
|
|
((1 << (sizeof(((struct trace_entry *)0)->type) * 8)) - 1)
|
|
|
|
|
2009-04-13 23:20:49 +08:00
|
|
|
/*
|
|
|
|
* Trace iterator - used by printout routines who present trace
|
|
|
|
* results to users and which routines might sleep, etc:
|
|
|
|
*/
|
|
|
|
struct trace_iterator {
|
|
|
|
struct trace_array *tr;
|
|
|
|
struct tracer *trace;
|
tracing: Consolidate max_tr into main trace_array structure
2013-03-05 22:24:35 +08:00
|
|
|
struct trace_buffer *trace_buffer;
|
2009-04-13 23:20:49 +08:00
|
|
|
void *private;
|
|
|
|
int cpu_file;
|
|
|
|
struct mutex mutex;
|
2012-06-28 08:46:14 +08:00
|
|
|
struct ring_buffer_iter **buffer_iter;
|
2009-06-02 03:16:05 +08:00
|
|
|
unsigned long iter_flags;
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2010-06-03 18:26:24 +08:00
|
|
|
/* trace_seq for __print_flags() and __print_symbolic() etc. */
|
|
|
|
struct trace_seq tmp_seq;
|
|
|
|
|
2013-08-03 01:16:43 +08:00
|
|
|
cpumask_var_t started;
|
|
|
|
|
|
|
|
/* it's true when current open file is snapshot */
|
|
|
|
bool snapshot;
|
|
|
|
|
2009-04-13 23:20:49 +08:00
|
|
|
/* The below is zeroed out in pipe_read */
|
|
|
|
struct trace_seq seq;
|
|
|
|
struct trace_entry *ent;
|
2010-04-01 07:49:26 +08:00
|
|
|
unsigned long lost_events;
|
2009-12-07 22:11:39 +08:00
|
|
|
int leftover;
|
2011-07-15 04:36:53 +08:00
|
|
|
int ent_size;
|
2009-04-13 23:20:49 +08:00
|
|
|
int cpu;
|
|
|
|
u64 ts;
|
|
|
|
|
|
|
|
loff_t pos;
|
|
|
|
long idx;
|
|
|
|
|
2013-08-03 01:16:43 +08:00
|
|
|
/* All new fields here will be zeroed out in pipe_read */
|
2009-04-13 23:20:49 +08:00
|
|
|
};
|
|
|
|
|
2012-11-14 04:18:22 +08:00
|
|
|
enum trace_iter_flags {
|
|
|
|
TRACE_FILE_LAT_FMT = 1,
|
|
|
|
TRACE_FILE_ANNOTATE = 2,
|
|
|
|
TRACE_FILE_TIME_IN_NS = 4,
|
|
|
|
};
|
|
|
|
|
2009-04-13 23:20:49 +08:00
|
|
|
|
|
|
|
typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter,
|
2010-04-23 06:46:14 +08:00
|
|
|
int flags, struct trace_event *event);
|
|
|
|
|
|
|
|
struct trace_event_functions {
|
2009-04-13 23:20:49 +08:00
|
|
|
trace_print_func trace;
|
|
|
|
trace_print_func raw;
|
|
|
|
trace_print_func hex;
|
|
|
|
trace_print_func binary;
|
|
|
|
};
|
|
|
|
|
2010-04-23 06:46:14 +08:00
|
|
|
struct trace_event {
|
|
|
|
struct hlist_node node;
|
|
|
|
struct list_head list;
|
|
|
|
int type;
|
|
|
|
struct trace_event_functions *funcs;
|
|
|
|
};
|
|
|
|
|
2015-05-05 21:39:12 +08:00
|
|
|
extern int register_trace_event(struct trace_event *event);
|
|
|
|
extern int unregister_trace_event(struct trace_event *event);
|
2009-04-13 23:20:49 +08:00
|
|
|
|
|
|
|
/* Return values for print_line callback */
|
|
|
|
enum print_line_t {
|
|
|
|
TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */
|
|
|
|
TRACE_TYPE_HANDLED = 1,
|
|
|
|
TRACE_TYPE_UNHANDLED = 2, /* Relay to other output functions */
|
|
|
|
TRACE_TYPE_NO_CONSUME = 3 /* Handled but ask to not consume */
|
|
|
|
};
|
|
|
|
|
tracing: Move trace_handle_return() out of line
Currently trace_handle_return() looks like this:
static inline enum print_line_t trace_handle_return(struct trace_seq *s)
{
return trace_seq_has_overflowed(s) ?
TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
}
Where trace_seq_has_overflowed(s) is:
static inline bool trace_seq_has_overflowed(struct trace_seq *s)
{
return s->full || seq_buf_has_overflowed(&s->seq);
}
And seq_buf_has_overflowed(&s->seq) is:
static inline bool
seq_buf_has_overflowed(struct seq_buf *s)
{
return s->len > s->size;
}
Making trace_handle_return() into:
return (s->full || (s->seq.len > s->seq.size)) ?
TRACE_TYPE_PARTIAL_LINE :
TRACE_TYPE_HANDLED;
One would think this is not an issue to keep as an inline. But because this
is used in the TRACE_EVENT() macro, it is expanded for every tracepoint in
the system. Take a look at a single tracepoint, x86_irq_vector (the first
one I randomly chose). As trace_handle_return() is used in the
TRACE_EVENT() macro of trace_raw_output_##call(), we disassemble
trace_raw_output_x86_irq_vector and do a diff:
- is the original
+ is the out-of-line code
I removed identical lines that were different just due to different
addresses.
--- /tmp/irq-vec-orig 2017-03-16 09:12:48.569384851 -0400
+++ /tmp/irq-vec-ool 2017-03-16 09:13:39.378153385 -0400
@@ -6,27 +6,23 @@
53 push %rbx
48 89 fb mov %rdi,%rbx
4c 8b a7 c0 20 00 00 mov 0x20c0(%rdi),%r12
e8 f7 72 13 00 callq ffffffff81155c80 <trace_raw_output_prep>
83 f8 01 cmp $0x1,%eax
74 05 je ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
5b pop %rbx
41 5c pop %r12
5d pop %rbp
c3 retq
41 8b 54 24 08 mov 0x8(%r12),%edx
- 48 8d bb 98 10 00 00 lea 0x1098(%rbx),%rdi
+ 48 81 c3 98 10 00 00 add $0x1098,%rbx
- 48 c7 c6 7b 8a a0 81 mov $0xffffffff81a08a7b,%rsi
+ 48 c7 c6 ab 8a a0 81 mov $0xffffffff81a08aab,%rsi
- e8 c5 85 13 00 callq ffffffff81156f70 <trace_seq_printf>
=== here's the start of the main difference ===
+ 48 89 df mov %rbx,%rdi
+ e8 62 7e 13 00 callq ffffffff81156810 <trace_seq_printf>
- 8b 93 b8 20 00 00 mov 0x20b8(%rbx),%edx
- 31 c0 xor %eax,%eax
- 85 d2 test %edx,%edx
- 75 11 jne ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
- 48 8b 83 a8 20 00 00 mov 0x20a8(%rbx),%rax
- 48 39 83 a0 20 00 00 cmp %rax,0x20a0(%rbx)
- 0f 93 c0 setae %al
+ 48 89 df mov %rbx,%rdi
+ e8 4a c5 12 00 callq ffffffff8114af00 <trace_handle_return>
5b pop %rbx
- 0f b6 c0 movzbl %al,%eax
=== end ===
41 5c pop %r12
5d pop %rbp
c3 retq
If you notice, the original has 22 bytes of text more than the out of line
version. As this is for every TRACE_EVENT() defined in the system, this can
become quite large.
text data bss dec hex filename
8690305 5450490 1298432 15439227 eb957b vmlinux-orig
8681725 5450490 1298432 15430647 eb73f7 vmlinux-handle
This change has a total of 8580 bytes in savings.
$ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
324
That's 324 tracepoints. But this does not include modules (which contain
many more tracepoints). For an allyesconfig build:
$ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
1401
That's 1401 tracepoints giving us:
text data bss dec hex filename
137920629 140221067 53264384 331406080 13c0db00 vmlinux-allyes-orig
137827709 140221067 53264384 331313160 13bf7008 vmlinux-allyes-handle
92920 bytes in savings!!!
Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-16 23:01:06 +08:00
|
|
|
enum print_line_t trace_handle_return(struct trace_seq *s);
|
2014-11-12 23:29:54 +08:00
|
|
|
|
2009-08-07 07:25:54 +08:00
|
|
|
void tracing_generic_entry_update(struct trace_entry *entry,
|
|
|
|
unsigned long flags,
|
|
|
|
int pc);
|
2015-05-05 22:09:53 +08:00
|
|
|
struct trace_event_file;
|
2012-08-02 22:32:10 +08:00
|
|
|
|
|
|
|
struct ring_buffer_event *
|
|
|
|
trace_event_buffer_lock_reserve(struct ring_buffer **current_buffer,
|
2015-05-05 22:09:53 +08:00
|
|
|
struct trace_event_file *trace_file,
|
2012-08-02 22:32:10 +08:00
|
|
|
int type, unsigned long len,
|
|
|
|
unsigned long flags, int pc);
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2017-06-27 10:01:55 +08:00
|
|
|
#define TRACE_RECORD_CMDLINE BIT(0)
|
|
|
|
#define TRACE_RECORD_TGID BIT(1)
|
|
|
|
|
|
|
|
void tracing_record_taskinfo(struct task_struct *task, int flags);
|
|
|
|
void tracing_record_taskinfo_sched_switch(struct task_struct *prev,
|
|
|
|
struct task_struct *next, int flags);
|
|
|
|
|
|
|
|
void tracing_record_cmdline(struct task_struct *task);
|
|
|
|
void tracing_record_tgid(struct task_struct *task);
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2015-05-06 02:18:11 +08:00
|
|
|
int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...);
|
2012-08-10 07:16:14 +08:00
|
|
|
|
2009-07-20 10:20:53 +08:00
|
|
|
struct event_filter;
|
|
|
|
|
2010-04-22 00:27:06 +08:00
|
|
|
enum trace_reg {
|
|
|
|
TRACE_REG_REGISTER,
|
|
|
|
TRACE_REG_UNREGISTER,
|
2012-03-14 07:03:02 +08:00
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
2010-04-22 00:27:06 +08:00
|
|
|
TRACE_REG_PERF_REGISTER,
|
|
|
|
TRACE_REG_PERF_UNREGISTER,
|
2012-02-15 22:51:49 +08:00
|
|
|
TRACE_REG_PERF_OPEN,
|
|
|
|
TRACE_REG_PERF_CLOSE,
|
2017-10-10 23:15:47 +08:00
|
|
|
/*
|
|
|
|
* These (ADD/DEL) use a 'boolean' return value, where 1 (true) means a
|
|
|
|
* custom action was taken and the default action is not to be
|
|
|
|
* performed.
|
|
|
|
*/
|
2012-02-15 22:51:50 +08:00
|
|
|
TRACE_REG_PERF_ADD,
|
|
|
|
TRACE_REG_PERF_DEL,
|
2012-03-14 07:03:02 +08:00
|
|
|
#endif
|
2010-04-22 00:27:06 +08:00
|
|
|
};
|
|
|
|
|
2015-05-05 23:45:27 +08:00
|
|
|
struct trace_event_call;
|
2010-04-22 00:27:06 +08:00
|
|
|
|
2015-05-05 23:45:27 +08:00
|
|
|
struct trace_event_class {
|
2015-04-01 02:37:12 +08:00
|
|
|
const char *system;
|
2010-04-22 00:27:06 +08:00
|
|
|
void *probe;
|
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
|
|
|
void *perf_probe;
|
|
|
|
#endif
|
2015-05-05 23:45:27 +08:00
|
|
|
int (*reg)(struct trace_event_call *event,
|
2012-02-15 22:51:49 +08:00
|
|
|
enum trace_reg type, void *data);
|
2015-05-05 23:45:27 +08:00
|
|
|
int (*define_fields)(struct trace_event_call *);
|
|
|
|
struct list_head *(*get_fields)(struct trace_event_call *);
|
2010-04-22 22:35:55 +08:00
|
|
|
struct list_head fields;
|
2015-05-05 23:45:27 +08:00
|
|
|
int (*raw_init)(struct trace_event_call *);
|
2010-04-20 22:47:33 +08:00
|
|
|
};
|
|
|
|
|
2015-05-05 23:45:27 +08:00
|
|
|
extern int trace_event_reg(struct trace_event_call *event,
|
2012-02-15 22:51:49 +08:00
|
|
|
enum trace_reg type, void *data);
|
2010-06-08 23:22:06 +08:00
|
|
|
|
2015-05-06 01:18:46 +08:00
|
|
|
struct trace_event_buffer {
|
2012-08-10 10:42:57 +08:00
|
|
|
struct ring_buffer *buffer;
|
|
|
|
struct ring_buffer_event *event;
|
2015-05-05 22:09:53 +08:00
|
|
	struct trace_event_file *trace_file;
	void *entry;
	unsigned long flags;
	int pc;
};

void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
				  struct trace_event_file *trace_file,
				  unsigned long len);

void trace_event_buffer_commit(struct trace_event_buffer *fbuffer);

enum {
	TRACE_EVENT_FL_FILTERED_BIT,
	TRACE_EVENT_FL_CAP_ANY_BIT,
	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
	TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
	TRACE_EVENT_FL_TRACEPOINT_BIT,
	TRACE_EVENT_FL_KPROBE_BIT,
	TRACE_EVENT_FL_UPROBE_BIT,
};

/*
 * Event flags:
 *  FILTERED      - The event has a filter attached
 *  CAP_ANY       - Any user can enable for perf
 *  NO_SET_FILTER - Set when filter has error and is to be ignored
 *  IGNORE_ENABLE - For trace internal events, do not enable with debugfs file
 *  TRACEPOINT    - Event is a tracepoint
 *  KPROBE        - Event is a kprobe
 *  UPROBE        - Event is a uprobe
 */
enum {
	TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
	TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
	TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
	TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
	TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
	TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
	TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
};

#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
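The two enums above follow a common pattern: the first one numbers the bit positions, and the second derives a mask from each position. A minimal stand-alone sketch of that pattern is shown here. All DEMO_* names are hypothetical stand-ins, not kernel identifiers, and only three flags are modelled.

```c
#include <stdbool.h>

/* Bit positions; the DEMO_* names are hypothetical stand-ins. */
enum {
	DEMO_FL_FILTERED_BIT,
	DEMO_FL_KPROBE_BIT,
	DEMO_FL_UPROBE_BIT,
};

/* Masks derived from the positions, mirroring the header's second enum. */
enum {
	DEMO_FL_FILTERED = (1 << DEMO_FL_FILTERED_BIT),
	DEMO_FL_KPROBE   = (1 << DEMO_FL_KPROBE_BIT),
	DEMO_FL_UPROBE   = (1 << DEMO_FL_UPROBE_BIT),
};

/* Combined mask, analogous to TRACE_EVENT_FL_UKPROBE. */
#define DEMO_FL_UKPROBE (DEMO_FL_KPROBE | DEMO_FL_UPROBE)

/* True if the flags word marks either kind of probe event. */
static bool demo_is_probe(int flags)
{
	return (flags & DEMO_FL_UKPROBE) != 0;
}
```

Keeping the positions in their own enum means a new flag is added in one place and its mask follows automatically.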

struct trace_event_call {
	struct list_head list;
	struct trace_event_class *class;
	union {
		char *name;
		/* Set TRACE_EVENT_FL_TRACEPOINT flag when using "tp" */
		struct tracepoint *tp;
	};
	struct trace_event event;
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.
For example, the tracepoint trace_tlb_flush() has:
  __print_symbolic(REC->reason,
      { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
      { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
      { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
      { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
Which maps the enum values to the strings they represent. But perf and
trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.
With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values.
By adding:
  TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
  TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
  TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
  TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
the format file shows the numeric values:
  $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
  [...]
  __print_symbolic(REC->reason,
      { 0, "flush on task switch" },
      { 1, "remote shootdown" },
      { 2, "local shootdown" },
      { 3, "local mm shootdown" })
The above is what userspace expects to see, and tools do not need to
be modified to parse them.
Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org
Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-03-25 05:58:09 +08:00
	char *print_fmt;
	struct event_filter *filter;
	void *mod;
	void *data;
	/*
	 * bit 0: filter_active
	 * bit 1: allow trace by non root (cap any)
	 * bit 2: failed to apply filter
	 * bit 3: trace internal event (do not enable)
	 * bit 4: Event was enabled by module
	 * bit 5: use call filter rather than file filter
	 * bit 6: Event is a tracepoint
	 */
	int flags; /* static flags of different events */

#ifdef CONFIG_PERF_EVENTS
	int perf_refcount;
	struct hlist_head __percpu *perf_events;
	struct bpf_prog_array __rcu *prog_array;

	int (*perf_perm)(struct trace_event_call *,
			 struct perf_event *);
#endif
};
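
The TRACE_DEFINE_ENUM() commit message attached to print_fmt above describes replacing enum names in the format string with their numeric values. The sketch below illustrates only that idea in user space. It uses plain substring replacement, which is an assumption made for brevity and not a description of how ftrace performs the conversion. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name/value pair, standing in for one recorded enum. */
struct demo_enum_map {
	const char *name;
	int value;
};

/*
 * Copy fmt into out, replacing each occurrence of a mapped enum name with
 * its decimal value. Matching is by plain substring, which is enough for
 * this sketch but ignores identifier boundaries.
 * Returns 0 on success, -1 if out is too small.
 */
static int demo_replace_enums(const char *fmt, const struct demo_enum_map *map,
			      size_t nmap, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return -1;

	while (*fmt) {
		size_t i, n = 0;
		int len;

		/* Look for a mapped name starting at the current position. */
		for (i = 0; i < nmap; i++) {
			n = strlen(map[i].name);
			if (strncmp(fmt, map[i].name, n) == 0)
				break;
		}
		if (i < nmap) {
			/* Emit the numeric value in place of the name. */
			len = snprintf(out + used, outlen - used, "%d",
				       map[i].value);
			if (len < 0 || (size_t)len >= outlen - used)
				return -1;
			used += (size_t)len;
			fmt += n;
		} else {
			/* Ordinary character: copy it through unchanged. */
			if (used + 1 >= outlen)
				return -1;
			out[used++] = *fmt++;
		}
	}
	out[used] = '\0';
	return 0;
}
```

With a table that maps TLB_LOCAL_SHOOTDOWN to 2, the input `{ TLB_LOCAL_SHOOTDOWN, "local shootdown" }` is rewritten so that the name is replaced by `2`, which is the form user-space parsers can consume.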

#ifdef CONFIG_PERF_EVENTS
static inline bool bpf_prog_array_valid(struct trace_event_call *call)
{
	/*
	 * This inline function checks whether call->prog_array
	 * is valid or not. The function is called in various places,
	 * outside rcu_read_lock/unlock, as a heuristic to speed up execution.
	 *
	 * If this function returns true, and later call->prog_array
	 * becomes NULL inside rcu_read_lock/unlock region,
	 * we bail out then. If this function returns false,
	 * there is a risk that we might miss a few events if the checking
	 * were delayed until inside rcu_read_lock/unlock region and
	 * call->prog_array happened to become non-NULL then.
	 *
	 * Here, READ_ONCE() is used instead of rcu_access_pointer().
	 * rcu_access_pointer() requires the actual definition of
	 * "struct bpf_prog_array" while READ_ONCE() only needs
	 * a declaration of the same type.
	 */
	return !!READ_ONCE(call->prog_array);
}
#endif
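
The comment in bpf_prog_array_valid() describes a cheap, possibly stale check outside the read-side critical section, followed by a second check inside it. A user-space sketch of that shape is given here. It uses C11 atomics in place of READ_ONCE() and RCU, which is a substitution made only so the example is self-contained. The demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_prog_array;		/* a declaration is enough for the hint */

struct demo_call {
	struct demo_prog_array *_Atomic prog_array;
};

/* Cheap hint, analogous to bpf_prog_array_valid(); the result may be stale. */
static bool demo_prog_array_valid(struct demo_call *call)
{
	return atomic_load_explicit(&call->prog_array,
				    memory_order_relaxed) != NULL;
}

/*
 * Returns true if the slow path ran. The pointer is read again inside the
 * (here imaginary) protected region, so a stale "true" hint is harmless.
 */
static bool demo_handle_event(struct demo_call *call)
{
	struct demo_prog_array *array;

	if (!demo_prog_array_valid(call))
		return false;	/* fast path: nothing attached */

	/* A real reader would enter its read-side critical section here. */
	array = atomic_load_explicit(&call->prog_array, memory_order_acquire);
	if (!array)
		return false;	/* detached since the hint: bail out */

	/* The attached programs would run here. */
	return true;
}
```

The hint only saves work in the common case where nothing is attached. Correctness still rests on the second read inside the protected region.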

static inline const char *
trace_event_name(struct trace_event_call *call)
{
	if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
		return call->tp ? call->tp->name : NULL;
	else
		return call->name;
}
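
trace_event_name() relies on a flag to decide which member of the anonymous union is valid. The same discriminated-union idea can be sketched outside the kernel as follows. The struct layout and the DEMO_FL_TRACEPOINT value are illustrative assumptions.

```c
#include <stddef.h>

#define DEMO_FL_TRACEPOINT (1 << 0)	/* hypothetical flag value */

struct demo_tracepoint {
	const char *name;
};

struct demo_named_call {
	union {
		const char *name;		/* valid when the flag is clear */
		struct demo_tracepoint *tp;	/* valid when the flag is set */
	};
	int flags;
};

/* The flag selects which union member may be read. */
static const char *demo_event_name(const struct demo_named_call *call)
{
	if (call->flags & DEMO_FL_TRACEPOINT)
		return call->tp ? call->tp->name : NULL;
	return call->name;
}
```

Reading the member that the flag does not select would reinterpret a pointer of the wrong type, which is why the flag must be set whenever "tp" is used.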

struct trace_array;
struct trace_subsystem_dir;

enum {
	EVENT_FILE_FL_ENABLED_BIT,
	EVENT_FILE_FL_RECORDED_CMD_BIT,
	EVENT_FILE_FL_RECORDED_TGID_BIT,
	EVENT_FILE_FL_FILTERED_BIT,
	EVENT_FILE_FL_NO_SET_FILTER_BIT,
	EVENT_FILE_FL_SOFT_MODE_BIT,
	EVENT_FILE_FL_SOFT_DISABLED_BIT,
	EVENT_FILE_FL_TRIGGER_MODE_BIT,
	EVENT_FILE_FL_TRIGGER_COND_BIT,
	EVENT_FILE_FL_PID_FILTER_BIT,
tracing: Only have rmmod clear buffers that its events were active in
Currently, if a module event has been enabled, removing that module
clears all ring buffers. This is to prevent another module from being loaded
and having one of its trace events reuse a trace event ID of the
removed module. This could cause undesirable effects as the trace event of
the new module would be using its own processing algorithms to process raw
data of another event. To prevent this, when a module is loaded, if any of
its events have been used (signified by the WAS_ENABLED event call flag,
which is never cleared), all ring buffers are cleared, just in case any one
of them contains event data of the removed event.
The problem is, there's no reason to clear all ring buffers if only one (or
less than all of them) uses one of the events. Instead, only clear the ring
buffers that recorded the events of a module that is being removed.
To do this, instead of keeping the WAS_ENABLED flag with the trace event
call, move it to the per instance (per ring buffer) event file descriptor.
The event file descriptor maps each event to a separate ring buffer
instance. Then when the module is removed, only the ring buffers that
activated one of the module's events get cleared. The rest are not touched.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-01 05:03:47 +08:00
	EVENT_FILE_FL_WAS_ENABLED_BIT,
};
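
The commit message on EVENT_FILE_FL_WAS_ENABLED_BIT above explains why the flag lives in the per-instance event file: on module removal, only the ring buffers that recorded the module's events need clearing. A rough sketch of that bookkeeping is shown here. Every type and function in it is hypothetical, and "cleared" merely stands in for resetting a ring buffer.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FILE_FL_WAS_ENABLED (1UL << 0)	/* hypothetical bit value */

/* One ring-buffer instance; "cleared" stands in for resetting the buffer. */
struct demo_instance {
	bool cleared;
};

/* Per-instance view of one event, loosely modelled on an event file. */
struct demo_event_file {
	struct demo_instance *instance;
	const void *owner;	/* module that defines the event */
	unsigned long flags;
};

/* Enabling an event leaves a sticky mark on that instance's file only. */
static void demo_enable(struct demo_event_file *file)
{
	file->flags |= DEMO_FILE_FL_WAS_ENABLED;
}

/*
 * On module removal, clear only the instances that ever recorded one of
 * the module's events. Returns the number of instances cleared.
 */
static size_t demo_module_removed(struct demo_event_file *files, size_t n,
				  const void *owner)
{
	size_t i, cleared = 0;

	for (i = 0; i < n; i++) {
		if (files[i].owner != owner)
			continue;
		if (!(files[i].flags & DEMO_FILE_FL_WAS_ENABLED))
			continue;
		if (!files[i].instance->cleared) {
			files[i].instance->cleared = true;
			cleared++;
		}
	}
	return cleared;
}
```

With two instances and the event enabled in only one of them, a removal clears that one instance and leaves the other untouched.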

/*
 * Event file flags:
 *  ENABLED       - The event is enabled
 *  RECORDED_CMD  - The comms should be recorded at sched_switch
 *  RECORDED_TGID - The tgids should be recorded at sched_switch
 *  FILTERED      - The event has a filter attached
 *  NO_SET_FILTER - Set when filter has error and is to be ignored
 *  SOFT_MODE     - The event is enabled/disabled by SOFT_DISABLED
 *  SOFT_DISABLED - When set, do not trace the event (even though its
 *                  tracepoint may be enabled)
tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading, it sets up the trigger
seq_ops.
The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.
Also added are a couple of trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple of structs consisting mostly of functions meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering event.
Every event_command func() implementation essentially does the
same thing for any command:
  - choose ops - use the value of param to choose either a number or
    count version of event_trigger_ops specific to the command
  - do the register or unregister of those ops
  - associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() command corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 21:59:24 +08:00
 *  TRIGGER_MODE  - When set, invoke the triggers associated with the event
tracing: Add and use generic set_trigger_filter() implementation
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
  echo 'enable_event:system:event if common_pid == 999' > \
      .../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
  echo 'stacktrace if common_pid == 999' > \
      .../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.
There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.
There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.
Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 21:59:29 +08:00
 *  TRIGGER_COND  - When set, one or more triggers has an associated filter
 *  PID_FILTER    - When set, the event is filtered based on pid
 *  WAS_ENABLED   - Set when enabled to know to clear trace on module removal
 */
enum {
	EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT),
	EVENT_FILE_FL_RECORDED_CMD = (1 << EVENT_FILE_FL_RECORDED_CMD_BIT),
	EVENT_FILE_FL_RECORDED_TGID = (1 << EVENT_FILE_FL_RECORDED_TGID_BIT),
	EVENT_FILE_FL_FILTERED = (1 << EVENT_FILE_FL_FILTERED_BIT),
	EVENT_FILE_FL_NO_SET_FILTER = (1 << EVENT_FILE_FL_NO_SET_FILTER_BIT),
	EVENT_FILE_FL_SOFT_MODE = (1 << EVENT_FILE_FL_SOFT_MODE_BIT),
	EVENT_FILE_FL_SOFT_DISABLED = (1 << EVENT_FILE_FL_SOFT_DISABLED_BIT),
	EVENT_FILE_FL_TRIGGER_MODE = (1 << EVENT_FILE_FL_TRIGGER_MODE_BIT),
	EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
	EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT),
	EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
};
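
The set_trigger_filter() commit message attached to TRIGGER_COND above describes splitting trigger invocation in two: a first pass tests filters and either runs a trigger or records it as deferred, and a second pass runs the deferred ones once the current record is closed. A simplified sketch of that control flow is given here. The struct fields, function names and example callbacks are all hypothetical and do not reproduce the kernel's event_triggers_call() or event_triggers_post_call().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trigger description; field names are illustrative only. */
struct demo_trigger {
	unsigned int type_bit;		/* unique bit per command */
	bool post_trigger;		/* must run after the record is closed */
	bool (*filter)(int common_pid);	/* NULL means "always match" */
	void (*func)(void);
};

/*
 * Phase 1: test each trigger's filter against the current record.
 * Immediate triggers run now; post triggers only set their bit in the
 * return value.
 */
static unsigned int demo_triggers_call(const struct demo_trigger *t, size_t n,
				       int common_pid)
{
	unsigned int deferred = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (t[i].filter && !t[i].filter(common_pid))
			continue;
		if (t[i].post_trigger)
			deferred |= t[i].type_bit;
		else
			t[i].func();
	}
	return deferred;
}

/* Phase 2: after the record is committed or discarded, run deferred triggers. */
static void demo_triggers_post_call(const struct demo_trigger *t, size_t n,
				    unsigned int deferred)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].post_trigger && (deferred & t[i].type_bit))
			t[i].func();
}

/* Example callbacks used to exercise the sketch. */
static int demo_immediate_runs, demo_deferred_runs;
static void demo_count_immediate(void) { demo_immediate_runs++; }
static void demo_count_deferred(void) { demo_deferred_runs++; }
static bool demo_pid_is_999(int common_pid) { return common_pid == 999; }
```

A caller would invoke demo_triggers_call() while the record is still open, commit or discard the record, and then pass the returned mask to demo_triggers_post_call(). Deferring matters for triggers that themselves write into the trace buffer.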

struct trace_event_file {
	struct list_head list;
	struct trace_event_call *event_call;
	struct event_filter __rcu *filter;
	struct dentry *dir;
	struct trace_array *tr;
	struct trace_subsystem_dir *system;
	struct list_head triggers;

	/*
	 * 32 bit flags:
	 * bit 0: enabled
	 * bit 1: enabled cmd record
	 * bit 2: enable/disable with the soft disable bit
	 * bit 3: soft disabled
tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event trigger essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading set up the trigger
seq_ops.
The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.
also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple structs consisting mostly of function meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering the event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either a number or
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
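The 'double duty' of trigger_type can be pictured with a small user-space sketch: each type occupies one bit, so the same value works both as a lookup key and as a bit in a mask. The SKETCH_ and sketch_ names below are hypothetical and only illustrate the idea.

```c
#include <stddef.h>

/* Hypothetical trigger types: one bit each, so the value is also a mask bit. */
enum sketch_trigger_type {
    SKETCH_TT_NONE     = 0,
    SKETCH_TT_ONOFF    = 1 << 0,
    SKETCH_TT_SNAPSHOT = 1 << 1,
};

struct sketch_cmd {
    const char *name;
    enum sketch_trigger_type type;
};

/* Use 1: the type acts as a unique identifier when looking up a command. */
static const struct sketch_cmd *
sketch_find(const struct sketch_cmd *cmds, int n, enum sketch_trigger_type type)
{
    for (int i = 0; i < n; i++)
        if (cmds[i].type == type)
            return &cmds[i];
    return NULL;
}

/* Use 2: the same value is OR-ed into a mask recording which triggers ran. */
static unsigned int sketch_mark(unsigned int mask, enum sketch_trigger_type type)
{
    return mask | (unsigned int)type;
}
```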
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() function corresponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple of mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 21:59:24 +08:00
|
|
|
* bit 4: trigger enabled
|
2010-04-23 23:12:36 +08:00
|
|
|
*
|
2013-03-13 01:26:18 +08:00
|
|
|
* Note: The bits must be set atomically to prevent races
|
|
|
|
* from other writers. Reads of flags do not need to be in
|
|
|
|
* sync as they occur in critical sections. But the way flags
|
2012-05-04 11:09:03 +08:00
|
|
|
* is currently used, these changes do not affect the code
|
2010-05-14 22:19:13 +08:00
|
|
|
* except that when a change is made, it may have a slight
|
|
|
|
* delay in propagating the changes to other CPUs due to
|
2013-03-13 01:26:18 +08:00
|
|
|
* caching and such. Which is mostly OK ;-)
|
2010-04-23 23:12:36 +08:00
|
|
|
*/
|
2013-03-13 01:26:18 +08:00
|
|
|
unsigned long flags;
|
2013-05-09 13:44:29 +08:00
|
|
|
atomic_t sm_ref; /* soft-mode reference counter */
|
2013-10-24 21:59:24 +08:00
|
|
|
atomic_t tm_ref; /* trigger-mode reference counter */
|
2009-04-13 23:20:49 +08:00
|
|
|
};
|
|
|
|
|
2010-11-18 09:11:42 +08:00
|
|
|
#define __TRACE_EVENT_FLAGS(name, value) \
|
|
|
|
static int __init trace_init_flags_##name(void) \
|
|
|
|
{ \
|
2014-04-09 05:26:21 +08:00
|
|
|
event_##name.flags |= value; \
|
2010-11-18 09:11:42 +08:00
|
|
|
return 0; \
|
|
|
|
} \
|
|
|
|
early_initcall(trace_init_flags_##name);
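The macro above relies on the preprocessor's ## operator to paste the event name into both the generated function name and the object it updates. A user-space sketch of the same shape follows. SKETCH_EVENT_FLAGS, struct sketch_event and event_foo are hypothetical, and since early_initcall() is not available outside the kernel, the generated function has to be called by hand here.

```c
/* Hypothetical stand-in for the per-event object named event_<name>. */
struct sketch_event {
    unsigned long flags;
};

/*
 * Same shape as the kernel macro: ## pastes the event name into the
 * generated function name and into the object that function updates.
 */
#define SKETCH_EVENT_FLAGS(name, value)             \
    static int sketch_init_flags_##name(void)       \
    {                                               \
        event_##name.flags |= (value);              \
        return 0;                                   \
    }

static struct sketch_event event_foo = { .flags = 0x1 };

/* Expands to: static int sketch_init_flags_foo(void) { ... } */
SKETCH_EVENT_FLAGS(foo, 0x4)
```

Calling sketch_init_flags_foo() once ORs 0x4 into event_foo.flags while leaving the bits that were already set in place.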
|
|
|
|
|
2013-11-14 23:23:04 +08:00
|
|
|
#define __TRACE_EVENT_PERF_PERM(name, expr...) \
|
2015-05-05 23:45:27 +08:00
|
|
|
static int perf_perm_##name(struct trace_event_call *tp_event, \
|
2013-11-14 23:23:04 +08:00
|
|
|
struct perf_event *p_event) \
|
|
|
|
{ \
|
|
|
|
return ({ expr; }); \
|
|
|
|
} \
|
|
|
|
static int __init trace_init_perf_perm_##name(void) \
|
|
|
|
{ \
|
|
|
|
event_##name.perf_perm = &perf_perm_##name; \
|
|
|
|
return 0; \
|
|
|
|
} \
|
|
|
|
early_initcall(trace_init_perf_perm_##name);
|
|
|
|
|
2010-03-05 12:35:37 +08:00
|
|
|
#define PERF_MAX_TRACE_SIZE 2048
|
2009-09-18 12:10:28 +08:00
|
|
|
|
2009-09-13 07:04:54 +08:00
|
|
|
#define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2013-10-24 21:59:24 +08:00
|
|
|
enum event_trigger_type {
|
|
|
|
ETT_NONE = (0),
|
tracing: Add 'traceon' and 'traceoff' event trigger commands
Add 'traceon' and 'traceoff' event_command commands. traceon and
traceoff event triggers are added by the user via these commands in a
similar way and using practically the same syntax as the analogous
'traceon' and 'traceoff' ftrace function commands, but instead of
writing to the set_ftrace_filter file, the traceon and traceoff
triggers are written to the per-event 'trigger' files:
echo 'traceon' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff' > .../tracing/events/somesys/someevent/trigger
The above commands will turn tracing on or off whenever someevent is
hit.
This also adds a 'count' version that limits the number of times the
command will be invoked:
echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger
Where N is the number of times the command will be invoked.
The above commands will turn tracing on or off whenever someevent
is hit, but only N times.
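The 'count' behaviour can be modeled in a few lines of user-space C. This is only a sketch: struct sketch_count_trigger and sketch_tracing_enabled are hypothetical, and treating -1 as "no limit" is an assumption of the example rather than something stated in this message.

```c
#include <stdbool.h>

/* Hypothetical model; the kernel's trigger data structure differs. */
struct sketch_count_trigger {
    long count; /* remaining invocations; -1 is assumed to mean "no limit" */
};

static bool sketch_tracing_enabled;

/* Called each time the event is hit; turns tracing on at most count times. */
static void sketch_traceon_count(struct sketch_count_trigger *t)
{
    if (t->count == 0)
        return;         /* budget exhausted: do nothing */
    if (t->count != -1)
        t->count--;     /* consume one invocation */
    sketch_tracing_enabled = true;
}
```

With count initialized to 2, the first two hits enable tracing and every later hit leaves the tracing state untouched.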
Some common register/unregister_trigger() implementations of the
event_command reg()/unreg() callbacks are also provided, which add and
remove trigger instances to the per-event list of triggers, and
arm/disarm them as appropriate. event_trigger_callback() is a
general-purpose event_command func() implementation that orchestrates
command parsing and registration for most normal commands.
Most event commands will use these, but some will override and
possibly reuse them.
The event_trigger_init(), event_trigger_free(), and
event_trigger_print() functions are meant to be common implementations
of the event_trigger_ops init(), free(), and print() ops,
respectively.
Most trigger_ops implementations will use these, but some will
override and possibly reuse them.
Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-24 21:59:25 +08:00
|
|
|
ETT_TRACE_ONOFF = (1 << 0),
|
2013-10-24 21:59:26 +08:00
|
|
|
ETT_SNAPSHOT = (1 << 1),
|
2013-10-24 21:59:27 +08:00
|
|
|
ETT_STACKTRACE = (1 << 2),
|
2013-10-24 21:59:28 +08:00
|
|
|
ETT_EVENT_ENABLE = (1 << 3),
|
tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.
The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.
This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.
A surprising number of typical use cases can be accomplished by users
via this simple mechanism. In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.
The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event. By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system. See Documentation/trace/events.txt for examples.
hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.
The basic syntax follows the existing trigger syntax. Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:
# echo hist:keys=xxx [ if filter] > event/trigger
Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.
To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:
# cat event/hist
The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.
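The aggregation idea, one hash-table entry per key with a running 'hitcount', can be sketched in user space as follows. The fixed-size table, the modulo hash and all sketch_ names are assumptions made to keep the example short; they are not how the kernel implements hist triggers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUCKETS 64 /* fixed size keeps the sketch short */

struct sketch_entry {
    uint64_t key;
    uint64_t hitcount;
    int used;
};

struct sketch_hist {
    struct sketch_entry slots[SKETCH_BUCKETS];
};

/* Find or create the slot for key with linear probing; NULL if the table is full. */
static struct sketch_entry *sketch_slot(struct sketch_hist *h, uint64_t key)
{
    size_t start = (size_t)(key % SKETCH_BUCKETS);

    for (size_t i = 0; i < SKETCH_BUCKETS; i++) {
        struct sketch_entry *e = &h->slots[(start + i) % SKETCH_BUCKETS];

        if (!e->used) {
            e->used = 1;
            e->key = key;
            e->hitcount = 0;
            return e;
        }
        if (e->key == key)
            return e;
    }
    return NULL;
}

/* One matching event: bump the hitcount stored under its key. */
static int sketch_hist_event(struct sketch_hist *h, uint64_t key)
{
    struct sketch_entry *e = sketch_slot(h, key);

    if (!e)
        return -1;
    e->hitcount++;
    return 0;
}

/* Read side, analogous to viewing the aggregation for one key. */
static uint64_t sketch_hist_hitcount(const struct sketch_hist *h, uint64_t key)
{
    for (size_t i = 0; i < SKETCH_BUCKETS; i++)
        if (h->slots[i].used && h->slots[i].key == key)
            return h->slots[i].hitcount;
    return 0;
}
```

Feeding the same key several times yields a single entry whose hitcount equals the number of events, which is the 'compression' of raw events into a table that the message describes.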
Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-04 02:54:42 +08:00
|
|
|
ETT_EVENT_HIST = (1 << 4),
|
2016-03-04 02:54:55 +08:00
|
|
|
ETT_HIST_ENABLE = (1 << 5),
|
2013-10-24 21:59:24 +08:00
|
|
|
};
|
|
|
|
|
2009-10-15 11:21:42 +08:00
|
|
|
extern int filter_match_preds(struct event_filter *filter, void *rec);
|
2013-10-24 21:34:17 +08:00
|
|
|
|
2018-01-16 10:51:42 +08:00
|
|
|
extern enum event_trigger_type
|
|
|
|
event_triggers_call(struct trace_event_file *file, void *rec,
|
|
|
|
struct ring_buffer_event *event);
|
|
|
|
extern void
|
|
|
|
event_triggers_post_call(struct trace_event_file *file,
|
2018-05-08 04:02:14 +08:00
|
|
|
enum event_trigger_type tt);
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2015-09-26 00:58:44 +08:00
|
|
|
bool trace_event_ignore_this_pid(struct trace_event_file *trace_file);
|
|
|
|
|
2014-01-07 10:32:10 +08:00
|
|
|
/**
|
2015-05-14 03:21:25 +08:00
|
|
|
* trace_trigger_soft_disabled - do triggers and test if soft disabled
|
2014-01-07 10:32:10 +08:00
|
|
|
* @file: The file pointer of the event to test
|
|
|
|
*
|
|
|
|
* If any triggers without filters are attached to this event, they
|
|
|
|
* will be called here. If the event is soft disabled and has no
|
|
|
|
* triggers that require testing the fields, it will return true,
|
|
|
|
* otherwise false.
|
|
|
|
*/
|
|
|
|
static inline bool
|
2015-05-14 03:21:25 +08:00
|
|
|
trace_trigger_soft_disabled(struct trace_event_file *file)
|
2014-01-07 10:32:10 +08:00
|
|
|
{
|
|
|
|
unsigned long eflags = file->flags;
|
|
|
|
|
2015-05-14 03:12:33 +08:00
|
|
|
if (!(eflags & EVENT_FILE_FL_TRIGGER_COND)) {
|
|
|
|
if (eflags & EVENT_FILE_FL_TRIGGER_MODE)
|
2018-01-16 10:51:42 +08:00
|
|
|
event_triggers_call(file, NULL, NULL);
|
2015-05-14 03:12:33 +08:00
|
|
|
if (eflags & EVENT_FILE_FL_SOFT_DISABLED)
|
2014-01-07 10:32:10 +08:00
|
|
|
return true;
|
2015-09-26 00:58:44 +08:00
|
|
|
if (eflags & EVENT_FILE_FL_PID_FILTER)
|
|
|
|
return trace_event_ignore_this_pid(file);
|
2014-01-07 10:32:10 +08:00
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
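The decision logic of trace_trigger_soft_disabled() can be exercised outside the kernel with a small model. The SK_FL_ bit values below are hypothetical placeholders for the EVENT_FILE_FL_* flags, and the ignore_pid argument stands in for the result of trace_event_ignore_this_pid(). The sketch records whether unconditional triggers would have been called instead of calling anything.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines its own EVENT_FILE_FL_* bits. */
enum {
    SK_FL_SOFT_DISABLED = 1 << 0,
    SK_FL_TRIGGER_MODE  = 1 << 1,
    SK_FL_TRIGGER_COND  = 1 << 2,
    SK_FL_PID_FILTER    = 1 << 3,
};

struct sk_result {
    bool triggers_called; /* unconditional triggers would run here */
    bool skip_event;      /* models the function's return value */
};

/* ignore_pid stands in for the result of trace_event_ignore_this_pid(). */
static struct sk_result sk_soft_disabled(unsigned long eflags, bool ignore_pid)
{
    struct sk_result r = { false, false };

    if (!(eflags & SK_FL_TRIGGER_COND)) {
        if (eflags & SK_FL_TRIGGER_MODE)
            r.triggers_called = true;
        if (eflags & SK_FL_SOFT_DISABLED) {
            r.skip_event = true;
            return r;
        }
        if (eflags & SK_FL_PID_FILTER)
            r.skip_event = ignore_pid;
    }
    return r;
}
```

When the conditional-trigger bit is set, the model neither runs triggers nor reports the event as skipped, matching the fall-through to the final return false in the function above.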
|
|
|
|
|
2015-07-01 10:13:49 +08:00
|
|
|
#ifdef CONFIG_BPF_EVENTS
|
2017-10-24 14:53:08 +08:00
|
|
|
unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
|
|
|
|
int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
|
|
|
|
void perf_event_detach_bpf_prog(struct perf_event *event);
|
2017-12-14 02:35:37 +08:00
|
|
|
int perf_event_query_prog_array(struct perf_event *event, void __user *info);
|
2018-03-29 03:05:37 +08:00
|
|
|
int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
|
|
|
|
int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
|
|
|
|
struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
|
bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
to this approach. First, a bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
. prog_id
. tracepoint name, or
. k[ret]probe funcname + offset or kernel addr, or
. u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
the bpf program itself with prog_id.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-25 02:21:09 +08:00
|
|
|
int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
|
|
|
|
u32 *fd_type, const char **buf,
|
|
|
|
u64 *probe_offset, u64 *probe_addr);
|
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
struct perf_event_attr attr = {
.type = PERF_TYPE_TRACEPOINT,
.config = event_id,
...
};
event_fd = perf_event_open(&attr,...);
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with BPF program
previously loaded.
'event_id' is an ID of the kprobe created.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wrapper of probe_kernel_read() used to access any
kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.
Note, kprobes are fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and user must supply correct LINUX_VERSION_CODE
in attr.kern_version during bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-26 03:49:20 +08:00
|
|
|
#else
|
2017-10-24 14:53:08 +08:00
|
|
|
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
|
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
  struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .config = event_id,
    ...
  };
  event_fd = perf_event_open(&attr,...);
  ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with a previously loaded
BPF program.
'event_id' is the ID of the created kprobe.
Closing 'event_fd':
  close(event_fd);
... automatically detaches the BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wrapper of probe_kernel_read() used to access any
  kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
the kprobe event into the ring buffer.
Note, kprobes are fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version, and the user must supply the correct LINUX_VERSION_CODE
in attr.kern_version during the bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
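The attach sequence described in this commit message can be sketched in user-space C roughly as follows. This is only a minimal sketch under stated assumptions: it expects a Linux system with the uapi header linux/perf_event.h, the helper names kprobe_attr_init() and attach_bpf_to_kprobe() are hypothetical, 'event_id' is assumed to have been obtained for an already created kprobe, and 'prog_fd' is assumed to come from an earlier program load. The choice of pid = -1 and cpu = 0 is likewise just one possible way to open the event.

```c
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in the attr the way the commit message shows. */
void kprobe_attr_init(struct perf_event_attr *attr, unsigned long long event_id)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_TRACEPOINT;
	attr->config = event_id;
}

/*
 * Hypothetical helper: open the perf event for the kprobe and attach the
 * BPF program to it. Returns the event fd, or -1 with errno set.
 * Closing the returned fd detaches the program again.
 */
int attach_bpf_to_kprobe(unsigned long long event_id, int prog_fd)
{
	struct perf_event_attr attr;
	int event_fd, saved_errno;

	kprobe_attr_init(&attr, event_id);

	/* glibc provides no perf_event_open() wrapper, so use syscall(2). */
	event_fd = (int)syscall(SYS_perf_event_open, &attr,
				(long)-1 /* any task */, (long)0 /* cpu 0 */,
				(long)-1 /* no group */, 0UL /* flags */);
	if (event_fd < 0)
		return -1;

	if (ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
		saved_errno = errno;	/* close() may overwrite errno */
		close(event_fd);
		errno = saved_errno;
		return -1;
	}
	return event_fd;
}
```

A caller would typically keep the returned descriptor open for as long as the program should stay attached, then call close() on it.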
2015-03-26 03:49:20 +08:00
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
2017-10-24 14:53:08 +08:00
|
|
|
|
|
|
|
static inline int
|
|
|
|
perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
|
|
|
|
{
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }
|
|
|
|
|
2017-12-14 02:35:37 +08:00
|
|
|
static inline int
|
|
|
|
perf_event_query_prog_array(struct perf_event *event, void __user *info)
|
|
|
|
{
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
2018-03-29 03:05:37 +08:00
|
|
|
static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *p)
|
|
|
|
{
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *p)
|
|
|
|
{
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
static inline struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name)
|
|
|
|
{
|
|
|
|
return NULL;
|
|
|
|
}
|
bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
to this approach. First, a bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
  . prog_id
  . tracepoint name, or
  . k[ret]probe funcname + offset or kernel addr, or
  . u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
the bpf program itself with prog_id.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
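From user space, the query described in this commit message might be issued along the following lines. This is a hedged sketch, not the bpftool implementation: it assumes uapi headers recent enough to define BPF_TASK_FD_QUERY and the BPF_FD_TYPE_* values in linux/bpf.h, and the struct fd_query_result type, its 256-byte name buffer, and the helpers fd_type_name() and task_fd_query() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative container for one query result (not a kernel type). */
struct fd_query_result {
	uint32_t prog_id;
	uint32_t fd_type;
	uint64_t probe_offset;
	uint64_t probe_addr;
	char name[256];		/* tracepoint, function or file name */
};

/* Map the fd_type reported by the kernel to a printable label. */
const char *fd_type_name(uint32_t fd_type)
{
	switch (fd_type) {
	case BPF_FD_TYPE_RAW_TRACEPOINT:	return "raw_tracepoint";
	case BPF_FD_TYPE_TRACEPOINT:		return "tracepoint";
	case BPF_FD_TYPE_KPROBE:		return "kprobe";
	case BPF_FD_TYPE_KRETPROBE:		return "kretprobe";
	case BPF_FD_TYPE_UPROBE:		return "uprobe";
	case BPF_FD_TYPE_URETPROBE:		return "uretprobe";
	default:				return "unknown";
	}
}

/* Ask the kernel what <pid, fd> is attached to. Returns 0, or -1 with errno set. */
int task_fd_query(pid_t pid, int fd, struct fd_query_result *res)
{
	union bpf_attr attr;

	assert(res != NULL);
	memset(&attr, 0, sizeof(attr));
	memset(res, 0, sizeof(*res));

	attr.task_fd_query.pid = (uint32_t)pid;
	attr.task_fd_query.fd = (uint32_t)fd;
	attr.task_fd_query.flags = 0;
	attr.task_fd_query.buf = (uint64_t)(uintptr_t)res->name;
	attr.task_fd_query.buf_len = sizeof(res->name);

	if (syscall(__NR_bpf, BPF_TASK_FD_QUERY, &attr, sizeof(attr)) < 0)
		return -1;

	res->prog_id = attr.task_fd_query.prog_id;
	res->fd_type = attr.task_fd_query.fd_type;
	res->probe_offset = attr.task_fd_query.probe_offset;
	res->probe_addr = attr.task_fd_query.probe_addr;
	return 0;
}
```

The returned prog_id can then be handed to "bpftool prog" as the message suggests; the call usually needs elevated privileges, so callers should expect it to fail with errno set on unprivileged systems.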
2018-05-25 02:21:09 +08:00
|
|
|
static inline int bpf_get_perf_event_info(const struct perf_event *event,
|
|
|
|
u32 *prog_id, u32 *fd_type,
|
|
|
|
const char **buf, u64 *probe_offset,
|
|
|
|
u64 *probe_addr)
|
|
|
|
{
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
2015-03-26 03:49:20 +08:00
|
|
|
#endif
|
|
|
|
|
2009-08-07 10:33:22 +08:00
|
|
|
enum {
|
|
|
|
FILTER_OTHER = 0,
|
|
|
|
FILTER_STATIC_STRING,
|
|
|
|
FILTER_DYN_STRING,
|
2009-08-07 10:33:43 +08:00
|
|
|
FILTER_PTR_STRING,
|
2012-02-15 22:51:53 +08:00
|
|
|
FILTER_TRACE_FN,
|
2016-03-04 06:18:20 +08:00
|
|
|
FILTER_COMM,
|
|
|
|
FILTER_CPU,
|
2009-08-07 10:33:22 +08:00
|
|
|
};
|
|
|
|
|
2015-05-05 23:45:27 +08:00
|
|
|
extern int trace_event_raw_init(struct trace_event_call *call);
|
|
|
|
extern int trace_define_field(struct trace_event_call *call, const char *type,
|
2009-08-27 11:09:51 +08:00
|
|
|
const char *name, int offset, int size,
|
|
|
|
int is_signed, int filter_type);
|
2015-05-05 23:45:27 +08:00
|
|
|
extern int trace_add_event_call(struct trace_event_call *call);
|
|
|
|
extern int trace_remove_event_call(struct trace_event_call *call);
|
2016-04-07 09:43:28 +08:00
|
|
|
extern int trace_event_get_offsets(struct trace_event_call *call);
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2013-04-20 05:10:27 +08:00
|
|
|
#define is_signed_type(type) (((type)(-1)) < (type)1)
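As a quick illustration of how the is_signed_type() macro above behaves, the stand-alone sketch below copies the macro into a user-space program. Casting -1 to an unsigned type wraps around to the type's maximum value, so the comparison with 1 is false; for a signed type it stays -1 and the comparison is true. The report_signedness() helper and the REPORT() macro are hypothetical and exist only for this demonstration.

```c
#include <stdint.h>
#include <stdio.h>

/* Same definition as in the header above. */
#define is_signed_type(type)	(((type)(-1)) < (type)1)

/* Hypothetical helper macro: print one line per type. */
#define REPORT(type) \
	printf("%-14s %s\n", #type, is_signed_type(type) ? "signed" : "unsigned")

/* Print the signedness of a few common types. */
void report_signedness(void)
{
	REPORT(int);
	REPORT(unsigned int);
	REPORT(long);
	REPORT(uint8_t);
	REPORT(int64_t);
	REPORT(char);	/* implementation-defined: differs between ABIs */
}
```

Because the macro expands to a constant expression, it can also be used where a compile-time value is needed, for example as an initializer for a static table.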
|
2009-04-13 23:20:49 +08:00
|
|
|
|
2009-05-09 04:27:41 +08:00
|
|
|
int trace_set_clr_event(const char *system, const char *event, int set);
|
|
|
|
|
2009-04-13 23:20:49 +08:00
|
|
|
/*
|
|
|
|
* The double __builtin_constant_p is because gcc will give us an error
|
|
|
|
* if we try to allocate the static variable to fmt if it is not a
|
|
|
|
* constant. Even with the outer if statement optimizing out.
|
|
|
|
*/
|
|
|
|
#define event_trace_printk(ip, fmt, args...) \
|
|
|
|
do { \
|
|
|
|
__trace_printk_check_format(fmt, ##args); \
|
|
|
|
tracing_record_cmdline(current); \
|
|
|
|
if (__builtin_constant_p(fmt)) { \
|
|
|
|
static const char *trace_printk_fmt \
|
|
|
|
__attribute__((section("__trace_printk_fmt"))) = \
|
|
|
|
__builtin_constant_p(fmt) ? fmt : NULL; \
|
|
|
|
\
|
|
|
|
__trace_bprintk(ip, trace_printk_fmt, ##args); \
|
|
|
|
} else \
|
|
|
|
__trace_printk(ip, fmt, ##args); \
|
|
|
|
} while (0)
|
|
|
|
|
2009-12-21 14:27:35 +08:00
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
2009-10-15 11:21:42 +08:00
|
|
|
struct perf_event;
|
2010-03-03 14:16:16 +08:00
|
|
|
|
|
|
|
DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
|
2017-12-12 00:36:48 +08:00
|
|
|
DECLARE_PER_CPU(int, bpf_kprobe_override);
|
2010-03-03 14:16:16 +08:00
|
|
|
|
2010-05-19 20:02:22 +08:00
|
|
|
extern int perf_trace_init(struct perf_event *event);
|
|
|
|
extern void perf_trace_destroy(struct perf_event *event);
|
perf: Rework the PMU methods
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.
The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.
This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).
It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).
The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:
 1) We disable the counter:
    a) the pmu has per-counter enable bits, we flip that
    b) we program a NOP event, preserving the counter state
 2) We store the counter state and ignore all read/overflow events
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
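The split between scheduling a counter (add/del) and running it (start/stop) that this commit message describes can be pictured with a small toy model. Everything in the sketch below is hypothetical: struct toy_counter, the toy_*() functions and TOY_EF_START are invented for illustration and are not the kernel's struct pmu callbacks; the flag is only loosely modeled on the idea of a "start immediately" flag passed to add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: start counting as soon as the counter is added. */
enum { TOY_EF_START = 0x01 };

struct toy_counter {
	bool scheduled;			/* occupies a slot on the imaginary PMU */
	bool stopped;			/* scheduled, but currently not counting */
	unsigned long long count;
};

/* Schedule the counter; it only starts counting if TOY_EF_START is set. */
void toy_add(struct toy_counter *c, int flags)
{
	assert(c != NULL);
	c->scheduled = true;
	c->stopped = !(flags & TOY_EF_START);
}

/* Resume counting on a counter that is already scheduled. */
void toy_start(struct toy_counter *c)
{
	if (c->scheduled)
		c->stopped = false;
}

/* Stop counting but keep the counter scheduled (the generic stopped state). */
void toy_stop(struct toy_counter *c)
{
	if (c->scheduled)
		c->stopped = true;
}

/* Remove the counter from the PMU entirely. */
void toy_del(struct toy_counter *c)
{
	toy_stop(c);
	c->scheduled = false;
}

/* Simulate one hardware event: only running counters observe it. */
void toy_tick(struct toy_counter *c)
{
	if (c->scheduled && !c->stopped)
		c->count++;
}
```

In this model, calling toy_add() without the start flag corresponds to the "scheduled without starting" case from the message, and toy_stop() followed by toy_start() corresponds to pausing a counter over a code path while it keeps its slot.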
2010-06-16 20:37:10 +08:00
|
|
|
extern int perf_trace_add(struct perf_event *event, int flags);
|
|
|
|
extern void perf_trace_del(struct perf_event *event, int flags);
|
2017-12-07 06:45:15 +08:00
|
|
|
#ifdef CONFIG_KPROBE_EVENTS
|
|
|
|
extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe);
|
|
|
|
extern void perf_kprobe_destroy(struct perf_event *event);
|
2018-05-25 02:21:09 +08:00
|
|
|
extern int bpf_get_kprobe_info(const struct perf_event *event,
|
|
|
|
u32 *fd_type, const char **symbol,
|
|
|
|
u64 *probe_offset, u64 *probe_addr,
|
|
|
|
bool perf_type_tracepoint);
|
2017-12-07 06:45:15 +08:00
|
|
|
#endif
|
2017-12-07 06:45:16 +08:00
|
|
|
#ifdef CONFIG_UPROBE_EVENTS
|
|
|
|
extern int perf_uprobe_init(struct perf_event *event, bool is_retprobe);
|
|
|
|
extern void perf_uprobe_destroy(struct perf_event *event);
|
2018-05-25 02:21:09 +08:00
|
|
|
extern int bpf_get_uprobe_info(const struct perf_event *event,
|
|
|
|
u32 *fd_type, const char **filename,
|
|
|
|
u64 *probe_offset, bool perf_type_tracepoint);
|
2017-12-07 06:45:16 +08:00
|
|
|
#endif
|
2010-05-19 20:02:22 +08:00
|
|
|
extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
|
2009-10-15 11:21:42 +08:00
|
|
|
char *filter_str);
|
|
|
|
extern void ftrace_profile_free_filter(struct perf_event *event);
|
2016-04-07 09:43:24 +08:00
|
|
|
void perf_trace_buf_update(void *record, u16 type);
|
|
|
|
void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
|
2010-01-28 09:32:29 +08:00
|
|
|
|
2018-03-29 03:05:37 +08:00
|
|
|
void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
|
|
|
|
void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
|
|
|
|
void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3);
|
|
|
|
void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4);
|
|
|
|
void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5);
|
|
|
|
void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6);
|
|
|
|
void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
|
|
|
|
void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
|
|
|
|
u64 arg8);
|
|
|
|
void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
|
|
|
|
u64 arg8, u64 arg9);
|
|
|
|
void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
|
|
|
|
u64 arg8, u64 arg9, u64 arg10);
|
|
|
|
void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
|
|
|
|
u64 arg8, u64 arg9, u64 arg10, u64 arg11);
|
|
|
|
void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
|
|
|
|
u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
|
|
|
|
u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
|
2016-04-19 11:11:50 +08:00
|
|
|
void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
|
|
|
|
struct trace_event_call *call, u64 count,
|
|
|
|
struct pt_regs *regs, struct hlist_head *head,
|
|
|
|
struct task_struct *task);
|
|
|
|
|
2010-01-28 09:32:29 +08:00
|
|
|
static inline void
|
2016-04-07 09:43:24 +08:00
|
|
|
perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
|
2012-07-11 22:14:58 +08:00
|
|
|
u64 count, struct pt_regs *regs, void *head,
|
2017-10-11 15:45:29 +08:00
|
|
|
struct task_struct *task)
|
2010-01-28 09:32:29 +08:00
|
|
|
{
|
2017-10-11 15:45:29 +08:00
|
|
|
perf_tp_event(type, count, raw_data, size, regs, head, rctx, task);
|
2010-01-28 09:32:29 +08:00
|
|
|
}
|
2017-10-24 14:53:08 +08:00
|
|
|
|
2009-10-15 11:21:42 +08:00
|
|
|
#endif
|
|
|
|
|
2015-05-05 23:45:27 +08:00
|
|
|
#endif /* _LINUX_TRACE_EVENT_H */
|