2008-05-13 03:20:42 +08:00
#
2008-10-07 07:06:12 +08:00
# Architectures that offer a FUNCTION_TRACER implementation should
# select HAVE_FUNCTION_TRACER:
2008-05-13 03:20:42 +08:00
#
2008-09-22 02:12:14 +08:00

2008-11-23 18:39:08 +08:00
config USER_STACKTRACE_SUPPORT
	bool

2008-09-22 02:12:14 +08:00
config NOP_TRACER
	bool

2008-10-07 07:06:12 +08:00
config HAVE_FUNCTION_TRACER
2008-05-13 03:20:42 +08:00
	bool
2008-05-13 03:20:42 +08:00

2008-11-26 04:07:04 +08:00
config HAVE_FUNCTION_GRAPH_TRACER
2008-11-11 14:14:25 +08:00
	bool

2008-11-06 05:05:44 +08:00
config HAVE_FUNCTION_TRACE_MCOUNT_TEST
	bool
	help
	  This gets selected when the arch tests the function_trace_stop
	  variable at the mcount call site. Otherwise, this variable
	  is tested by the called function.

2008-05-17 12:01:36 +08:00
config HAVE_DYNAMIC_FTRACE
	bool

ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
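The arithmetic above can be checked with a short sketch. The helper name to_function_offset and the TEXT_SYMBOLS table are illustrative only and are not part of the patch; the addresses are copied from the objdump listing.

```python
# Illustrative only: symbol addresses copied from the objdump listing above.
TEXT_SYMBOLS = {
    "do_one_initcall": 0x0,
    "init_post": 0x17b,
}


def to_function_offset(section_offset, symbols):
    """Return (name, offset) for the closest symbol at or below section_offset."""
    candidates = [(addr, name) for name, addr in symbols.items()
                  if addr <= section_offset]
    if not candidates:
        raise ValueError("offset precedes every known symbol")
    addr, name = max(candidates)
    return name, section_offset - addr


name, off = to_function_offset(0x185, TEXT_SYMBOLS)
print(f".text + 0x185 == {name} + {off:#x}")
```

The same relocation can therefore be written relative to the section or relative to any symbol that precedes it in that section.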
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section,
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
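As a rough sketch of what the tmp.s file looks like, the following helper builds the section text from a reference symbol and a list of call-site offsets. The function name mcount_loc_section is hypothetical; this is not how scripts/recordmcount.pl is written, only an illustration of the text it needs to produce.

```python
def mcount_loc_section(ref_symbol, offsets):
    """Build assembly text listing each mcount call site as ref_symbol + offset."""
    lines = ['\t.section __mcount_loc,"a",@progbits']
    lines += [f"\t.quad {ref_symbol} + {off:#x}" for off in offsets]
    return "\n".join(lines) + "\n"


# The single call site shown in the listing above.
print(mcount_loc_section("do_one_initcall", [0x185]), end="")
```

The output would be written to tmp.s and then compiled and linked with the gcc and ld commands shown above.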
But we have a problem. What happens if the first function in a section
is not exported, and is a static function? The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worse, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
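The sequence described above (make the symbol global, link, make it local again) can be sketched as a list of command lines. This is an assumption about ordering based on the text; the helper name relink_commands is hypothetical, and --globalize-symbol and --localize-symbol are the GNU objcopy options for changing a symbol's binding. Nothing is executed here.

```python
def relink_commands(obj, tmp_s, local_ref=None):
    """Return argv lists for the compile/link steps described in the text.

    local_ref names a static reference symbol, if any. It is made global
    before the link and local again afterwards.
    """
    tmp_o = "tmp.o"
    linked = "tmp_" + obj
    cmds = [["gcc", "-c", tmp_s, "-o", tmp_o]]
    if local_ref:
        cmds.append(["objcopy", "--globalize-symbol=" + local_ref, obj])
    cmds.append(["ld", "-r", obj, tmp_o, "-o", linked])
    if local_ref:
        cmds.append(["objcopy", "--localize-symbol=" + local_ref, linked])
    cmds.append(["mv", linked, obj])
    return cmds


for argv in relink_commands("main.o", "tmp.s", "set_reset_devices"):
    print(" ".join(argv))
```

When the reference symbol is already global, the two objcopy steps drop out and the sequence reduces to the gcc, ld and mv commands shown earlier.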
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.lds.S and access it to convert
all these locations that call mcount into nops before starting SMP,
thus eliminating the need to do this with kstop_machine.
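To make the call-to-nop conversion concrete, here is a small sketch on a byte buffer. It assumes x86, where a near call is the opcode 0xe8 followed by a 32-bit little-endian displacement relative to the next instruction, as in the "e8 00 00 00 00" bytes of the listing. The NOP5 byte sequence is one valid 5-byte x86 NOP chosen for illustration; which NOP the kernel actually uses is not stated in this message, and the helper names are hypothetical.

```python
import struct

CALL_OPCODE = 0xE8
# Assumption: one valid 5-byte x86 NOP (nopl 0x0(%rax,%rax,1)).
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])


def encode_call(site, target):
    """Encode a 5-byte 'call rel32' placed at address site that calls target."""
    rel = target - (site + 5)  # displacement is relative to the next instruction
    return bytes([CALL_OPCODE]) + struct.pack("<i", rel)


def patch_to_nop(text, site):
    """Overwrite the call instruction at text[site] with a 5-byte NOP."""
    if text[site] != CALL_OPCODE:
        raise ValueError("no call instruction at this offset")
    text[site:site + 5] = NOP5


# The unrelocated call at 0x184 in the listing targets 0x189 (displacement 0).
print(encode_call(0x184, 0x189).hex(" "))
```

Because both instructions are 5 bytes long, the patch never changes the size or layout of the surrounding code.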
Note: a well-documented Perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15 03:45:07 +08:00
config HAVE_FTRACE_MCOUNT_RECORD
	bool

2008-11-25 16:24:15 +08:00
config HAVE_HW_BRANCH_TRACER
	bool

2008-05-13 03:20:42 +08:00
config TRACER_MAX_TRACE
	bool

tracing: unified trace buffer
This is a unified tracing buffer that implements a ring buffer that
hopefully everyone will eventually be able to use.
The events recorded into the buffer have the following structure:
struct ring_buffer_event {
u32 type:2, len:3, time_delta:27;
u32 array[];
};
The minimum size of an event is 8 bytes. All events are 4 byte
aligned inside the buffer.
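As an illustration of how the three bit fields share one 32-bit header word, the sketch below packs and unpacks them by hand. Placing "type" in the lowest bits is an assumption made for the example; C bit-field layout is compiler and architecture dependent, and the helper names are hypothetical.

```python
TYPE_BITS, LEN_BITS, DELTA_BITS = 2, 3, 27  # widths from the struct above


def pack_header(ev_type, length, time_delta):
    """Pack the fields into one 32-bit word (type in the low bits: an assumption)."""
    for value, bits in ((ev_type, TYPE_BITS), (length, LEN_BITS),
                        (time_delta, DELTA_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (ev_type
            | (length << TYPE_BITS)
            | (time_delta << (TYPE_BITS + LEN_BITS)))


def unpack_header(word):
    """Split a 32-bit header word back into (type, len, time_delta)."""
    ev_type = word & ((1 << TYPE_BITS) - 1)
    length = (word >> TYPE_BITS) & ((1 << LEN_BITS) - 1)
    time_delta = word >> (TYPE_BITS + LEN_BITS)
    return ev_type, length, time_delta


print(unpack_header(pack_header(3, 2, 1000)))
```

A delta that does not fit in 27 bits is rejected here; that is the case the time extent event type exists for.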
There are 4 types (all internal use for the ring buffer, only
the data type is exported to the interface users).
RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
of a buffer page.
RINGBUF_TYPE_TIME_EXTENT: This type is used when the time between events
is greater than the 27 bit delta can hold. We add another
32 bits, and record that in its own event (8 byte size).
RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
help keep the buffer timestamps in sync.
RINGBUF_TYPE_DATA: The event actually holds user data.
The "len" field is only three bits. Since the data must be
4 byte aligned, this field is shifted left by 2, giving a
max length of 28 bytes. If the data load is greater than 28
bytes, the first array field holds the full length of the
data load and the len field is set to zero.
Example, data size of 7 bytes:
type = RINGBUF_TYPE_DATA
len = 2
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0..1]: <7 bytes of data> <1 byte empty>
This event is saved in 12 bytes of the buffer.
An event with 82 bytes of data:
type = RINGBUF_TYPE_DATA
len = 0
time_delta: <time-stamp> - <prev_event-time-stamp>
array[0]: 84 (Note the alignment)
array[1..14]: <82 bytes of data> <2 bytes empty>
The above event is saved in 92 bytes (if my math is correct).
82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
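The two worked examples can be reproduced with a few lines of arithmetic. This is a sketch of the sizing rule as described in this message only; the function names are hypothetical, and rejecting an empty payload is an assumption made to stay consistent with the stated 8-byte minimum.

```python
HEADER_BYTES = 4   # the u32 holding type, len and time_delta
ALIGN = 4          # events are 4-byte aligned
MAX_INLINE = 28    # largest length the 3-bit len field can express (7 << 2)


def align_up(n, alignment=ALIGN):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def event_layout(data_bytes):
    """Return (len_field, array0_length_or_None, total_bytes) for a payload."""
    if data_bytes < 1:
        raise ValueError("payload must be at least one byte")
    padded = align_up(data_bytes)
    if padded <= MAX_INLINE:
        # Length fits in the header: len holds the padded size divided by 4.
        return padded >> 2, None, HEADER_BYTES + padded
    # Otherwise len is 0 and array[0] carries the padded length.
    return 0, padded, HEADER_BYTES + 4 + padded


print(event_layout(7))
print(event_layout(82))
```

For 7 bytes this gives len = 2 and 12 bytes in total; for 82 bytes it gives len = 0, array[0] = 84 and 92 bytes in total, matching the figures above.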
Do not reference the above event struct directly. Use the following
functions to gain access to the event table, since the
ring_buffer_event structure may change in the future.
ring_buffer_event_length(event): get the length of the event.
This is the size of the memory used to record this
event, and not the size of the data payload.
ring_buffer_time_delta(event): get the time delta of the event
This returns the delta time stamp since the last event.
Note: Even though this is in the header, there should
be no reason to access this directly, except
for debugging.
ring_buffer_event_data(event): get the data from the event
This is the function to use to get the actual data
from the event. Note, it is only a pointer to the
data inside the buffer. This data must be copied to
another location otherwise you risk it being written
over in the buffer.
ring_buffer_lock: A way to lock the entire buffer.
ring_buffer_unlock: unlock the buffer.
ring_buffer_alloc: create a new ring buffer. Can choose between
overwrite or consumer/producer mode. Overwrite will
overwrite old data, whereas consumer/producer will
throw away new data if the producer catches up with the
consumer. The consumer/producer is the default.
ring_buffer_free: free the ring buffer.
ring_buffer_resize: resize the buffer. Changes the size of each cpu
buffer. Note, it is up to the caller to ensure that
the buffer is not being used while this is happening.
This requirement may go away but do not count on it.
ring_buffer_lock_reserve: locks the ring buffer and allocates an
entry on the buffer to write to.
ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
the buffer.
ring_buffer_write: writes some data into the ring buffer.
ring_buffer_peek: Look at the next item in the cpu buffer.
ring_buffer_consume: get the next item in the cpu buffer and
consume it. That is, this function increments the head
pointer.
ring_buffer_read_start: Start an iterator of a cpu buffer.
For now, this disables the cpu buffer, until you issue
a finish. This is just because we do not want the iterator
to be overwritten. This restriction may change in the future.
But note, this is used for static reading of a buffer which
is usually done "after" a trace. Live readings would want
to use the ring_buffer_consume above, which will not
disable the ring buffer.
ring_buffer_read_finish: Finishes the read iterator and reenables
the ring buffer.
ring_buffer_iter_peek: Look at the next item in the cpu iterator.
ring_buffer_read: Read the iterator and increment it.
ring_buffer_iter_reset: Reset the iterator to point to the beginning
of the cpu buffer.
ring_buffer_iter_empty: Returns true if the iterator is at the end
of the cpu buffer.
ring_buffer_size: returns the size in bytes of each cpu buffer.
Note, the real size is this times the number of CPUs.
ring_buffer_reset_cpu: Sets the cpu buffer to empty
ring_buffer_reset: sets all cpu buffers to empty
ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
cpu buffer of another buffer. This is handy when you
want to take a snapshot of a running trace on just one
cpu. Having a backup buffer to swap with facilitates this.
Ftrace max latencies use this.
ring_buffer_empty: Returns true if the ring buffer is empty.
ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.
ring_buffer_record_disable: disable all cpu buffers (read only)
ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
ring_buffer_record_enable: enable all cpu buffers.
ring_buffer_record_enable_cpu: enable a single cpu buffer.
ring_buffer_entries: The number of entries in a ring buffer.
ring_buffer_overruns: The number of entries removed due to writing wrap.
ring_buffer_time_stamp: Get the time stamp used by the ring buffer
ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
into nanosecs.
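The difference between the two allocation modes, and between peek and consume, can be shown with a toy model. This is not the kernel implementation and shares none of its code; ToyRing and its attribute names are hypothetical, and the model counts whole entries rather than bytes or pages.

```python
from collections import deque


class ToyRing:
    """Toy single-CPU buffer that holds at most `capacity` entries."""

    def __init__(self, capacity, overwrite=False):
        self.capacity = capacity
        self.overwrite = overwrite
        self.items = deque()
        self.overruns = 0  # old entries discarded (overwrite mode)
        self.dropped = 0   # new entries rejected (consumer/producer mode)

    def write(self, item):
        if len(self.items) == self.capacity:
            if not self.overwrite:
                self.dropped += 1   # buffer full: throw away the new data
                return False
            self.items.popleft()    # buffer full: overwrite the oldest data
            self.overruns += 1
        self.items.append(item)
        return True

    def peek(self):
        """Look at the next entry without removing it."""
        return self.items[0] if self.items else None

    def consume(self):
        """Return the next entry and advance past it."""
        return self.items.popleft() if self.items else None


ring = ToyRing(2, overwrite=True)
for value in (1, 2, 3):
    ring.write(value)
print(list(ring.items), ring.overruns)
```

In overwrite mode the buffer always holds the most recent entries, which suits a tracer that is stopped after the interesting event; in consumer/producer mode it holds the oldest unread entries.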
I still need to implement the GTOD feature. But we need support from
the cpu frequency infrastructure. But this can be done at a later
time without affecting the ring buffer interface.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-30 11:02:38 +08:00
config RING_BUFFER
	bool

2008-05-13 03:20:42 +08:00
config TRACING
	bool
	select DEBUG_FS
2008-09-30 11:02:38 +08:00
	select RING_BUFFER
2008-11-01 03:50:41 +08:00
	select STACKTRACE if STACKTRACE_SUPPORT
2008-07-23 20:15:22 +08:00
	select TRACEPOINTS
2008-10-29 23:15:57 +08:00
	select NOP_TRACER
2008-05-13 03:20:42 +08:00

2008-10-21 22:31:18 +08:00
menu "Tracers"

2008-10-07 07:06:12 +08:00
config FUNCTION_TRACER
ftrace: function tracer
This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.
Updates:
available_tracers
"function" is added to this file.
current_tracer
To enable the function tracer:
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
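The same control files can be driven from a script. The sketch below mirrors the echo commands above; the /debugfs/tracing default comes from this message, while the helper names and the assumption that available_tracers is a whitespace-separated list are illustrative. Writing to the real files requires suitable privileges and a mounted debugfs.

```python
import os

DEFAULT_TRACING_DIR = "/debugfs/tracing"  # mount point used in this message


def available_tracers(tracing_dir=DEFAULT_TRACING_DIR):
    """Return the tracer names listed in available_tracers.

    Assumes a whitespace-separated list of names.
    """
    with open(os.path.join(tracing_dir, "available_tracers")) as f:
        return f.read().split()


def set_tracer(name, tracing_dir=DEFAULT_TRACING_DIR):
    """Equivalent of: echo NAME > TRACING_DIR/current_tracer"""
    with open(os.path.join(tracing_dir, "current_tracer"), "w") as f:
        f.write(name + "\n")
```

Calling set_tracer("function") would then correspond to the first echo command above; passing a different tracing_dir makes the helpers easy to try against a scratch directory.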
The output of the function_trace file is as follows
"echo noverbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
 swapper-0     0d.h. 1595131us+: _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
Or with verbose turned on:
"echo verbose > /debugfs/tracing/iter_ctrl"
preemption latency trace v1.1.5 on 2.6.24-rc7-tst
--------------------------------------------------------------------
latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-----------------
swapper 0 0 9 00000000 00000000 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d <c043841d> (ktime_get_ts+0x4a/0x4e <c04499d4>)
swapper 0 0 9 00000000 00000001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
swapper 0 0 9 00000000 00000002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 <c0630690> (hrtimer_interrupt+0x6e/0x1b0 <c0449c56>)
The "trace" file is not affected by the verbose mode, but is by the symonly.
echo "nosymonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <ffffffff80337a4d> <-- _spin_unlock_irqrestore+0xe/0x5a <ffffffff8048cc8f>
[ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <ffffffff8048ccbf> <-- sub_preempt_count+0xc/0x7a <ffffffff80233d7b>
[ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <ffffffff80233d9f> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
[ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <ffffffff8029a043> <-- dnotify_parent+0x12/0x78 <ffffffff802d54fb>
[ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <ffffffff802d5516> <-- _spin_lock+0xe/0x70 <ffffffff8048c910>
[ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <ffffffff8048c91d> <-- add_preempt_count+0xe/0x77 <ffffffff80233df7>
[ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <ffffffff80233e27> <-- in_lock_functions+0x9/0x24 <ffffffff8025a75d>
echo "symonly" > /debugfs/tracing/iter_ctrl
tracer:
[ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24
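For post-processing, lines in the "tracer:" format shown above can be split into fields with a regular expression. The pattern below is derived only from the sample lines in this message and may not cover every line the tracer can emit; the field names are hypothetical, and the two symbols are labelled "left" and "right" of the arrow rather than assigning them a caller or callee role.

```python
import re

LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU\s+(?P<cpu>\d+):\s+"
    r"(?P<comm>\S+):(?P<pid>\d+)\s+"
    r"(?P<left>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+<(?P<left_addr>[0-9a-f]+)>)?"      # present only with nosymonly
    r"\s+<--\s+"
    r"(?P<right>\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_trace_line(line):
    """Return a dict of fields for one trace line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "timestamp": float(d["ts"]),
        "cpu": int(d["cpu"]),
        "comm": d["comm"],
        "pid": int(d["pid"]),
        "left": d["left"],
        "left_addr": d["left_addr"],
        "right": d["right"],
    }


sample = ("[ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66"
          " <-- _spin_unlock_irqrestore+0xe/0x5a")
print(parse_trace_line(sample))
```

The optional address group lets the same pattern accept both the symonly and nosymonly forms.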
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-13 03:20:42 +08:00
	bool "Kernel Function Tracer"
2008-10-07 07:06:12 +08:00
	depends on HAVE_FUNCTION_TRACER
2008-09-04 20:04:51 +08:00
	depends on DEBUG_KERNEL
2008-05-13 03:20:42 +08:00
	select FRAME_POINTER
	select TRACING
2008-05-13 03:20:42 +08:00
	select CONTEXT_SWITCH_TRACER
2008-05-13 03:20:42 +08:00
	help
	  Enable the kernel to trace every kernel function. This is done
	  by using a compiler feature to insert a small, 5-byte No-Operation
	  instruction to the beginning of every kernel function, which NOP
	  sequence is then dynamically patched into a tracer call when
	  tracing is enabled by the administrator. If it's runtime disabled
	  (the bootup default), then the overhead of the instructions is very
	  small and not measurable even in micro-benchmarks.
2008-05-13 03:20:42 +08:00

2008-11-26 04:07:04 +08:00
config FUNCTION_GRAPH_TRACER
	bool "Kernel Function Graph Tracer"
	depends on HAVE_FUNCTION_GRAPH_TRACER
2008-11-11 14:14:25 +08:00
	depends on FUNCTION_TRACER
2008-12-03 17:33:58 +08:00
	default y
2008-11-11 14:14:25 +08:00
	help
2008-11-26 04:07:04 +08:00
	  Enable the kernel to trace a function at both its return
	  and its entry.
	  Its first purpose is to trace the duration of functions and
	  draw a call graph for each thread with some information like
	  the return value.
	  This is done by saving the current return address into a stack
	  of calls kept on the current task structure.
2008-11-11 14:14:25 +08:00

2008-05-13 03:20:42 +08:00
config IRQSOFF_TRACER
	bool "Interrupts-off Latency Tracer"
	default n
	depends on TRACE_IRQFLAGS_SUPPORT
	depends on GENERIC_TIME
2008-09-04 20:04:51 +08:00
	depends on DEBUG_KERNEL
2008-05-13 03:20:42 +08:00
	select TRACE_IRQFLAGS
	select TRACING
	select TRACER_MAX_TRACE
	help
	  This option measures the time spent in irqs-off critical
	  sections, with microsecond accuracy.

	  The default measurement method is a maximum search, which is
	  disabled by default and can be runtime (re-)started
	  via:

	      echo 0 > /debugfs/tracing/tracing_max_latency

2008-05-13 03:20:42 +08:00
	  (Note that kernel size and overhead increase with this option
	  enabled. This option and the preempt-off timing option can be
	  used together or separately.)

config PREEMPT_TRACER
|
|
|
|
bool "Preemption-off Latency Tracer"
|
|
|
|
default n
|
|
|
|
depends on GENERIC_TIME
|
|
|
|
depends on PREEMPT
|
2008-09-04 20:04:51 +08:00
|
|
|
depends on DEBUG_KERNEL
|
2008-05-13 03:20:42 +08:00
|
|
|
select TRACING
|
|
|
|
select TRACER_MAX_TRACE
|
|
|
|
help
|
|
|
|
This option measures the time spent in preemption-off critical
|
|
|
|
sections, with microsecond accuracy.
|
|
|
|
|
|
|
|
The default measurement method is a maximum search, which is
|
|
|
|
disabled by default and can be runtime (re-)started
|
|
|
|
via:
|
|
|
|
|
|
|
|
echo 0 > /debugfs/tracing/tracing_max_latency
|
|
|
|
|
|
|
|
(Note that kernel size and overhead increase with this option
|
|
|
|
enabled. This option and the irqs-off timing option can be
|
|
|
|
used together or separately.)
|
|
|
|
|
2008-05-13 03:20:47 +08:00
|
|
|
config SYSPROF_TRACER
|
|
|
|
bool "Sysprof Tracer"
|
2008-05-24 21:00:46 +08:00
|
|
|
depends on X86
|
2008-05-13 03:20:47 +08:00
|
|
|
select TRACING
|
|
|
|
help
|
|
|
|
This tracer provides the trace needed by the 'Sysprof' userspace
|
|
|
|
tool.
|
|
|
|
|
2008-05-13 03:20:42 +08:00
|
|
|
config SCHED_TRACER
|
|
|
|
bool "Scheduling Latency Tracer"
|
2008-09-04 20:04:51 +08:00
|
|
|
depends on DEBUG_KERNEL
|
2008-05-13 03:20:42 +08:00
|
|
|
select TRACING
|
|
|
|
select CONTEXT_SWITCH_TRACER
|
|
|
|
select TRACER_MAX_TRACE
|
|
|
|
help
|
|
|
|
This tracer tracks the latency of the highest priority task
|
|
|
|
to be scheduled in, starting from the point it has woken up.
|
|
|
|
|
2008-05-13 03:20:42 +08:00
|
|
|
config CONTEXT_SWITCH_TRACER
|
|
|
|
bool "Trace process context switches"
|
2008-09-04 20:04:51 +08:00
|
|
|
depends on DEBUG_KERNEL
|
2008-05-13 03:20:42 +08:00
|
|
|
select TRACING
|
|
|
|
select MARKERS
|
|
|
|
help
|
|
|
|
This tracer gets called from the context switch and records
|
|
|
|
all switching of tasks.
|
|
|
|
|
2008-09-23 18:36:20 +08:00
|
|
|
config BOOT_TRACER
|
|
|
|
bool "Trace boot initcalls"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
select TRACING
|
2008-10-23 01:26:23 +08:00
|
|
|
select CONTEXT_SWITCH_TRACER
|
2008-09-23 18:36:20 +08:00
|
|
|
help
|
|
|
|
This tracer helps developers to optimize boot times: it records
|
2008-10-14 20:27:20 +08:00
|
|
|
the timings of the initcalls and traces key events and the identity
|
|
|
|
of tasks that can cause boot delays, such as context-switches.
|
|
|
|
|
|
|
|
Its aim is to be parsed by the /scripts/bootgraph.pl tool to
|
|
|
|
produce pretty graphics about boot inefficiencies, giving a visual
|
|
|
|
representation of the delays during initcalls - but the raw
|
|
|
|
/debug/tracing/trace text output is readable too.
|
|
|
|
|
|
|
|
(Note that tracing self-tests can't be enabled if this tracer is
|
|
|
|
selected, because the self-tests are an initcall as well and that
|
|
|
|
would invalidate the boot trace.)
|
2008-09-23 18:36:20 +08:00
|
|
|
|
2008-11-13 04:24:24 +08:00
|
|
|
config TRACE_BRANCH_PROFILING
|
2008-11-12 13:14:39 +08:00
|
|
|
bool "Trace likely/unlikely profiler"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
select TRACING
|
|
|
|
help
|
|
|
|
This tracer profiles all the likely and unlikely macros
|
|
|
|
in the kernel. It will display the results in:
|
|
|
|
|
2008-11-21 13:40:40 +08:00
|
|
|
/debugfs/tracing/profile_annotated_branch
|
2008-11-12 13:14:39 +08:00
|
|
|
|
|
|
|
Note: this will add significant overhead; only turn this
|
|
|
|
on if you need to profile the system's use of these macros.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2008-11-21 14:30:54 +08:00
|
|
|
config PROFILE_ALL_BRANCHES
|
|
|
|
bool "Profile all if conditionals"
|
|
|
|
depends on TRACE_BRANCH_PROFILING
|
|
|
|
help
|
|
|
|
This tracer profiles all branch conditions. Every if ()
|
|
|
|
taken in the kernel is recorded, along with whether it hit or missed.
|
|
|
|
The results will be displayed in:
|
|
|
|
|
|
|
|
/debugfs/tracing/profile_branch
|
|
|
|
|
|
|
|
This configuration, when enabled, will impose a great overhead
|
|
|
|
on the system. This should only be enabled when the system
|
|
|
|
is to be analyzed.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2008-11-13 04:24:24 +08:00
|
|
|
config TRACING_BRANCHES
|
2008-11-12 13:14:40 +08:00
|
|
|
bool
|
|
|
|
help
|
|
|
|
Selected by tracers that will trace the likely and unlikely
|
|
|
|
conditions. This prevents the tracers themselves from being
|
|
|
|
profiled. Profiling the tracing infrastructure can only happen
|
|
|
|
when the likelys and unlikelys are not being traced.
|
|
|
|
|
2008-11-13 04:24:24 +08:00
|
|
|
config BRANCH_TRACER
|
2008-11-12 13:14:40 +08:00
|
|
|
bool "Trace likely/unlikely instances"
|
2008-11-13 04:24:24 +08:00
|
|
|
depends on TRACE_BRANCH_PROFILING
|
|
|
|
select TRACING_BRANCHES
|
2008-11-12 13:14:40 +08:00
|
|
|
help
|
|
|
|
This traces the events of likely and unlikely condition
|
|
|
|
calls in the kernel. The difference between this and the
|
|
|
|
"Trace likely/unlikely profiler" is that this is not a
|
|
|
|
histogram of the callers, but actually places the calling
|
|
|
|
events into a running trace buffer to see when and where the
|
|
|
|
events happened, as well as their results.
|
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2008-11-24 08:49:58 +08:00
|
|
|
config POWER_TRACER
|
|
|
|
bool "Trace power consumption behavior"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on X86
|
|
|
|
select TRACING
|
|
|
|
help
|
|
|
|
This tracer helps developers to analyze and optimize the kernel's
|
|
|
|
power management decisions, specifically the C-state and P-state
|
|
|
|
behavior.
|
|
|
|
|
|
|
|
|
2008-08-28 11:31:01 +08:00
|
|
|
config STACK_TRACER
|
|
|
|
bool "Trace max stack"
|
2008-10-07 07:06:12 +08:00
|
|
|
depends on HAVE_FUNCTION_TRACER
|
2008-09-04 21:04:37 +08:00
|
|
|
depends on DEBUG_KERNEL
|
2008-10-07 07:06:12 +08:00
|
|
|
select FUNCTION_TRACER
|
2008-08-28 11:31:01 +08:00
|
|
|
select STACKTRACE
|
|
|
|
help
|
2008-10-14 20:15:43 +08:00
|
|
|
This special tracer records the maximum stack footprint of the
|
|
|
|
kernel and displays it in debugfs/tracing/stack_trace.
|
|
|
|
|
|
|
|
This tracer works by hooking into every function call that the
|
|
|
|
kernel executes, and keeping a maximum stack depth value and
|
2008-12-17 12:06:40 +08:00
|
|
|
stack-trace saved. If this is configured with DYNAMIC_FTRACE
|
|
|
|
then it will not have any overhead while the stack tracer
|
|
|
|
is disabled.
|
|
|
|
|
|
|
|
To enable the stack tracer on bootup, pass in 'stacktrace'
|
|
|
|
on the kernel command line.
|
|
|
|
|
|
|
|
The stack tracer can also be enabled or disabled via the
|
|
|
|
sysctl kernel.stack_tracer_enabled
|
2008-10-14 20:15:43 +08:00
|
|
|
|
|
|
|
Say N if unsure.
|
2008-08-28 11:31:01 +08:00
|
|
|
|
2008-12-11 20:53:26 +08:00
|
|
|
config HW_BRANCH_TRACER
|
2008-11-25 16:24:15 +08:00
|
|
|
depends on HAVE_HW_BRANCH_TRACER
|
2008-12-11 20:53:26 +08:00
|
|
|
bool "Trace hw branches"
|
2008-11-25 16:24:15 +08:00
|
|
|
select TRACING
|
|
|
|
help
|
|
|
|
This tracer records all branches on the system in a circular
|
|
|
|
buffer giving access to the last N branches for each cpu.
|
|
|
|
|
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works is that on bootup (if ftrace is enabled), an ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMIs, so all functions that might be called via NMI must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
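
As an illustration only (this is not the kernel's implementation), the
call-to-jmp rewrite described in the commit message above can be modelled on a
plain byte buffer. The sketch below is in Python and assumes the standard x86
encodings 0xE8 (near call, 5 bytes) and 0xEB (short jmp, 2 bytes); the names
disable_call_site and enable_call_site are hypothetical and exist only for
this example.

```python
CALL_REL32 = 0xE8  # x86 near call: opcode byte + 4-byte relative offset
JMP_REL8 = 0xEB    # x86 short jmp: opcode byte + 1-byte relative offset
CALL_LEN = 5       # size of the call instruction being patched
JMP_LEN = 2        # size of the short jmp written over its first bytes


def disable_call_site(code: bytearray, offset: int) -> bytes:
    """Overwrite the 5-byte call at `offset` with a 2-byte jmp that skips
    the remaining 3 bytes. Returns the original bytes for later restore."""
    original = bytes(code[offset:offset + CALL_LEN])
    if len(original) != CALL_LEN or original[0] != CALL_REL32:
        raise ValueError("no 5-byte call instruction at this offset")
    code[offset] = JMP_REL8
    code[offset + 1] = CALL_LEN - JMP_LEN  # jump 3 bytes forward
    return original


def enable_call_site(code: bytearray, offset: int, original: bytes) -> None:
    """Put the saved call instruction back over the jmp."""
    if code[offset] != JMP_REL8:
        raise ValueError("call site is not currently patched out")
    code[offset:offset + CALL_LEN] = original


if __name__ == "__main__":
    # push %rbp; call <rel32>; ret  (hypothetical function body)
    text = bytearray([0x55, 0xE8, 0x00, 0x00, 0x00, 0x00, 0xC3])
    saved = disable_call_site(text, 1)
    print(text.hex())
    enable_call_site(text, 1, saved)
    print(text.hex())
```

In the real kernel the rewrite happens under stop_machine so that no other CPU
executes a half-modified instruction; the sketch ignores concurrency entirely
and only shows the byte-level before and after.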
2008-05-13 03:20:42 +08:00
|
|
|
config DYNAMIC_FTRACE
|
|
|
|
bool "enable/disable ftrace tracepoints dynamically"
|
2008-10-07 07:06:12 +08:00
|
|
|
depends on FUNCTION_TRACER
|
2008-05-17 12:01:36 +08:00
|
|
|
depends on HAVE_DYNAMIC_FTRACE
|
2008-09-04 20:04:51 +08:00
|
|
|
depends on DEBUG_KERNEL
|
2008-05-13 03:20:42 +08:00
|
|
|
default y
|
|
|
|
help
|
|
|
|
This option will modify all the calls to ftrace dynamically
|
|
|
|
(will patch them out of the binary image and replace them
|
|
|
|
with a No-Op instruction) as they are called. A table is
|
|
|
|
created to dynamically enable them again.
|
|
|
|
|
2008-10-07 07:06:12 +08:00
|
|
|
This way a CONFIG_FUNCTION_TRACER kernel is slightly larger, but otherwise
|
2008-05-13 03:20:42 +08:00
|
|
|
has native performance as long as no tracing is active.
|
|
|
|
|
|
|
|
The changes to the code are done by a kernel thread that
|
|
|
|
wakes up once a second and checks to see if any ftrace calls
|
|
|
|
were made. If so, it runs stop_machine (stops all CPUs)
|
|
|
|
and modifies the code to jump over the call to ftrace.
|
2008-05-13 03:20:44 +08:00
|
|
|
|
ftrace: create __mcount_loc section
This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.
For example:
objdump -dr init/main.o
[...]
Disassembly of section .text:
0000000000000000 <do_one_initcall>:
0: 55 push %rbp
[...]
000000000000017b <init_post>:
17b: 55 push %rbp
17c: 48 89 e5 mov %rsp,%rbp
17f: 53 push %rbx
180: 48 83 ec 08 sub $0x8,%rsp
184: e8 00 00 00 00 callq 189 <init_post+0xe>
185: R_X86_64_PC32 mcount+0xfffffffffffffffc
[...]
We will add a section to point to each function call.
.section __mcount_loc,"a",@progbits
[...]
.quad .text + 0x185
[...]
The offset of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.
.text + 0x185 == init_post + 0xa
We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link. The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless. We need to keep track of where these
.text sections are.
To do this, we use the start of the first function in the section.
do_one_initcall. We can make a tmp.s file with this function as a reference
to the start of the .text section.
.section __mcount_loc,"a",@progbits
[...]
.quad do_one_initcall + 0x185
[...]
Then we can compile the tmp.s into a tmp.o
gcc -c tmp.s -o tmp.o
And link it back into main.o.
ld -r main.o tmp.o -o tmp_main.o
mv tmp_main.o main.o
But we have a problem. What happens if the first function in a section
is not exported, and is a static function? The linker will not let
the tmp.o use it. This case exists in main.o as well.
Disassembly of section .init.text:
0000000000000000 <set_reset_devices>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: e8 00 00 00 00 callq 9 <set_reset_devices+0x9>
5: R_X86_64_PC32 mcount+0xfffffffffffffffc
The first function in .init.text is a static function.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.
.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
U set_reset_devices
We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worse, it could be exported someplace else,
and then we will have a reference to the wrong location.
To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices
00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.
Note: a well-documented Perl script (scripts/recordmcount.pl) is used
to do all this in one location.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
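
The offset arithmetic in the commit message above (.text + 0x185 ==
init_post + 0xa) can be checked with a small sketch. This is an illustrative
Python model, not the logic of scripts/recordmcount.pl; the symbol offsets are
the two shown in the objdump output, and the helper names containing_function
and mcount_loc_entry are made up for the example.

```python
from bisect import bisect_right

# Function start offsets within .text of init/main.o, as shown by objdump.
TEXT_SYMBOLS = {"do_one_initcall": 0x0, "init_post": 0x17B}


def containing_function(symbols, section_offset):
    """Return (name, offset_in_function) for an offset into the section."""
    ordered = sorted(symbols.items(), key=lambda item: item[1])
    starts = [start for _, start in ordered]
    index = bisect_right(starts, section_offset) - 1
    if index < 0:
        raise ValueError("offset lies before the first symbol")
    name, start = ordered[index]
    return name, section_offset - start


def mcount_loc_entry(symbols, section_offset):
    """Format a tmp.s line anchored to the first function in the section,
    so the reference survives the final link."""
    first_name, first_start = min(symbols.items(), key=lambda item: item[1])
    return ".quad %s + %#x" % (first_name, section_offset - first_start)


if __name__ == "__main__":
    name, delta = containing_function(TEXT_SYMBOLS, 0x185)
    print("%s + %#x" % (name, delta))
    print(mcount_loc_entry(TEXT_SYMBOLS, 0x185))
```

Anchoring to the first function works here because do_one_initcall sits at
offset 0 of the section, so the section-relative offset and the
symbol-relative offset are the same number.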
2008-08-15 03:45:07 +08:00
|
|
|
config FTRACE_MCOUNT_RECORD
|
|
|
|
def_bool y
|
|
|
|
depends on DYNAMIC_FTRACE
|
|
|
|
depends on HAVE_FTRACE_MCOUNT_RECORD
|
|
|
|
|
2008-05-13 03:20:44 +08:00
|
|
|
config FTRACE_SELFTEST
|
|
|
|
bool
|
|
|
|
|
|
|
|
config FTRACE_STARTUP_TEST
|
|
|
|
bool "Perform a startup test on ftrace"
|
2008-09-24 17:36:09 +08:00
|
|
|
depends on TRACING && DEBUG_KERNEL && !BOOT_TRACER
|
2008-05-13 03:20:44 +08:00
|
|
|
select FTRACE_SELFTEST
|
|
|
|
help
|
|
|
|
This option performs a series of startup tests on ftrace. On bootup
|
|
|
|
a series of tests is run to verify that the tracer is
|
|
|
|
functioning properly. It will do tests on all the configured
|
|
|
|
tracers of ftrace.
|
2008-10-21 22:31:18 +08:00
|
|
|
|
|
|
|
endmenu
|