linux-sg2042

Paul Mackerras 53cfbf5937 perf_counter: record time running and time enabled for each counter

Impact: new functionality

Currently, if there are more counters enabled than can fit on the CPU, the kernel will multiplex the counters on to the hardware using round-robin scheduling. That isn't too bad for sampling counters, but for counting counters it means that the value read from a counter represents some unknown fraction of the true count of events that occurred while the counter was enabled.

This remedies the situation by keeping track of how long each counter is enabled for, and how long it is actually on the cpu and counting events. These times are recorded in nanoseconds using the task clock for per-task counters and the cpu clock for per-cpu counters.

These values can be supplied to userspace on a read from the counter. Userspace requests that they be supplied after the counter value by setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field when creating the counter. (There is no way to change the read format after the counter is created, though it would be possible to add some way to do that.)

Using this information it is possible for userspace to scale the count it reads from the counter to get an estimate of the true count:

    true_count_estimate = count * total_time_enabled / total_time_running

This also lets userspace detect the situation where the counter never got to go on the cpu: total_time_running == 0.

This functionality has been requested by the PAPI developers, and will be generally needed for interpreting the count values from counting counters correctly.

In the implementation, this keeps 5 time values (in nanoseconds) for each counter: total_time_enabled and total_time_running are used when the counter is in state OFF or ERROR and for reporting back to userspace. When the counter is in state INACTIVE or ACTIVE, it is the tstamp_enabled, tstamp_running and tstamp_stopped values that are relevant, and total_time_enabled and total_time_running are determined from them. (tstamp_stopped is only used in INACTIVE state.) The reason for doing it like this is that it means that only counters being enabled or disabled at sched-in and sched-out time need to be updated. There are no new loops that iterate over all counters to update total_time_enabled or total_time_running.

This also keeps separate child_total_time_running and child_total_time_enabled fields that get added in when reporting the totals to userspace. They are separate fields so that they can be atomic. We don't want to use atomics for total_time_running, total_time_enabled etc., because then we would have to use atomic sequences to update them, which are slower than regular arithmetic and memory accesses.

It is possible to measure total_time_running by adding a task_clock counter to each group of counters, and total_time_enabled can be measured approximately with a top-level task_clock counter (though inaccuracies will creep in if you need to disable and enable groups since it is not possible in general to disable/enable the top-level task_clock counter simultaneously with another group). However, that adds extra overhead - I measured around 15% increase in the context switch latency reported by lat_ctx (from lmbench) when a task_clock counter was added to each of 2 groups, and around 25% increase when a task_clock counter was added to each of 4 groups. (In both cases a top-level task-clock counter was also added.)

In contrast, the code added in this commit gives better information with no overhead that I could measure (in fact in some cases I measured lower times with this code, but the differences were all less than one standard deviation).

[ v2: address review comments by Andrew Morton. ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

2009-04-06 09:30:36 +02:00
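
The scaling rule quoted in the commit message above (true_count_estimate = count * total_time_enabled / total_time_running) can be sketched in userspace C roughly as follows. This is only an illustration, not code from the commit: the struct counter_reading type, its field order, and the estimate_true_count() helper are hypothetical names introduced here, and the sketch assumes the caller has already read the three 64-bit values from a counter created with both PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING set.

```c
#include <stdint.h>

/*
 * Hypothetical container for one reading of a counting counter.
 * The field names mirror the commit message; the layout is an
 * assumption made for this sketch, not the kernel's read() ABI.
 */
struct counter_reading {
	uint64_t count;              /* raw value read from the counter */
	uint64_t total_time_enabled; /* ns the counter was enabled */
	uint64_t total_time_running; /* ns the counter was on the cpu */
};

/*
 * Scale the raw count to estimate the true number of events.
 * Returns 0 and stores the estimate on success, or -1 when
 * total_time_running == 0, i.e. the counter never got on the cpu
 * and no estimate can be made.
 */
static int estimate_true_count(const struct counter_reading *r,
			       double *estimate)
{
	if (r->total_time_running == 0)
		return -1;

	/* Floating point avoids overflow of count * total_time_enabled. */
	*estimate = (double)r->count * (double)r->total_time_enabled /
		    (double)r->total_time_running;
	return 0;
}
```

For example, a counter that was enabled for 200 ns but only on the cpu for 100 ns, with a raw count of 1000, would yield an estimate of 2000 events. The sketch uses a double for the result because the product of two 64-bit values can overflow; a caller that needs an integer result might prefer a 128-bit intermediate instead.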
..
irq	Merge branch 'tracing/core-v2' into tracing-for-linus	2009-04-02 00:49:02 +02:00
power	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2009-04-03 15:24:35 -07:00
time	Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-03-26 16:05:42 -07:00
trace	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 11:04:19 -07:00
.gitignore	…
Kconfig.freezer	container freezer: implement freezer cgroup subsystem	2008-10-20 08:52:34 -07:00
Kconfig.hz	sched: fix SCHED_HRTICK dependency	2008-07-28 14:37:38 +02:00
Kconfig.preempt	rcu: provide RCU options on non-preempt architectures too	2008-12-25 09:31:28 +01:00
Makefile	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
acct.c	[CVE-2009-0029] System call wrappers part 04	2009-01-14 14:15:19 +01:00
async.c	async: remove the temporary (2.6.29) "async is off by default" code	2009-03-28 13:05:30 -07:00
audit.c	Audit: remove spaces from audit_log_d_path	2009-04-05 13:49:04 -04:00
audit.h	fixing audit rule ordering mess, part 1	2009-01-04 15:14:41 -05:00
audit_tree.c	audit: incorrect ref counting in audit tree tag_chunk	2009-04-05 13:48:26 -04:00
auditfilter.c	make the e->rule.xxx shorter in kernel auditfilter.c	2009-04-05 13:40:33 -04:00
auditsc.c	Audit: remove spaces from audit_log_d_path	2009-04-05 13:49:04 -04:00
backtracetest.c	backtrace: replace timer with tasklet + completions	2008-06-27 18:09:16 +02:00
bounds.c	…
capability.c	[CVE-2009-0029] System call wrappers part 04	2009-01-14 14:15:19 +01:00
cgroup.c	memcg: fix OOM killer under memcg	2009-04-02 19:04:55 -07:00
cgroup_debug.c	debug cgroup: remove unneeded cgroup_lock	2009-04-02 19:04:54 -07:00
cgroup_freezer.c	freezer_cg: disable writing freezer.state of root cgroup	2008-11-12 17:17:16 -08:00
compat.c	Allow times and time system calls to return small negative values	2009-01-06 15:59:13 -08:00
configs.c	kernel/configs.c: remove useless comments	2008-10-20 08:52:34 -07:00
cpu.c	cpumask: use set_cpu_active in init/main.c	2009-03-30 22:05:12 +10:30
cpuset.c	cpusets: prevent PF_THREAD_BOUND tasks from attaching to non-root cpusets	2009-04-02 19:04:57 -07:00
cred-internals.h	CRED: Inaugurate COW credentials	2008-11-14 10:39:23 +11:00
cred.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6	2009-01-09 13:59:25 -08:00
delayacct.c	schedstat: consolidate per-task cpu runtime stats	2008-12-18 13:54:01 +01:00
dma-coherent.c	dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy.	2009-01-21 18:51:53 +09:00
dma.c	kernel/dma.c: remove a CVS keyword	2008-10-16 11:21:30 -07:00
exec_domain.c	Get rid of indirect include of fs_struct.h	2009-03-31 23:00:27 -04:00
exit.c	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
extable.c	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 11:04:19 -07:00
fork.c	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
freezer.c	freezer_cg: use thaw_process() in unfreeze_cgroup()	2008-10-30 11:38:45 -07:00
futex.c	futex: remove the pointer math from double_unlock_hb, fix	2009-03-13 10:32:07 +01:00
futex_compat.c	CRED: Use RCU to access another task's creds and to release a task's own creds	2008-11-14 10:39:19 +11:00
hrtimer.c	hrtimer: prevent negative expiry value after clock_was_set()	2009-01-30 22:35:34 +01:00
itimer.c	timers: split process wide cpu clocks/timers	2009-02-05 13:04:33 +01:00
kallsyms.c	Ksplice: Add functions for walking kallsyms symbols	2009-03-31 13:05:32 +10:30
kexec.c	kexec: vmcoreinfo_data[] can become static	2009-04-02 19:05:04 -07:00
kfifo.c	…
kgdb.c	kgdb: call touch_softlockup_watchdog on resume	2008-10-06 13:50:59 -05:00
kmod.c	module: create a request_module_nowait()	2009-03-31 13:05:35 +10:30
kprobes.c	kprobes: Fix locking imbalance in kretprobes	2009-03-18 12:51:16 +01:00
ksysfs.c	kernel/ksysfs.c:fix dependence on CONFIG_NET	2009-01-06 10:44:31 -08:00
kthread.c	cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL	2009-03-30 22:05:11 +10:30
latencytop.c	sched, latencytop: incorporate review feedback from Andrew Morton	2009-02-11 10:18:04 +01:00
lockdep.c	Merge branch 'tracing/core-v2' into tracing-for-linus	2009-04-02 00:49:02 +02:00
lockdep_internals.h	lockdep: get_user_chars() redo	2009-02-14 23:28:22 +01:00
lockdep_proc.c	lockstat: warn about disabled lock debugging	2009-02-14 23:28:28 +01:00
lockdep_states.h	lockdep: move state bit definitions around	2009-02-14 23:27:59 +01:00
marker.c	markers/tracpoints: fix non-modular build	2008-11-16 09:52:03 +01:00
module.c	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 11:04:19 -07:00
mutex-debug.c	mutex: implement adaptive spinning	2009-01-14 18:09:02 +01:00
mutex-debug.h	mutex: implement adaptive spinning	2009-01-14 18:09:02 +01:00
mutex.c	mutex: drop "inline" from mutex_lock() inside kernel/mutex.c	2009-04-06 09:30:27 +02:00
mutex.h	mutex: implement adaptive spinning	2009-01-14 18:09:02 +01:00
notifier.c	Merge commit 'v2.6.28-rc6' into core/debug	2008-11-26 08:22:50 +01:00
ns_cgroup.c	cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups	2009-04-02 19:04:53 -07:00
nsproxy.c	User namespaces: set of cleanups (v2)	2008-11-24 18:57:41 -05:00
panic.c	panic: clean up kernel/panic.c	2009-03-13 11:25:53 +01:00
params.c	param: fix charp parameters set via sysfs	2009-03-31 13:05:30 +10:30
perf_counter.c	perf_counter: record time running and time enabled for each counter	2009-04-06 09:30:36 +02:00
pid.c	pids: refactor vnr/nr_ns helpers to make them safe	2009-04-02 19:05:02 -07:00
pid_namespace.c	signals: zap_pid_ns_process() should use force_sig()	2009-04-02 19:04:58 -07:00
pm_qos_params.c	pm_qos_requirement might sleep	2008-09-02 19:21:40 -07:00
posix-cpu-timers.c	posix timers: fix RLIMIT_CPU && fork()	2009-03-23 20:43:35 +01:00
posix-timers.c	[CVE-2009-0029] System call wrappers part 05	2009-01-14 14:15:20 +01:00
printk.c	Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 10:23:25 -07:00
profile.c	profiling: fix broken profiling regression	2009-02-10 00:50:37 +01:00
ptrace.c	Merge branch 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-03 17:35:06 -07:00
rcuclassic.c	rcu: Teach RCU that idle task is not quiscent state at boot	2009-02-26 04:08:14 +01:00
rcupdate.c	rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated to online cpu	2009-03-31 00:09:37 +02:00
rcupreempt.c	rcu: Teach RCU that idle task is not quiscent state at boot	2009-02-26 04:08:14 +01:00
rcupreempt_trace.c	"Tree RCU": scalable classic RCU implementation	2008-12-18 21:56:04 +01:00
rcutorture.c	cpumask: convert rcutorture.c	2009-03-30 22:05:16 +10:30
rcutree.c	rcu: Teach RCU that idle task is not quiscent state at boot	2009-02-26 04:08:14 +01:00
rcutree_trace.c	"Tree RCU": scalable classic RCU implementation	2008-12-18 21:56:04 +01:00
relay.c	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 11:04:19 -07:00
res_counter.c	memcg: memory cgroup resource counters for hierarchy	2009-01-08 08:31:05 -08:00
resource.c	resources: fix parameter name and kernel-doc	2009-01-15 16:39:38 -08:00
rtmutex-debug.c	…
rtmutex-debug.h	…
rtmutex-tester.c	sysdev: Pass the attribute to the low level sysdev show/store function	2008-07-21 21:55:02 -07:00
rtmutex.c	hrtimer: convert kernel/* to the new hrtimer apis	2008-09-05 21:35:13 -07:00
rtmutex.h	…
rtmutex_common.h	…
rwsem.c	…
sched.c	perf_counter: generic context switch event	2009-04-06 09:30:15 +02:00
sched_clock.c	Merge branch 'tracing/core-v2' into tracing-for-linus	2009-04-02 00:49:02 +02:00
sched_cpupri.c	sched: fix section mismatch	2009-01-06 11:07:15 +01:00
sched_cpupri.h	cpumask: remove cpumask_t from core	2009-03-30 22:05:17 +10:30
sched_debug.c	sched: remove unused fields from struct rq	2009-03-24 23:16:51 +01:00
sched_fair.c	Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/core	2009-02-15 21:15:16 +01:00
sched_features.h	Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-03-30 17:17:35 -07:00
sched_idletask.c	sched: add CONFIG_SMP consistency	2008-10-22 10:01:52 +02:00
sched_rt.c	Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2	2009-03-27 17:28:43 +01:00
sched_stats.h	sched: remove unused fields from struct rq	2009-03-24 23:16:51 +01:00
seccomp.c	x86-64: seccomp: fix 32/64 syscall hole	2009-03-02 15:41:30 -08:00
semaphore.c	semaphore: __down_common: use signal_pending_state()	2008-08-05 14:33:47 -07:00
signal.c	signals: SI_USER: Masquerade si_pid when crossing pid ns boundary	2009-04-02 19:04:58 -07:00
slow-work.c	Document the slow work thread pool	2009-04-03 16:42:35 +01:00
smp.c	generic-ipi: eliminate WARN_ON()s during oops/panic	2009-03-13 10:47:34 +01:00
softirq.c	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 11:04:19 -07:00
softlockup.c	softlock: fix false panic which can occur if softlockup_thresh is reduced	2009-01-14 11:48:07 +01:00
spinlock.c	Allow rwlocks to re-enable interrupts	2009-04-02 19:05:11 -07:00
srcu.c	…
stacktrace.c	stacktrace: provide save_stack_trace_tsk() weak alias	2008-12-25 11:44:43 +01:00
stop_machine.c	cpumask: remove cpumask_t from core	2009-03-30 22:05:17 +10:30
sys.c	Merge branch 'linus' into perfcounters/core-v2	2009-04-06 09:02:57 +02:00
sys_ni.c	Merge commit 'v2.6.29-rc2' into perfcounters/core	2009-01-21 16:37:27 +01:00
sysctl.c	Make the slow work pool configurable	2009-04-03 16:42:35 +01:00
sysctl_check.c	net: add ARP notify option for devices	2009-02-01 01:04:33 -08:00
taskstats.c	cpumask: convert rest of files in kernel/	2009-01-01 10:12:28 +10:30
test_kprobes.c	kprobes: add tests for register_kprobes	2009-01-06 15:59:20 -08:00
time.c	[CVE-2009-0029] System call wrappers part 01	2009-01-14 14:15:18 +01:00
timeconst.pl	…
timer.c	Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-03-30 17:17:35 -07:00
tracepoint.c	tracepoints: dont update zero-sized tracepoint sections	2009-03-18 19:55:00 +01:00
tsacct.c	Fix fixpoint divide exception in acct_update_integrals	2009-03-09 08:13:35 -07:00
uid16.c	[CVE-2009-0029] System call wrappers part 19	2009-01-14 14:15:26 +01:00
up.c	smp_call_function_single(): be slightly less stupid, fix #2	2009-01-12 16:04:37 +01:00
user.c	Merge branch 'master' into next	2009-03-24 10:52:46 +11:00
user_namespace.c	Fix recursive lock in free_uid()/free_user_ns()	2009-02-27 16:26:21 -08:00
utsname.c	removed unused #include <linux/version.h>'s	2008-08-23 12:14:12 -07:00
utsname_sysctl.c	proc_sysctl: use CONFIG_PROC_SYSCTL around ipc and utsname proc_handlers	2009-04-02 19:05:01 -07:00
wait.c	wait: prevent exclusive waiter starvation	2009-02-05 12:56:48 -08:00
workqueue.c	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	2009-04-05 11:04:19 -07:00