Commit Graph

9033 Commits

Author SHA1 Message Date
Don Zickus 47195d5763 nmi_watchdog: Clean up various small details
Mostly copy/paste whitespace damage with a couple of nitpicks by
the checkpatch script. Fix the struct definition as requested by Ingo too.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
LKML-Reference: <1266880143-24943-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
 arch/x86/kernel/apic/hw_nmi.c |   14 +++++------
 arch/x86/kernel/traps.c       |    6 ++--
 include/linux/nmi.h           |    2 -
 kernel/nmi_watchdog.c         |   51 ++++++++++++++++++++----------------------
 4 files changed, 36 insertions(+), 37 deletions(-)
2010-02-25 12:40:50 +01:00
Don Zickus 96ca4028ac nmi_watchdog: Properly configure for software events
Paul Mackerras brought up a good point that when fallbacking to
software events, I may have been lucky in my configuration.

Modified the code to explicit provide a new configuration for
software events.

Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1266357745-26671-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-17 08:16:02 +01:00
Don Zickus 6081b6cd97 nmi_watchdog: support for oprofile
Re-arrange the code so that when someone disables nmi_watchdog
with:

  echo 0 > /proc/sys/kernel/nmi_watchdog

it releases the hardware reservation on the PMUs.  This allows
the oprofile module to grab those PMUs and do its thing.
Otherwise oprofile fails to load because the hardware is
reserved by the perf_events subsystem.

Tested using:

  oprofile --vm-linux --start

and watched it failed when nmi_watchdog is enabled and succeed
when:

  oprofile --deinit && echo 0 > /proc/sys/kernel/nmi_watchdog

is run.

Note:  this has the side quirk of having the nmi_watchdog latch
onto the software events instead of hardware events if oprofile
has already reserved the hardware first.  User beware! :-)

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: eranian@google.com
LKML-Reference: <1266357892-30504-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-17 08:16:02 +01:00
Don Zickus cf454aecb3 nmi_watchdog: Fallback to software events when no hardware pmu detected
Not all arches have a PMU or have perf_event support for their
PMU.  The nmi_watchdog will fail in those cases.  Fallback to
using software events to generate nmi_watchdog traffic with
local apic interrupts.

Tested on a Pentium4 and it worked as expected, excepting for
detecting cpu lockups.

The problem with using software events as a cpu lock up detector
is the nmi_watchdog uses the logic that if local apic interrupts
stop incrementing then the cpu is probably locked up.  But with
software events we use the local apic to trigger the
nmi_watchdog callback to see if local apic interrupts are still
firing, which obviously they are otherwise we wouldn't have been
triggered.

The algorithm to detect cpu lock ups is the same as the old
nmi_watchdog. Perhaps we need to find a better way to detect
lock ups?

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
LKML-Reference: <1266013161-31197-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-14 09:19:44 +01:00
Don Zickus 504d7cf10e nmi_watchdog: Compile and portability fixes
The original patch was x86_64 centric.  Changed the code to make
it less so.

ested by building and running on a powerpc.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
LKML-Reference: <1266013161-31197-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-14 09:19:43 +01:00
Don Zickus 84e478c6f1 nmi_watchdog: Config option to enable new nmi_watchdog
These are the bits that enable the new nmi_watchdog and safely
isolate the old nmi_watchdog.  Only one or the other can run,
not both at the same time.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: peterz@infradead.org
LKML-Reference: <1265424425-31562-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08 08:29:03 +01:00
Don Zickus 1fb9d6ad27 nmi_watchdog: Add new, generic implementation, using perf events
This is a new generic nmi_watchdog implementation using the perf
events infrastructure as suggested by Ingo.

The implementation is simple, just create an in-kernel perf
event and register an overflow handler to check for cpu lockups.

I created a generic implementation that lives in kernel/ and
the hardware specific part that for now lives in arch/x86.

This approach has a number of advantages:

 - It simplifies the x86 PMU implementation in the long run,
   in that it removes the hardcoded low-level PMU implementation
   that was the NMI watchdog before.

 - It allows new NMI watchdog features to be added in a central
   place.

 - It allows other architectures to enable the NMI watchdog,
   as long as they have perf events (that provide NMIs)
   implemented.

 - It also allows for more graceful co-existence of existing
   perf events apps and the NMI watchdog - before these changes
   the relationship was exclusive. (The NMI watchdog will 'spend'
   a perf event when enabled. In later iterations we might be
   able to piggyback from an existing NMI event without having
   to allocate a hardware event for the NMI watchdog - turning
   this into a no-hardware-cost feature.)

As for compatibility, we'll keep the old NMI watchdog code as
well until the new one can 100% replace it on all CPUs, old and
new alike.  That might take some time as the NMI watchdog has
been ported to many CPU models.

I have done light testing to make sure the framework works
correctly and it does.

 v2: Set the correct timeout values based on the old nmi
     watchdog

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: gorcunov@gmail.com
Cc: aris@redhat.com
Cc: peterz@infradead.org
LKML-Reference: <1265424425-31562-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08 08:29:02 +01:00
Masami Hiramatsu 5ecaafdbf4 kprobes: Add mcount to the kprobes blacklist
Since mcount function can be called from everywhere,
it should be blacklisted. Moreover, the "mcount" symbol
is a special symbol name. So, it is better to put it in
the generic blacklist.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100205062433.3745.36726.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-05 08:13:57 +01:00
Peter Zijlstra 9717e6cd3d perf_events: Optimize perf_event_task_tick()
Pretty much all of the calls do perf_disable/perf_enable cycles, pull
that out to cut back on hardware programming.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:59:49 +01:00
Masami Hiramatsu f24bb999d2 ftrace: Remove record freezing
Remove record freezing. Because kprobes never puts probe on
ftrace's mcount call anymore, it doesn't need ftrace to check
whether kprobes on it.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100202214925.4694.73469.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:36:19 +01:00
Masami Hiramatsu 4554dbcb85 kprobes: Check probe address is reserved
Check whether the address of new probe is already reserved by
ftrace or alternatives (on x86) when registering new probe.
If reserved, it returns an error and not register the probe.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100202214918.4694.94179.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:36:19 +01:00
Masami Hiramatsu 2cfa19780d ftrace/alternatives: Introducing *_text_reserved functions
Introducing *_text_reserved functions for checking the text
address range is partially reserved or not. This patch provides
checking routines for x86 smp alternatives and dynamic ftrace.
Since both functions modify fixed pieces of kernel text, they
should reserve and protect those from other dynamic text
modifier, like kprobes.

This will also be extended when introducing other subsystems
which modify fixed pieces of kernel text. Dynamic text modifiers
should avoid those.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:36:19 +01:00
Masami Hiramatsu 615d0ebbc7 kprobes: Disable booster when CONFIG_PREEMPT=y
Disable kprobe booster when CONFIG_PREEMPT=y at this time,
because it can't ensure that all kernel threads preempted on
kprobe's boosted slot run out from the slot even using
freeze_processes().

The booster on preemptive kernel will be resumed if
synchronize_tasks() or something like that is introduced.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20100202214904.4694.24330.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:36:18 +01:00
Ingo Molnar ae7f6711d6 Merge branch 'perf/urgent' into perf/core
Merge reason: We want to queue up a dependent patch. Also update to
              later -rc's.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 10:36:22 +01:00
Peter Zijlstra 75c9f3284a perf_events: Fix sample_period transfer on inherit
One problem with frequency driven counters is that we cannot
predict the rate at which they trigger, therefore we have to
start them at period=1, this causes a ramp up effect. However,
if we fail to propagate the stable state on fork each new child
will have to ramp up again. This can lead to significant
artifacts in sample data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: eranian@google.com
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1264752266.4283.2121.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 09:15:26 +01:00
Xiao Guangrong 1e12a4a7a3 tracing/kprobe: Cleanup unused return value of tracing functions
The return values of the kprobe's tracing functions are meaningless,
lets remove these.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B60E9A3.2040505@cn.fujitsu.com>
[fweisbec@gmail: whitespace fixes, drop useless void returns in end
of functions]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29 02:14:40 +01:00
Xiao Guangrong 430ad5a600 perf: Factorize trace events raw sample buffer operations
Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
gather the common code that operates on raw events sampling buffer.
This cleans up redundant code between regular trace events, syscall
events and kprobe events.

Changelog v1->v2:
- Rename function name as per Masami and Frederic's suggestion
- Add __kprobes for ftrace_perf_buf_prepare() and make
  ftrace_perf_buf_submit() inline as per Masami's suggestion
- Export ftrace_perf_buf_prepare since modules will use it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B60E92D.9000808@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29 02:02:57 +01:00
Mahesh Salgaonkar b23ff0e933 hw_breakpoints: Release the bp slot if arch_validate_hwbkpt_settings() fails.
On a given architecture, when hardware breakpoint registration fails
due to un-supported access type (read/write/execute), we lose the bp
slot since register_perf_hw_breakpoint() does not release the bp slot
on failure.
Hence, any subsequent hardware breakpoint registration starts failing
with 'no space left on device' error.

This patch introduces error handling in register_perf_hw_breakpoint()
function and releases bp slot on error.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
LKML-Reference: <20100121125516.GA32521@in.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-28 14:15:51 +01:00
Peter Zijlstra abd5071394 perf: Reimplement frequency driven sampling
There was a bug in the old period code that caused intel_pmu_enable_all()
or native_write_msr_safe() to show up quite high in the profiles.

In staring at that code it made my head hurt, so I rewrote it in a
hopefully simpler fashion. Its now fully symetric between tick and
overflow driven adjustments and uses less data to boot.

The only complication is that it basically wants to do a u128 division.
The code approximates that in a rather simple truncate until it fits
fashion, taking care to balance the terms while truncating.

This version does not generate that sampling artefact.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-27 08:39:33 +01:00
Linus Torvalds f6760aa024 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clockevent: Don't remove broadcast device when cpu is dead
2010-01-24 10:38:07 -08:00
Linus Torvalds b8be634e01 Merge git://git.infradead.org/~dwmw2/mtd-2.6.33
* git://git.infradead.org/~dwmw2/mtd-2.6.33:
  mtd: tests: fix read, speed and stress tests on NOR flash
  mtd: Really add ARM pismo support
  kmsg_dump: Dump on crash_kexec as well
2010-01-24 10:31:34 -08:00
Linus Torvalds e80b135985 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: x86: Add support for the ANY bit
  perf: Change the is_software_event() definition
  perf: Honour event state for aux stream data
  perf: Fix perf_event_do_pending() fallback callsite
  perf kmem: Print usage help for unknown commands
  perf kmem: Increase "Hit" column length
  hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
  perf timechart: Use tid not pid for COMM change
2010-01-21 08:50:04 -08:00
Peter Zijlstra 22e190851f perf: Honour event state for aux stream data
Anton reported that perf record kept receiving events even after calling
ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP
events didn't respect the disabled state and kept flowing in.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Anton Blanchard <anton@samba.org>
LKML-Reference: <1263459187.4244.265.camel@laptop>
CC: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:40 +01:00
Peter Zijlstra fe432200ab perf: Fix perf_event_do_pending() fallback callsite
Paul questioned the context in which we should call
perf_event_do_pending(). After looking at that I found that it should be
called from IRQ context these days, however the fallback call-site is
placed in softirq context. Ammend this by placing the callback in the IRQ
timer path.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263374859.4244.192.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:39 +01:00
Yong Zhang 6d558c3ac9 sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
Assume A->B schedule is processing, if B have acquired BKL before and it
need reschedule this time. Then on B's context, it will go to
need_resched_nonpreemptible for reschedule. But at this time, prev and
switch_count are related to A. It's wrong and will lead to incorrect
scheduler statistics.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001102238w7b0ddcadref00d345e2181d11@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:39:04 +01:00
Mike Galbraith 50b926e439 sched: Fix vmark regression on big machines
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to.  Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.

Reported-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1262612696.15495.15.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:39:03 +01:00
Xiaotian Feng ea9d8e3f45 clockevent: Don't remove broadcast device when cpu is dead
Marc reported that the BUG_ON in clockevents_notify() triggers on his
system. This happens because the kernel tries to remove an active
clock event device (used for broadcasting) from the device list.

The handling of devices which can be used as per cpu device and as a
global broadcast device is suboptimal.

The simplest solution for now (and for stable) is to check whether the
device is used as global broadcast device, but this needs to be
revisited.

[ tglx: restored the cpuweight check and massaged the changelog ]

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1262834564-13033-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2010-01-18 14:44:50 +01:00
Ingo Molnar f426a7e029 Merge branch 'perf/scheduling' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-01-18 08:56:41 +01:00
Frederic Weisbecker 329c0e012b perf: Better order flexible and pinned scheduling
When a task gets scheduled in. We don't touch the cpu bound events
so the priority order becomes:

	cpu pinned, cpu flexible, task pinned, task flexible.

So schedule out cpu flexibles when a new task context gets in
and correctly order the groups to schedule in:

	task pinned, cpu flexible, task flexible.

Cpu pinned groups don't need to be touched at this time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:11:05 +01:00
Frederic Weisbecker 7defb0f879 perf: Don't schedule out/in pinned events on task tick
We don't need to schedule in/out pinned events on task tick,
now that pinned and flexible groups can be scheduled separately.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:09:51 +01:00
Frederic Weisbecker 5b0311e1f2 perf: Allow pinned and flexible groups to be scheduled separately
Tune the scheduling helpers so that we can choose to schedule either
pinned and/or flexible groups from a context.

And while at it, refactor a bit the naming of these helpers to make
these more consistent and flexible.

There is no (intended) change in scheduling behaviour in this
patch.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:08:57 +01:00
Frederic Weisbecker 42cce92f4d perf: Make __perf_event_sched_out static
__perf_event_sched_out doesn't need to be globally available, make
it static.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-17 13:08:01 +01:00
Masami Hiramatsu 231e36f4d2 tracing/kprobe: Update kprobe tracing self test for new syntax
Update kprobe tracing self test for new syntax (it supports
deleting individual probes, and drops $argN support)
and behavior change (new probes are disabled in default).

This selftest includes the following checks:

 - Adding function-entry probe and return probe with arguments.
 - Enabling these probes.
 - Deleting it individually.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100114051211.7814.29436.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-17 08:15:35 +01:00
Linus Torvalds 2a8249daf6 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futexes: Remove rw parameter from get_futex_key()
2010-01-16 12:31:30 -08:00
Linus Torvalds 6ccc347b69 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/filters: Add comment for match callbacks
  tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
  tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
  lib: Introduce strnstr()
  tracing/filters: Fix MATCH_END_ONLY filter matching
  tracing/filters: Fix MATCH_FRONT_ONLY filter matching
  ftrace: Fix MATCH_END_ONLY function filter
  tracing/x86: Derive arch from bits argument in recordmcount.pl
  ring-buffer: Add rb_list_head() wrapper around new reader page next field
  ring-buffer: Wrap a list.next reference with rb_list_head()
2010-01-16 12:27:25 -08:00
David John af2422c42c smp_call_function_any(): pass the node value to cpumask_of_node()
The change in acpi_cpufreq to use smp_call_function_any causes a warning
when it is called since the function erroneously passes the cpu id to
cpumask_of_node rather than the node that the cpu is on.  Fix this.

cpumask_of_node(3): node > nr_node_ids(1)
Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
Call Trace:
 [<ffffffff81028bb3>] cpumask_of_node+0x23/0x58
 [<ffffffff81061f51>] smp_call_function_any+0x65/0xfa
 [<ffffffff810160d1>] ? do_drv_read+0x0/0x2f
 [<ffffffff81015fba>] get_cur_val+0xb0/0x102
 [<ffffffff81016080>] get_cur_freq_on_cpu+0x74/0xc5
 [<ffffffff810168a7>] acpi_cpufreq_cpu_init+0x417/0x515
 [<ffffffff81562ce9>] ? __down_write+0xb/0xd
 [<ffffffff8148055e>] cpufreq_add_dev+0x278/0x922

Signed-off-by: David John <davidjon@xenontk.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:39 -08:00
Andi Kleen 5dab600e6a kfifo: document everywhere that size has to be power of two
On my first try using them I missed that the fifos need to be power of
two, resulting in a runtime bug.  Document that requirement everywhere
(and fix one grammar bug)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Andy Walls <awalls@radix.net>
Cc: Vikram Dhillon <dhillonv10@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:38 -08:00
Andi Kleen a5b9e2c106 kfifo: add kfifo_out_peek
In some upcoming code it's useful to peek into a FIFO without permanentely
removing data.  This patch implements a new kfifo_out_peek() to do this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Andy Walls <awalls@radix.net>
Cc: Vikram Dhillon <dhillonv10@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:38 -08:00
Andi Kleen 64ce1037c5 kfifo: sanitize *_user error handling
Right now for kfifo_*_user it's not easily possible to distingush between
a user copy failing and the FIFO not containing enough data.  The problem
is that both conditions are multiplexed into the same return code.

Avoid this by moving the "copy length" into a separate output parameter
and only return 0/-EFAULT in the main return value.

I didn't fully adapt the weird "record" variants, those seem
to be unused anyways and were rather messy (should they be just removed?)

I would appreciate some double checking if I did all the conversions
correctly.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Andy Walls <awalls@radix.net>
Cc: Vikram Dhillon <dhillonv10@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:38 -08:00
Andi Kleen 8ecc295153 kfifo: use void * pointers for user buffers
The pointers to user buffers are currently unsigned char *, which requires
a lot of casting in the caller for any non-char typed buffers.  Use void *
instead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Andy Walls <awalls@radix.net>
Cc: Vikram Dhillon <dhillonv10@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:38 -08:00
Frederic Weisbecker d6f962b57b perf: Export software-only event group characteristic as a flag
Before scheduling an event group, we first check if a group can go
on. We first check if the group is made of software only events
first, in which case it is enough to know if the group can be
scheduled in.

For that purpose, we iterate through the whole group, which is
wasteful as we could do this check when we add/delete an event to
a group.

So we create a group_flags field in perf event that can host
characteristics from a group of events, starting with a first
PERF_GROUP_SOFTWARE flag that reduces the check on the fast path.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-16 12:30:40 +01:00
Frederic Weisbecker e286417378 perf: Round robin flexible groups of events using list_rotate_left()
This is more proper that doing it through a list_for_each_entry()
that breaks after the first entry.

v2: Don't rotate pinned groups as its not needed to time share
them.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-16 12:30:28 +01:00
Frederic Weisbecker 889ff01506 perf/core: Split context's event group list into pinned and non-pinned lists
Split-up struct perf_event_context::group_list into pinned_groups
and flexible_groups (non-pinned).

This first appears to be useless as it duplicates various loops around
the group list handlings.

But it scales better in the fast-path in perf_sched_in(). We don't
anymore iterate twice through the entire list to separate pinned and
non-pinned scheduling. Instead we interate through two distinct lists.

The another desired effect is that it makes easier to define distinct
scheduling rules on both.

Changes in v2:
- Respectively rename pinned_grp_list and
  volatile_grp_list into pinned_groups and flexible_groups as per
  Ingo suggestion.
- Various cleanups

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
2010-01-16 12:27:42 +01:00
Li Zefan d1303dd1d6 tracing/filters: Add comment for match callbacks
We should be clear on 2 things:

- the length parameter of a match callback includes
  tailing '\0'.

- the string to be searched might not be NULL-terminated.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E8770.7000608@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-14 22:38:14 -05:00
Li Zefan 16da27a8bc tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
MATCH_FULL matching for PTR_STRING is not working correctly:

  # echo 'func == vt' > events/bkl/lock_kernel/filter
  # echo 1 > events/bkl/lock_kernel/enable
  ...
  # cat trace
   Xorg-1484  [000]  1973.392586: lock_kernel: ... func=vt_ioctl()
    gpm-1402  [001]  1974.027740: lock_kernel: ... func=vt_ioctl()

We should pass to regex.match(..., len) the length (including '\0')
of the source string instead of the length of the pattern string.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E8763.5070707@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-14 22:38:12 -05:00
Li Zefan b2af211f28 tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
The @str might not be NULL-terminated if it's of type
DYN_STRING or STATIC_STRING, so we should use strnstr()
instead of strstr().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E8753.2000102@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-14 22:38:11 -05:00
Li Zefan a3291c14ec tracing/filters: Fix MATCH_END_ONLY filter matching
For '*foo' pattern, we should allow any string ending with
'foo', but event filtering incorrectly disallows strings
like bar_foo_foo:

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E8735.6070604@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-14 22:38:07 -05:00
Li Zefan 285caad415 tracing/filters: Fix MATCH_FRONT_ONLY filter matching
MATCH_FRONT_ONLY actually is a full matching:

  # ./perf record -R -f -a -e lock:lock_acquire \
	--filter 'name ~rcu_*' sleep 1
  # ./perf trace
  (no output)

We should pass the length of the pattern string to strncmp().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E8721.5090301@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-14 22:38:05 -05:00
Li Zefan 751e9983ee ftrace: Fix MATCH_END_ONLY function filter
For '*foo' pattern, we should allow any string ending with
'foo', but ftrace filter incorrectly disallows strings
like bar_foo_foo:

  # echo '*io' > set_ftrace_filter
  # cat set_ftrace_filter | grep 'req_bio_endio'
  # cat available_filter_functions | grep 'req_bio_endio'
  req_bio_endio

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4B4E870E.6060607@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-01-14 22:38:03 -05:00
Jamie Iles 8381f65d09 sched/perf: Make sure irqs are disabled for perf_event_task_sched_in()
perf_event_task_sched_in() expects interrupts to be disabled,
but on architectures with __ARCH_WANT_INTERRUPTS_ON_CTXSW
defined, this isn't true. If this is defined, disable irqs
around the call in finish_task_switch().

Signed-off-by: Jamie Iles <jamie.iles@picochip.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
LKML-Reference: <1262964453-27370-1-git-send-email-jamie.iles@picochip.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 10:43:08 +01:00