OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	3494c16676	[TICK] tick-common: Fix one-shot handling in tick_handle_periodic(). When clockevents_program_event() is given an expire time in the past, it does not update dev->next_event, so this looping code would loop forever once the first in-the-past expiration time was used. Keep advancing "next" locally to fix this bug. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:14:15 -08:00
David S. Miller	9e203bcc10	[TIME] tick-sched: Add missing asm/irq_regs.h include. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:13:49 -08:00
Eric W. Biederman	2a786b452e	[PATCH] genirq: Mask irqs when migrating them. move_native_irqs tries to do the right thing when migrating irqs by disabling them. However disabling them is a software logical thing, not a hardware thing. This has always been a little flaky and after Ingo's latest round of changes it is guaranteed to not mask the apic. So this patch fixes move_native_irq to directly call the mask and unmask chip methods to guarantee that we mask the irq when we are migrating it. We must do this as it is required by all code that call into the path. Since we don't know the masked status when IRQ_DISABLED is set so we will not be able to restore it. The patch makes the code just give up and trying again the next time this routing is called. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-26 10:34:08 -08:00
Greg Kroah-Hartman	dfff0a0671	Revert "Driver core: let request_module() send a /sys/modules/kmod/-uevent" This reverts commit `c353c3fb07`. It turns out that we end up with a loop trying to load the unix module and calling netfilter to do that. Will redo the patch later to not have this loop. Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:54:57 -08:00
Adrian Bunk	4541ac94d0	make kernel/kmod.c:kmod_mk static This patch makes the needlessly global struct kmod_mk static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:52:09 -08:00
Johannes Berg	9c372d06ce	power management: no valid states w/o pm_ops Change /sys/power/state to not advertise any valid states (except for disk if SOFTWARE_SUSPEND is enabled) when no pm_ops have been set so userspace can easily discover what states should be available. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Macheck <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:52:09 -08:00
Mike Galbraith	63ce18cfe6	driver core: refcounting fix Fix a reference counting bug exposed by commit `725522b545`. If driver.mod_name exists, we take a reference in module_add_driver(), and never release it. Undo that reference in module_remove_driver(). Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-23 14:52:09 -08:00
Jarek Poplawski	60e114d113	[PATCH] lockdep: debug_locks check after check_chain_key In __lock_acquire check_chain_key can turn off debug_locks, so check is needed to assure proper return code. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:14 -08:00
Srinivasa Ds	346fd59bab	[PATCH] kprobes: list all active probes in the system This patch lists all active probes in the system by scanning through kprobe_table[]. It takes care of aggregate handlers and prints the type of the probe. Letter "k" for kprobes, "j" for jprobes, "r" for kretprobes. It also lists address of the instruction,its symbolic name(function name + offset) and the module name. One can access this file through /sys/kernel/debug/kprobes/list. Output looks like this ===================== llm40:~/a # cat /sys/kernel/debug/kprobes/list c0169ae3 r sys_read+0x0 c0169ae3 k sys_read+0x0 c01694c8 k vfs_write+0x0 c0167d20 r sys_open+0x0 f8e658a6 k reiserfs_delete_inode+0x0 reiserfs c0120f4a k do_fork+0x0 c0120f4a j do_fork+0x0 c0169b4a r sys_write+0x0 c0169b4a k sys_write+0x0 c0169622 r vfs_read+0x0 ================================= [akpm@linux-foundation.org: cleanup] [ananth@in.ibm.com: sparc build fix] Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-20 17:10:14 -08:00
Thomas Gleixner	bc5393a6c9	[PATCH] NOHZ: Produce debug output instead of a BUG() The BUG_ON() in tick_nohz_stop_sched_tick() triggers on some boxen. Remove the BUG_ON and print information about the pending softirq to allow better debugging of the problem. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-19 14:18:43 -08:00
Ingo Molnar	6ba9b346e1	[PATCH] NOHZ: Fix RCU handling When a CPU is needed for RCU the tick has to continue even when it was stopped before. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-19 14:18:43 -08:00
Linus Torvalds	cb4aaf46c0	Merge branch 'audit.b37' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b37' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] AUDIT_FD_PAIR [PATCH] audit config lockdown [PATCH] minor update to rule add/delete messages (ver 2)	2007-02-19 13:29:54 -08:00
Linus Torvalds	874ff01bd9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) Documentation/kernel-docs.txt update. arch/cris: typo in KERN_INFO Storage class should be before const qualifier kernel/printk.c: comment fix update I/O sched Kconfig help texts - CFQ is now default, not AS. Remove duplicate listing of Cris arch from README kbuild: more doc. cleanups doc: make doc. for maxcpus= more visible drivers/net/eexpress.c: remove duplicate comment add a help text for BLK_DEV_GENERIC correct a dead URL in the IP_MULTICAST help text fix the BAYCOM_SER_HDX help text fix SCSI_SCAN_ASYNC help text trivial documentation patch for platform.txt Fix typos concerning hierarchy Fix comment typo "spin_lock_irqrestore". Fix misspellings of "agressive". drivers/scsi/a100u2w.c: trivial typo patch Correct trivial typo in log2.h. Remove useless FIND_FIRST_BIT() macro from cardbus.c. ...	2007-02-19 13:29:02 -08:00
Al Viro	db3495099d	[PATCH] AUDIT_FD_PAIR Provide an audit record of the descriptor pair returned by pipe() and socketpair(). Rewritten from the original posted to linux-audit by John D. Ramsdell <ramsdell@mitre.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-02-17 21:30:15 -05:00
Steve Grubb	6a01b07fae	[PATCH] audit config lockdown The following patch adds a new mode to the audit system. It uses the audit_enabled config option to introduce the idea of audit enabled, but configuration is immutable. Any attempt to change the configuration while in this mode is audited. To change the audit rules, you'd need to reboot the machine. To use this option, you'd need a modified version of auditctl and use "-e 2". This is intended to go at the end of the audit.rules file for people that want an immutable configuration. This patch also adds "res=" to a number of configuration commands that did not have it before. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-02-17 21:30:12 -05:00
Steve Grubb	a17b4ad778	[PATCH] minor update to rule add/delete messages (ver 2) I was looking at parsing some of these messages and found that I wanted what it was doing next to an op= for the parser to key on. Also missing was the list number and results. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-02-17 21:30:09 -05:00
Patrick Pletscher	0bbfb7c2e4	kernel/printk.c: comment fix Signed-off-by: Patrick Pletscher <pat@pletscher.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 20:10:16 +01:00
Matthew Wilcox	c3de4b3815	Revert "[PATCH] make kernel/signal.c:kill_proc_info() static" This reverts commit `d3228a887c`. DeBunk this code. We need it for compat_sys_rt_sigqueueinfo. Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>	2007-02-17 01:20:07 -05:00
Randy Dunlap	ef665c1a06	sysfs: fix build errors: uevent with CONFIG_SYSFS=n Fix source files to build with CONFIG_SYSFS=n. module_subsys is not available. SYSFS=n, MODULES=y: T:y SYSFS=n, MODULES=n: T:y SYSFS=y, MODULES=y: T:y SYSFS=y, MODULES=n: T:y Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-16 15:19:18 -08:00
Mariusz Kozlowski	b92be9f1ec	Driver: remove redundant kobject_unregister checks Here is a patch that removes all redundant kobject_unregister argument checks. Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-16 15:19:17 -08:00
Kay Sievers	c353c3fb07	Driver core: let request_module() send a /sys/modules/kmod/-uevent On recent systems, calls to /sbin/modprobe are handled by udev depending on the kind of device the kernel has discovered. This patch creates an uevent for the kernels internal request_module(), to let udev take control over the request, instead of forking the binary directly by the kernel. The direct execution of /sbin/modprobe can be disabled by setting: /sys/module/kmod/mod_request_helper (/proc/sys/kernel/modprobe) to an empty string, the same way /proc/sys/kernel/hotplug is disabled on an udev system. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-16 15:19:15 -08:00
Jan Beulich	5575ddf75c	[PATCH] small irq management simplification Use mask_ack_irq() where possible. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Randy Dunlap	472900b8b0	[PATCH] IRQ kernel-doc fixes Fix kernel-doc warnings in IRQ management. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Ingo Molnar	76d2160147	[PATCH] genirq: do not mask interrupts by default Never mask interrupts immediately upon request. Disabling interrupts in high-performance codepaths is rare, and on the other hand this change could recover lost edges (or even other types of lost interrupts) by conservatively only masking interrupts after they happen. (NOTE: with this change the highlevel irq-disable code still soft-disables this IRQ line - and if such an interrupt happens then the IRQ flow handler keeps the IRQ masked.) Mark i8529A controllers as 'never loses an edge'. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Paul E. McKenney	1f2ea0837d	[PATCH] posix timers: RCU optimization for clock_gettime() Use RCU to avoid the need to acquire tasklist_lock in the single-threaded case of clock_gettime(). It still acquires tasklist_lock when for a (potentially multithreaded) process. This change allows realtime applications to frequently monitor CPU consumption of individual tasks, as requested (and now deployed) by some off-list users. This has been in Ingo Molnar's -rt patchset since late 2005 with no problems reported, and tests successfully on 2.6.20-rc6, so I believe that it is long-since ready for mainline adoption. [paulmck@linux.vnet.ibm.com: fix exit()/posix_cpu_clock_get() race spotted by Oleg] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
john stultz	c37e7bb5d2	[PATCH] time: x86_64: split x86_64/kernel/time.c up In preparation for the x86_64 generic time conversion, this patch splits out TSC and HPET related code from arch/x86_64/kernel/time.c into respective hpet.c and tsc.c files. [akpm@osdl.org: fix printk timestamps] [akpm@osdl.org: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@muc.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
john stultz	acc9a9dcdd	[PATCH] generic: vsyscall-gtod support for GENERIC_TIME Provides generic infrastructure for vsyscall-gtod. [akpm@osdl.org: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@muc.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:14:00 -08:00
Ingo Molnar	289f480af8	[PATCH] Add debugging feature /proc/timer_list add /proc/timer_list, which prints all currently pending (high-res) timers, all clock-event sources and their parameters in a human-readable form. Sample output: Timer List Version: v0.1 HRTIMER_MAX_CLOCK_BASES: 2 now at 4246046273872 nsecs cpu: 0 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1273998312645738432 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: <f5a90ec8>, hrtimer_sched_tick, hrtimer_stop_sched_tick, swapper/0 # expires at 4246432689566 nsecs [in 386415694 nsecs] #1: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, pcscd/2050 # expires at 4247018194689 nsecs [in 971920817 nsecs] #2: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, irqbalance/1909 # expires at 4247351358392 nsecs [in 1305084520 nsecs] #3: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, crond/2157 # expires at 4249097614968 nsecs [in 3051341096 nsecs] #4: <f5a90ec8>, it_real_fn, do_setitimer, syslogd/1888 # expires at 4251329900926 nsecs [in 5283627054 nsecs] .expires_next : 4246432689566 nsecs .hres_active : 1 .check_clocks : 0 .nr_events : 31306 .idle_tick : 4246020791890 nsecs .tick_stopped : 1 .idle_jiffies : 986504 .idle_calls : 40700 .idle_sleeps : 36014 .idle_entrytime : 4246019418883 nsecs .idle_sleeptime : 4178181972709 nsecs cpu: 1 clock 0: .index: 0 .resolution: 1 nsecs .get_time: ktime_get_real .offset: 1273998312645738432 nsecs active timers: clock 1: .index: 1 .resolution: 1 nsecs .get_time: ktime_get .offset: 0 nsecs active timers: #0: <f5a90ec8>, hrtimer_sched_tick, hrtimer_restart_sched_tick, swapper/0 # expires at 4246050084568 nsecs [in 3810696 nsecs] #1: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, atd/2227 # expires at 4261010635003 nsecs [in 14964361131 nsecs] #2: <f5a90ec8>, hrtimer_wakeup, do_nanosleep, smartd/2332 # expires at 5469485798970 nsecs [in 1223439525098 nsecs] .expires_next : 4246050084568 nsecs .hres_active : 1 .check_clocks : 0 .nr_events : 24043 .idle_tick : 4246046084568 nsecs .tick_stopped : 0 .idle_jiffies : 986510 .idle_calls : 26360 .idle_sleeps : 22551 .idle_entrytime : 4246043874339 nsecs .idle_sleeptime : 4170763761184 nsecs tick_broadcast_mask: 00000003 event_broadcast_mask: 00000001 CPU#0's local event device: Clock Event Device: lapic capabilities: 0000000e max_delta_ns: 807385544 min_delta_ns: 1443 mult: 44624025 shift: 32 set_next_event: lapic_next_event set_mode: lapic_timer_setup event_handler: hrtimer_interrupt .installed: 1 .expires: 4246432689566 nsecs CPU#1's local event device: Clock Event Device: lapic capabilities: 0000000e max_delta_ns: 807385544 min_delta_ns: 1443 mult: 44624025 shift: 32 set_next_event: lapic_next_event set_mode: lapic_timer_setup event_handler: hrtimer_interrupt .installed: 1 .expires: 4246050084568 nsecs Clock Event Device: hpet capabilities: 00000007 max_delta_ns: 2147483647 min_delta_ns: 3352 mult: 61496110 shift: 32 set_next_event: hpet_next_event set_mode: hpet_set_mode event_handler: handle_nextevt_broadcast Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Ingo Molnar	82f67cd9fc	[PATCH] Add debugging feature /proc/timer_stat Add /proc/timer_stats support: debugging feature to profile timer expiration. Both the starting site, process/PID and the expiration function is captured. This allows the quick identification of timer event sources in a system. Sample output: # echo 1 > /proc/timer_stats # cat /proc/timer_stats Timer Stats Version: v0.1 Sample period: 4.010 s 24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick) 11, 0 swapper sk_reset_timer (tcp_delack_timer) 6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick) 2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick) 2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) 4, 2050 pcscd do_nanosleep (hrtimer_wakeup) 5, 4179 sshd sk_reset_timer (tcp_write_timer) 4, 2248 yum-updatesd schedule_timeout (process_timeout) 18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick) 3, 0 swapper sk_reset_timer (tcp_delack_timer) 1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer) 2, 1 swapper e1000_up (e1000_watchdog) 1, 1 init schedule_timeout (process_timeout) 100 total events, 25.24 events/sec [ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ] [bunk@stusta.de: nr_entries can become static] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	8bfd9a7a22	[PATCH] hrtimers: prevent possible itimer DoS Fix potential setitimer DoS with high-res timers by pushing itimer rearm processing to process context. [Fixes from: Ingo Molnar <mingo@elte.hu>] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	54cdfdb47f	[PATCH] hrtimers: add high resolution timer support Implement high resolution timers on top of the hrtimers infrastructure and the clockevents / tick-management framework. This provides accurate timers for all hrtimer subsystem users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	79bf2bb335	[PATCH] tick-management: dyntick / highres functionality With Ingo Molnar <mingo@elte.hu> Add functions to provide dynamic ticks and high resolution timers. The code which keeps track of jiffies and handles the long idle periods is shared between tick based and high resolution timer based dynticks. The dyntick functionality can be disabled on the kernel commandline. Provide also the infrastructure to support high resolution timers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	f8381cba04	[PATCH] tick-management: broadcast functionality With Ingo Molnar <mingo@elte.hu> Add broadcast functionality, so per cpu clock event devices can be registered as dummy devices or switched from/to broadcast on demand. The broadcast function distributes the events via the broadcast function of the clock event device. This is primarily designed to replace the switch apic timer to / from IPI in power states, where the apic stops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	906568c9c6	[PATCH] tick-management: core functionality With Ingo Molnar <mingo@elte.hu> The tick-management code is the first user of the clockevents layer. It takes clock event devices from the clock events core and uses them to provide the periodic tick. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	d316c57ff6	[PATCH] clockevents: add core functionality Architectures register their clock event devices, in the clock events core. Users of the clockevents core can get clock event devices for their use. The clockevents core code provides notification mechanisms for various clock related management events. This allows to control the clock event devices without the architectures having to worry about the details of function assignment. This is also a preliminary for high resolution timers and dynamic ticks to allow the core code to control the clock functionality without intrusive changes to the architecture code. [Fixes-by: Ingo Molnar <mingo@elte.hu>] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:59 -08:00
Thomas Gleixner	5cfb6de7cd	[PATCH] hrtimers: clean up callback tracking Reintroduce ktimers feature "optimized away" by the ktimers review process: remove the curr_timer pointer from the cpu-base and use the hrtimer state. No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	303e967ff9	[PATCH] hrtimers; add state tracking Reintroduce ktimers feature "optimized away" by the ktimers review process: multiple hrtimer states to enable the running of hrtimers without holding the cpu-base-lock. (The "optimized" rbtree hack carried only 2 states worth of information and we need 4 for high resolution timers and dynamic ticks.) No functional changes. Build-fixes-from: Andrew Morton <akpm@osdl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	3c8aa39d7c	[PATCH] hrtimers: cleanup locking Improve kernel/hrtimers.c locking: use a per-CPU base with a lock to control locking of all clocks belonging to a CPU. This simplifies code that needs to lock all clocks at once. This makes life easier for high-res timers and dyntick. No functional changes. [ optimization change from Andrew Morton <akpm@osdl.org> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	c9cb2e3d7c	[PATCH] hrtimers: namespace and enum cleanup - hrtimers did not use the hrtimer_restart enum and relied on the implict int representation. Fix the prototypes and the functions using the enums. - Use seperate name spaces for the enumerations - Convert hrtimer_restart macro to inline function - Add comments No functional changes. [akpm@osdl.org: fix input driver] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	fd064b9b77	[PATCH] Extend next_timer_interrupt() to use a reference jiffie For CONFIG_NO_HZ we need to calculate the next timer wheel event based on a given jiffie value. Extend the existing code to allow the extra 'now' argument. Provide a compability function for the existing implementations to call the function with now == jiffies. (This also solves the racyness of the original code vs. jiffies changing during the iteration.) No functional changes to existing users of this infrastructure. [ remove WARN_ON() that triggered on s390, by Carsten Otte <cotte@de.ibm.com> ] [ made new helper static, Adrian Bunk <bunk@stusta.de> ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	1cfd68496e	[PATCH] Fix cascade lookup of next_timer_interrupt When searching for the next pending timer in the timer wheel we need to take the cascade into account. The current code has several problems: 1. it looks into the previous cascade 2. it ignores a pending cascade 3. it ignores multiple cascades Change the cascade lookup, so it calculates the array index from the point of the next cascade and always look at the cascade buckets, when the cascade is pending, i.e. gets executed in the next timer softirq. When multiple cascades are pending, then lookup the next buckets too. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Ingo Molnar	dde4b2b5f4	[PATCH] uninline irq_enter() Uninline irq_enter(). [dynticks adds more stuff to it] No functional changes. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:58 -08:00
Thomas Gleixner	5d8b34fdcb	[PATCH] clocksource: Add verification (watchdog) helper The TSC needs to be verified against another clocksource. Instead of using hardwired assumptions of available hardware, provide a generic verification mechanism. The verification uses the best available clocksource and handles the usability for high resolution timers / dynticks of the clocksource which needs to be verified. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Thomas Gleixner	7e69f2b1ea	[PATCH] clocksource: Remove the update callback The clocksource code allows direct updates of the rating of a given clocksource now. Change TSC unstable tracking to use this interface and remove the update callback. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Thomas Gleixner	73b08d2aa4	[PATCH] clocksource: replace is_continuous by a flag field Using a flag filed allows to encode more than one information into a variable. Preparatory patch for the generic clocksource verification. [mingo@elte.hu: convert vmitime.c to the new clocksource flag] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Thomas Gleixner	92c7e00254	[PATCH] Simplify the registration of clocksources Enqueue clocksources in rating order to make selection of the clocksource easier. Also check the match with an user override at enqueue time. Preparatory patch for the generic clocksource verification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
John Stultz	411187fb05	[PATCH] GTOD: persistent clock support Persistent clock support: do proper timekeeping across suspend/resume. [bunk@stusta.de: cleanup] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:57 -08:00
Ingo Molnar	41cf54455d	[PATCH] Fix multiple conversion bugs in msecs_to_jiffies Fix multiple conversion bugs in msecs_to_jiffies(). The main problem is that this condition: if (m > jiffies_to_msecs(MAX_JIFFY_OFFSET)) overflows if HZ is smaller than 1000! This change is user-visible: for HZ=250 SUS-compliant poll()-timeout value of -20 is mistakenly converted to 'immediate timeout'. (The new dyntick code also triggered this, that's how we noticed.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Ingo Molnar	8b9365d753	[PATCH] Uninline jiffies.h functions There are loads of fat functions hidden in jiffies.h. Uninline them. No code changes. [jeremy@goop.org: export fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
john stultz	f4304ab215	[PATCH] HZ free ntp Distangle the NTP update from HZ. This is necessary for dynamic tick enabled kernels. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Thomas Gleixner	771ee3b04e	[PATCH] Add a function to handle interrupt affinity setting Provide funtions to: - check, whether an interrupt can set the affinity - pin the interrupt to a given cpu Necessary for the ability to setup clocksources more flexible (e.g. use the different HPET channels per CPU) [akpm@osdl.org: alpha build fix] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Thomas Gleixner	950f4427c2	[PATCH] Add irq flag to disable balancing for an interrupt Add a flag so we can prevent the irq balancing of an interrupt. Move the bits, so we have room for more :) Necessary for the ability to setup clocksources more flexible (e.g. use the different HPET channels per CPU) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-16 08:13:56 -08:00
Linus Torvalds	414f827c46	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits) [PATCH] x86-64: Remove mk_pte_phys() [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386 [PATCH] i386: fix 32-bit ioctls on x64_32 [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64 [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header. [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c [PATCH] i386: Move mce_disabled to asm/mce.h [PATCH] i386: paravirt unhandled fallthrough [PATCH] x86_64: Wire up compat epoll_pwait [PATCH] x86: Don't require the vDSO for handling a.out signals [PATCH] i386: Fix Cyrix MediaGX detection [PATCH] i386: Fix warning in cpu initialization [PATCH] i386: Fix warning in microcode.c [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo [PATCH] i386: Remove fastcall in paravirt.[ch] [PATCH] x86-64: Fix wrong gcc check in bitops.h [PATCH] x86-64: survive having no irq mapping for a vector [PATCH] i386: geode configuration fixes [PATCH] i386: add option to show more code in oops reports ...	2007-02-14 09:46:06 -08:00
Eric W. Biederman	d912b0cc1a	[PATCH] sysctl: add a parent entry to ctl_table and set the parent entry Add a parent entry into the ctl_table so you can walk the list of parents and find the entire path to a ctl_table entry. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	77b14db502	[PATCH] sysctl: reimplement the sysctl proc support With this change the sysctl inodes can be cached and nothing needs to be done when removing a sysctl table. For a cost of 2K code we will save about 4K of static tables (when we remove de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or about half that on a 32bit arch. The speed feels about the same, even though we can now cache the sysctl dentries :( We get the core advantage that we don't need to have a 1 to 1 mapping between ctl table entries and proc files. Making it possible to have /proc/sys vary depending on the namespace you are in. The currently merged namespaces don't have an issue here but the network namespace under /proc/sys/net needs to have different directories depending on which network adapters are visible. By simply being a cache different directories being visible depending on who you are is trivial to implement. [akpm@osdl.org: fix uninitialised var] [akpm@osdl.org: fix ARM build] [bunk@stusta.de: make things static] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	1ff007eb8e	[PATCH] sysctl: allow sysctl_perm to be called from outside of sysctl.c Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	805b5d5e06	[PATCH] sysctl: factor out sysctl_head_next from do_sysctl The current logic to walk through the list of sysctl table headers is slightly painful and implement in a way it cannot be used by code outside sysctl.c I am in the process of implementing a version of the sysctl proc support that instead of using the proc generic non-caching monster, just uses the existing sysctl data structure as backing store for building the dcache entries and for doing directory reads. To use the existing data structures however I need a way to get at them. [akpm@osdl.org: warning fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	ae83681026	[PATCH] sysctl: remove support for directory strategy routines parse_table has support for calling a strategy routine when descending into a directory. To date no one has used this functionality and the /proc/sys interface has no analog to it. So no one is using this functionality kill it and make the binary sysctl code easier to follow. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	6703ddfcce	[PATCH] sysctl: remove support for CTL_ANY There are currently no users in the kernel for CTL_ANY and it only has effect on the binary interface which is practically unused. So this complicates sysctl lookups for no good reason so just remove it. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	2abc26fc6b	[PATCH] sysctl: create sys/fs/binfmt_misc as an ordinary sysctl entry binfmt_misc has a mount point in the middle of the sysctl and that mount point is created as a proc_generic directory. Doing it that way gets in the way of cleaning up the sysctl proc support as it continues the existence of a horrible hack. So instead simply create the directory as an ordinary sysctl directory. At least that removes the magic special case. [akpm@osdl.org: warning fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	a5494dcd8b	[PATCH] sysctl: move SYSV IPC sysctls to their own file This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the ipc logic to together it makes the code a little more readable. [gcoady.lk@gmail.com: build fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Grant Coady <gcoady.lk@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	39732acd96	[PATCH] sysctl: move utsname sysctls to their own file This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded with special cases, and by keeping all of the utsname logic to together it makes the code a little more readable. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Eric W. Biederman	b04c3afb2b	[PATCH] sysctl: move init_irq_proc into init/main where it belongs Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:58 -08:00
Thomas Gleixner	38515e908b	[PATCH] Scheduled removal of SA_xxx interrupt flags fixups The obsolete SA_xxx interrupt flags have been used despite the scheduled removal. Fixup the remaining users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Roland Dreier <rolandd@cisco.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: Dave Airlie <airlied@linux.ie> Cc: James Simmons <jsimmons@infradead.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Andi Kleen	a98f0dd34d	[PATCH] x86-64: Allow to run a program when a machine check event is detected When a machine check event is detected (including a AMD RevF threshold overflow event) allow to run a "trigger" program. This allows user space to react to such events sooner. The trigger is configured using a new trigger entry in the machinecheck sysfs interface. It is currently shared between all CPUs. I also fixed the AMD threshold handler to run the machine check polling code immediately to actually log any events that might have caused the threshold interrupt. Also added some documentation for the mce sysfs interface. Signed-off-by: Andi Kleen <ak@suse.de>	2007-02-13 13:26:23 +01:00
Zachary Amsden	9226d125d9	[PATCH] i386: paravirt CPU hypercall batching mode The VMI ROM has a mode where hypercalls can be queued and batched. This turns out to be a significant win during context switch, but must be done at a specific point before side effects to CPU state are visible to subsequent instructions. This is similar to the MMU batching hooks already provided. The same hooks could be used by the Xen backend to implement a context switch multicall. To explain a bit more about lazy modes in the paravirt patches, basically, the idea is that only one of lazy CPU or MMU mode can be active at any given time. Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of multiple PTE updates (say, inside a remap loop), but to avoid keeping some kind of state machine about when to flush cpu or mmu updates, we just allow one or the other to be active. Although there is no real reason a more comprehensive scheme could not be implemented, there is also no demonstrated need for this extra complexity. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:21 +01:00
Eric Dumazet	5809f9d442	[PATCH] x86-64: get rid of ARCH_HAVE_XTIME_LOCK ARCH_HAVE_XTIME_LOCK is used by x86_64 arch . This arch needs to place a read only copy of xtime_lock into vsyscall page. This read only copy is named __xtime_lock, and xtime_lock is defined in arch/x86_64/kernel/vmlinux.lds.S as an alias. So the declaration of xtime_lock in kernel/timer.c was guarded by ARCH_HAVE_XTIME_LOCK define, defined to true on x86_64. We can get same result with _attribute__((weak)) in the declaration. linker should do the job. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-02-13 13:26:21 +01:00
Arjan van de Ven	92e1d5be91	[PATCH] mark struct inode_operations const 2 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Nick Piggin	ff91691bcc	[PATCH] sched: avoid div in rebalance_tick Avoid expensive integer divide 3 times per CPU per tick. A userspace test of this loop went from 26ns, down to 19ns on a G5; and from 123ns down to 28ns on a P3. (Also avoid a variable bit shift, as suggested by Alan. The effect of this wasn't noticable on the CPUs I tested with). Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:37 -08:00
Eric W. Biederman	27b0b2f44a	[PATCH] pid: remove the now unused kill_pg kill_pg_info and __kill_pg_info Now that I have changed all of the in-tree users remove the old version of these functions. This should make it clear to any out of tree users that they should be using kill_pgrp kill_pgrp_info or __kill_pgrp_info instead. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	41487c65bf	[PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task There isn't any real advantage to this change except that it allows the old functions to be removed. Which is easier on maintenance and puts the code in a more uniform style. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	ab521dc0f8	[PATCH] tty: update the tty layer to work with struct pid Of kernel subsystems that work with pids the tty layer is probably the largest consumer. But it has the nice virtue that the assiation with a session only lasts until the session leader exits. Which means that no reference counting is required. So using struct pid winds up being a simple optimization to avoid hash table lookups. In the long term the use of pid_nr also ensures that when we have multiple pid spaces mixed everything will work correctly. Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	3e7cd6c413	[PATCH] pid: replace is_orphaned_pgrp with is_current_pgrp_orphaned Every call to is_orphaned_pgrp passed in process_group(current) which is racy with respect to another thread changing our process group. It didn't bite us because we were dealing with integers and the worse we would get would be a stale answer. In switching the checks to use struct pid to be a little more efficient and prepare the way for pid namespaces this race became apparent. So I simplified the calls to the more specialized is_current_pgrp_orphaned so I didn't have to worry about making logic changes to avoid the race. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	0475ac0845	[PATCH] pid: use struct pid for talking about process groups in exitc Modify has_stopped_jobs and will_become_orphan_pgrp to use struct pid based process groups. This reduces the number of hash tables looks ups and paves the way for multiple pid spaces. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	04a2e6a5cb	[PATCH] pid: make session_of_pgrp use struct pid instead of pid_t To properly implement a pid namespace I need to deal exclusively in terms of struct pid, because pid_t values become ambiguous. To this end session_of_pgrp is transformed to take and return a struct pid pointer. To avoid the need to worry about reference counting I now require my caller to hold the appropriate locks. Leaving callers repsonsible for increasing the reference count if they need access to the result outside of the locks. Since session_of_pgrp currently only has one caller and that caller simply uses only test the result for equality with another process group, the locking change means I don't actually have to acquire the tasklist_lock at all. tiocspgrp is also modified to take and release the lock. The logic there is a little more complicated but nothing I won't need when I convert pgrp of a tty to a struct pid pointer. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:31 -08:00
Eric W. Biederman	8d42db189c	[PATCH] signal: rewrite kill_something_info so it uses newer helpers The goal is to remove users of the old signal helper functions so they can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:31 -08:00
Ingo Molnar	944be0b224	[PATCH] close_files(): add scheduling point close_files() can sometimes take long enough to trigger the soft lockup detector. Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:30 -08:00
Alan Cox	3f05044715	[PATCH] kernel: shut up the IRQ mismatch messages The problem is various drivers legally validly and sensibly try to claim IRQs but the kernel insists on vomiting forth a giant irrelevant debugging spew when the types clash. Edit kernel/irq/manage.c go down to mismatch: in setup_irq() and ifdef out the if clause that checks for mismatches. It'll then just do the right thing and work sanely. For the current -mm kernel this will do the trick (and moves it into shared irq debugging as in debug mode the info spew is useful). I've had a variant of this in my private tree for some time as I got fed up on the mess on boxes where old legacy IRQs get reused. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
David Woodhouse	a304e1b828	[PATCH] Debug shared irqs Drivers registering IRQ handlers with SA_SHIRQ really ought to be able to handle an interrupt happening before request_irq() returns. They also ought to be able to handle an interrupt happening during the start of their call to free_irq(). Let's test that hypothesis.... [bunk@stusta.de: Kconfig fixes] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Al Viro	5ea8176994	[PATCH] sort the devres mess out * Split the implementation-agnostic stuff in separate files. * Make sure that targets using non-default request_irq() pull kernel/irq/devres.o * Introduce new symbols (HAS_IOPORT and HAS_IOMEM) defaulting to positive; allow architectures to turn them off (we needed these symbols anyway for dependencies of quite a few drivers). * protect the ioport-related parts of lib/devres.o with CONFIG_HAS_IOPORT. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Alexey Dobriyan	4b98d11b40	[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct They are fat: 4x8 bytes in task_struct. They are uncoditionally updated in every fork, read, write and sendfile. They are used only if you have some "extended acct fields feature". And please, please, please, read(2) knows about bytes, not characters, why it is called "rchar"? Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Oleg Nesterov	8d06087714	[PATCH] _proc_do_string(): fix short reads If you try to read things like /proc/sys/kernel/osrelease with single-byte reads, you get just one byte and then EOF. This is because _proc_do_string() assumes that the caller is read()ing into a buffer which is large enough to fit the whole string in a single hit. Fix. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Robert P. J. Day	501b9ebf43	[PATCH] Fix apparent typo CONFIG_LOCKDEP_DEBUG Replace the apparent typo CONFIG_LOCKDEP_DEBUG with the correct CONFIG_DEBUG_LOCKDEP. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Mathieu Desnoyers	1efc5da3cf	[PATCH] order of lockdep off/on in vprintk() should be changed The order of locking between lockdep_off/on() and local_irq_save/restore() in vprintk() should be changed. * In kernel/printk.c : vprintk() does : preempt_disable() local_irq_save() lockdep_off() spin_lock(&logbuf_lock) spin_unlock(&logbuf_lock) if(!down_trylock(&console_sem)) up(&console_sem) lockdep_on() local_irq_restore() preempt_enable() The goals here is to make sure we do not call printk() recursively from kernel/lockdep.c:__lock_acquire() (called from spin_* and down/up) nor from kernel/lockdep.c:trace_hardirqs_on/off() (called from local_irq_restore/save). It can then potentially call printk() through mark_held_locks/mark_lock. It correctly protects against the spin_lock call and the up/down call, but it does not protect against local_irq_restore. It could cause infinite recursive printk/trace_hardirqs_on() calls when printk() is called from the mark_lock() error handing path. We should change the locking so it becomes correct : preempt_disable() lockdep_off() local_irq_save() spin_lock(&logbuf_lock) spin_unlock(&logbuf_lock) if(!down_trylock(&console_sem)) up(&console_sem) local_irq_restore() lockdep_on() preempt_enable() Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Kirill Korotaev	e3e8a75d2a	[PATCH] Extract and use wake_up_klogd() Remove hack with printing space to wake up klogd. Use explicit wake_up_klogd(). See earlier discussion http://groups.google.com/group/fa.linux.kernel/browse_frm/thread/75f496668409f58d/1a8f28983a51e1ff?lnk=st&q=wake_up_klogd+group%3Afa.linux.kernel&rnum=2#1a8f28983a51e1ff Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Ingo Molnar	11f57cedcf	[PATCH] audit: fix audit_filter_user_rules() initialization bug gcc emits this warning: kernel/auditfilter.c: In function 'audit_filter_user': kernel/auditfilter.c:1611: warning: 'state' is used uninitialized in this function I tend to agree with gcc - there are a couple of plausible exit paths from audit_filter_user_rules() where it does not set 'state', keeping the variable uninitialized. For example if a filter rule has an AUDIT_POSSIBLE action. Initialize to 'wont audit'. Fix whitespace damage too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Kyle McMartin	d4d23add3a	[PATCH] Common compat_sys_sysinfo I noticed that almost all architectures implemented exactly the same sys32_sysinfo... except parisc, where a bug was to be found in handling of the uptime. So let's remove a whole whack of code for fun and profit. Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it would be the best tested. This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but instead extracting out the common code from sys_sysinfo. Cc: Christoph Hellwig <hch@infradead.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Robert P. J. Day	72fd4a35a8	[PATCH] Numerous fixes to kernel-doc info in source files. A variety of (mostly) innocuous fixes to the embedded kernel-doc content in source files, including: * make multi-line initial descriptions single line * denote some function names, constants and structs as such * change erroneous opening '/' to '/' in a few places reword some text for clarity Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Alexey Dobriyan	b653d081c1	[PATCH] proc: remove useless (and buggy) ->nlink settings Bug: pnx8550 code creates directory but resets ->nlink to 1. create_proc_entry() et al will correctly set ->nlink for you. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Corey Minyard <minyard@acm.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:32 -08:00
Andrew Morton	cb799b8988	[PATCH] sysctl warning fix kernel/sysctl.c:2816: warning: 'sysctl_ipc_data' defined but not used Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Helge Deller	3db5db4fcd	[PATCH] use cycle_t instead of u64 in struct time_interpolator The 32bit and 64bit PARISC Linux kernels suffers from the problem, that the gettimeofday() call sometimes returns non-monotonic times. The easiest way to fix this, is to drop the PARISC-specific implementation and switch over to the generic TIME_INTERPOLATION framework. But in order to make it even compile on 32bit PARISC, the patch below which touches the generic Linux code, is mandatory. More information and the full patch with the parisc-specific changes is included in this thread: http://lists.parisc-linux.org/pipermail/parisc-linux/2006-December/031003.html As far as I could see, this patch does not change anything for the existing architectures which use this framework (IA64 and SPARC64), since "cycles_t" is defined there as unsigned 64bit-integer anyway (which then makes this patch a no-change for them). Signed-off-by: Helge Deller <deller@gmx.de> Cc: <linux-arch@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Theodore Ts'o	34f5a39899	[PATCH] Add TAINT_USER and ability to set taint flags from userspace Allow taint flags to be set from userspace by writing to /proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used when userspace has potentially done something dangerous that might compromise the kernel. This will allow support personnel to ask further questions about what may have caused the user taint flag to have been set. For example, they might examine the logs of the realtime JVM to see if the Java program has used the really silly, stupid, dangerous, and completely-non-portable direct access to physical memory feature which MUST be implemented according to the Real-Time Specification for Java (RTSJ). Sigh. What were those silly people at Sun thinking? [akpm@osdl.org: build fix] [bunk@stusta.de: cleanup] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:29 -08:00
Alexey Dobriyan	b035b6de24	[PATCH] Consolidate default sched_clock() Use attribute(weak). Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Mathieu Desnoyers	23c887522e	[PATCH] Relay: add CPU hotplug support Mathieu originally needed to add this for tracing Xen, but it's something that's needed for any application that can be tracing while cpus are added. unplug isn't supported by this patch. The thought was that at minumum a new buffer needs to be added when a cpu comes up, but it wasn't worth the effort to remove buffers on cpu down since they'd be freed soon anyway when the channel was closed. [zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Jason Baron	068135e635	[PATCH] lockdep: add graph depth information to /proc/lockdep Generate locking graph information into /proc/lockdep, for lock hierarchy documentation and visualization purposes. sample output: c089fd5c OPS: 138 FD: 14 BD: 1 --..: &tty->termios_mutex -> [c07a3430] tty_ldisc_lock -> [c07a37f0] &port_lock_key -> [c07afdc0] &rq->rq_lock_key#2 The lock classes listed are all the first-hop lock dependencies that lockdep has seen so far. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Jarek Poplawski	381a229209	[PATCH] lockdep: more unlock-on-error fixes - returns after DEBUG_LOCKS_WARN_ON added in 3 places - debug_locks checking after lookup_chain_cache() added in __lock_acquire() - locking for testing and changing global variable max_lockdep_depth added in __lock_acquire() From: Ingo Molnar <mingo@elte.hu> My __acquire_lock() cleanup introduced a locking bug: on SMP systems we'd release a non-owned graph lock. Fix this by moving the graph unlock back, and by leaving the max_lockdep_depth variable update possibly racy. (we dont care, it's just statistics) Also add some minimal debugging code to graph_unlock()/graph_lock(), which caught this locking bug. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Oleg Nesterov	0c12b51712	[PATCH] kill_pid_info: kill acquired_tasklist_lock Kill acquired_tasklist_lock, sig_needs_tasklist() is very cheap nowadays. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Alexey Dobriyan	3ee75ac3c0	[PATCH] sysctl_{,ms_}jiffies: fix oldlen semantics currently it's 1) if oldlenp == 0, don't writeback anything 2) if oldlenp >= table->maxlen, don't writeback more than table->maxlen bytes and rewrite oldlenp don't look at underlying type granularity 3) if 0 < oldlenp < table->maxlen, cough string sysctls don't writeback more than oldlenp bytes. OK, that's because sizeof(char) == 1 int sysctls writeback anything in (0, table->maxlen] range Though accept integers divisible by sizeof(int) for writing. sysctl_jiffies and sysctl_ms_jiffies don't writeback anything but sizeof(int), which violates 1) and 2). So, make sysctl_jiffies and sysctl_ms_jiffies accept a) oldlenp == 0, not doing writeback b) oldlenp >= sizeof(int), writing one integer. -EINVAL still returned for oldlenp == 1, 2, 3. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:24 -08:00
Mathieu Desnoyers	dc29a3657b	[PATCH] kernel/time/clocksource.c needs struct task_struct on m68k kernel/time/clocksource.c needs struct task_struct on m68k. Because it uses spin_unlock_irq(), which, on m68k, uses hardirq_count(), which uses preempt_count(), which needs to dereference struct task_struct, we have to include sched.h. Because it would cause a loop inclusion, we cannot include sched.h in any other of asm-m68k/system.h, linux/thread_info.h, linux/hardirq.h, which leaves this ugly include in a C file as the only simple solution. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:20 -08:00
Rafael J. Wysocki	2b5b09b3b5	[PATCH] swsusp: Change pm_ops handling by userland interface Make the userland interface of swsusp call pm_ops->finish() after enable_nonboot_cpus() and before resume_device(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). This patch changes the SNAPSHOT_PMOPS ioctl so that its first function, PMOPS_PREPARE, only sets a switch turning the platform suspend mode on, and its last function, PMOPS_FINISH, only checks if the platform mode is enabled. This should allow the older userland tools to work with new kernels without any modifications. The changes here only affect the userland interface of swsusp. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:20 -08:00
Andrew Morton	d12c610e08	[PATCH] swsusp-change-code-ordering-in-userc-sanity The compiler will do that. And if it doesn't, we don't want to either ;) Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Rafael J. Wysocki	259130526c	[PATCH] swsusp: Change code ordering in user.c Change the ordering of code in kernel/power/user.c so that device_suspend() is called before disable_nonboot_cpus() and device_resume() is called after enable_nonboot_cpus(). This is needed to make the userland suspend call pm_ops->finish() after enable_nonboot_cpus() and before device_resume(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). The changes here only affect the userland interface of swsusp. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Rafael J. Wysocki	ed746e3b18	[PATCH] swsusp: Change code ordering in disk.c Change the ordering of code in kernel/power/disk.c so that device_suspend() is called before disable_nonboot_cpus() and platform_finish() is called after enable_nonboot_cpus() and before device_resume(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). The changes here only affect the built-in swsusp. [alexey.y.starikovskiy@linux.intel.com: fix LED blinking during image load] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Cc: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Rafael J. Wysocki	e3c7db621b	[PATCH] PM: Change code ordering in main.c As indicated in a recent thread on Linux-PM, it's necessary to call pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to be called before pm_ops->finish() (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For consistency, it seems reasonable to call disable_nonboot_cpus() after device_suspend(). This way the suspend code will remain symmetrical with respect to the resume code and it may allow us to speed up things in the future by suspending and resuming devices and/or saving the suspend image in many threads. The following series of patches reorders the suspend and resume code so that nonboot CPUs are disabled after devices have been suspended and enabled before the devices are resumed. It also causes pm_ops->finish() to be called after enable_nonboot_cpus() wherever necessary. This patch: Change the ordering of code in kernel/power/main.c so that device_suspend() is called before disable_nonboot_cpus() and pm_ops->finish() is called after enable_nonboot_cpus() and before device_resume(), as indicated by recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Eric Paris	6ff1b4426e	[PATCH] make reading /proc/sys/kernel/cap-bould not require CAP_SYS_MODULE Reading /proc/sys/kernel/cap-bound requires CAP_SYS_MODULE. (see proc_dointvec_bset in kernel/sysctl.c) sysctl appears to drive all over proc reading everything it can get it's hands on and is complaining when it is being denied access to read cap-bound. Clearly writing to cap-bound should be a sensitive operation but requiring CAP_SYS_MODULE to read cap-bound seems a bit to strong. I believe the information could with reasonable certainty be obtained by looking at a bunch of the output of /proc/pid/status which has very low security protection, so at best we are just getting a little obfuscation of information. Currently SELinux policy has to 'dontaudit' capability checks for CAP_SYS_MODULE for things like sysctl which just want to read cap-bound. In doing so we also as a byproduct have to hide warnings of potential exploits such as if at some time that sysctl actually tried to load a module. I wondered if anyone would have a problem opening cap-bound up to read from anyone? Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Christoph Lameter	9617729941	[PATCH] Drop free_pages() nr_free_pages is now a simple access to a global variable. Make it a macro instead of a function. The nr_free_pages now requires vmstat.h to be included. There is one occurrence in power management where we need to add the include. Directly refrer to global_page_state() there to clarify why the #include was added. [akpm@osdl.org: arm build fix] [akpm@osdl.org: sparc64 build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:18 -08:00
Christoph Lameter	d23ad42324	[PATCH] Use ZVC for free_pages This is again simplifies some of the VM counter calculations through the use of the ZVC consolidated counters. [michal.k.k.piotrowski@gmail.com: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:17 -08:00
Tejun Heo	9ac7849e35	devres: device resource management Implement device resource management, in short, devres. A device driver can allocate arbirary size of devres data which is associated with a release function. On driver detach, release function is invoked on the devres data, then, devres data is freed. devreses are typed by associated release functions. Some devreses are better represented by single instance of the type while others need multiple instances sharing the same release function. Both usages are supported. devreses can be grouped using devres group such that a device driver can easily release acquired resources halfway through initialization or selectively release resources (e.g. resources for port 1 out of 4 ports). This patch adds devres core including documentation and the following managed interfaces. * alloc/free : devm_kzalloc(), devm_kzfree() * IO region : devm_request_region(), devm_release_region() * IRQ : devm_request_irq(), devm_free_irq() * DMA : dmam_alloc_coherent(), dmam_free_coherent(), dmam_declare_coherent_memory(), dmam_pool_create(), dmam_pool_destroy() * PCI : pcim_enable_device(), pcim_pin_device(), pci_is_managed() * iomap : devm_ioport_map(), devm_ioport_unmap(), devm_ioremap(), devm_ioremap_nocache(), devm_iounmap(), pcim_iomap_table(), pcim_iomap(), pcim_iounmap() Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-02-09 17:39:36 -05:00
Ralf Baechle	7726942fb1	[APM] Add shared version of APM emulation Currently ARM and MIPS both have nearly identical copies of the APM emulation code in their arch code. Add yet another copy of it to drivers char and make it selectable through SYS_SUPPORTS_APM_EMULATION. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2007-02-09 17:08:57 +00:00
Linus Torvalds	78149df6d5	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (41 commits) Revert "PCI: remove duplicate device id from ata_piix" msi: Make MSI useable more architectures msi: Kill the msi_desc array. msi: Remove attach_msi_entry. msi: Fix msi_remove_pci_irq_vectors. msi: Remove msi_lock. msi: Kill msi_lookup_irq MSI: Combine pci_(save\|restore)_msi/msix_state MSI: Remove pci_scan_msi_device() MSI: Replace pci_msi_quirk with calls to pci_no_msi() PCI: remove duplicate device id from ipr PCI: remove duplicate device id from ata_piix PCI: power management: remove noise on non-manageable hw PCI: cleanup MSI code PCI: make isa_bridge Alpha-only PCI: remove quirk_sis_96x_compatible() PCI: Speed up the Intel SMBus unhiding quirk PCI Quirk: 1k I/O space IOBL_ADR fix on P64H2 shpchp: delete trailing whitespace shpchp: remove DBG_XXX_ROUTINE ...	2007-02-07 19:23:44 -08:00
Eric W. Biederman	5b912c108c	msi: Kill the msi_desc array. We need to be able to get from an irq number to a struct msi_desc. The msi_desc array in msi.c had several short comings the big one was that it could not be used outside of msi.c. Using irq_data in struct irq_desc almost worked except on some architectures irq_data needs to be used for something else. So this patch adds a msi_desc pointer to irq_desc, adds the appropriate wrappers and changes all of the msi code to use them. The dynamic_irq_init/cleanup code was tweaked to ensure the new field is left in a well defined state. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 15:50:08 -08:00
Kay Sievers	270a6c4cad	/sys/modules/*/holders /sys/module/usbcore/ \|-- drivers \| \|-- usb:hub -> ../../../subsystem/usb/drivers/hub \| \|-- usb:usb -> ../../../subsystem/usb/drivers/usb \| `-- usb:usbfs -> ../../../subsystem/usb/drivers/usbfs \|-- holders \| \|-- ehci_hcd -> ../../../module/ehci_hcd \| \|-- uhci_hcd -> ../../../module/uhci_hcd \| \|-- usb_storage -> ../../../module/usb_storage \| `-- usbhid -> ../../../module/usbhid \|-- initstate Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:12 -08:00
Greg Kroah-Hartman	fe480a2675	Modules: only add drivers/ direcory if needed This changes the module core to only create the drivers/ directory if we are going to put something in it. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:12 -08:00
Kay Sievers	f30c53a873	MODULES: add the module name for built in kernel drivers Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-02-07 10:37:12 -08:00
Al Viro	9abcf40b1d	[PATCH] fork_idle() should be __cpuinit, not __devinit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-01 16:17:06 -08:00
Masami Hiramatsu	ab40c5c6b6	[PATCH] kprobes: replace magic numbers with enum Replace the magic numbers with an enum, and gets rid of a warning on the specific architectures (ex. powerpc) on which the compiler considers 'char' as 'unsigned char'. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 16:01:35 -08:00
Serge E. Hallyn	0f2452855d	[PATCH] namespaces: fix task exit disaster This is based on a patch by Eric W. Biederman, who pointed out that pid namespaces are still fake, and we only have one ever active. So for the time being, we can modify any code which could access tsk->nsproxy->pid_ns during task exit to just use &init_pid_ns instead, and move the exit_task_namespaces call in do_exit() back above exit_notify(), so that an exiting nfs server has a valid tsk->sighand to work with. Long term, pulling pid_ns out of nsproxy might be the cleanest solution. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> [ Eric's patch fixed to take care of free_pid() too ] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 13:40:36 -08:00
Linus Torvalds	444f378b23	Revert "[PATCH] namespaces: fix exit race by splitting exit" This reverts commit `7a238fcba0` in preparation for a better and simpler fix proposed by Eric Biederman (and fixed up by Serge Hallyn) Acked-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 13:35:18 -08:00
Serge E. Hallyn	7a238fcba0	[PATCH] namespaces: fix exit race by splitting exit Fix exit race by splitting the nsproxy putting into two pieces. First piece reduces the nsproxy refcount. If we dropped the last reference, then it puts the mnt_ns, and returns the nsproxy as a hint to the caller. Else it returns NULL. The second piece of exiting task namespaces sets tsk->nsproxy to NULL, and drops the references to other namespaces and frees the nsproxy only if an nsproxy was passed in. A little awkward and should probably be reworked, but hopefully it fixes the NFS oops. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-30 08:26:44 -08:00
Linus Torvalds	8528b0f1de	Clear spurious irq stat information when adding irq handler Any newly added irq handler may obviously make any old spurious irq status invalid, since the new handler may well be the thing that is supposed to handle any interrupts that came in. So just clear the statistics when adding handlers. Pointed-out-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 14:16:31 -08:00
Ingo Molnar	1b5180b651	[PATCH] notifiers: fix blocking_notifier_call_chain() scalability while lock-profiling the -rt kernel i noticed weird contention during mmap-intense workloads, and the tracer showed the following gem, in one of our MM hotpaths: threaded-2771 1.... 65us : sys_munmap (sysenter_do_call) threaded-2771 1.... 66us : profile_munmap (sys_munmap) threaded-2771 1.... 66us : blocking_notifier_call_chain (profile_munmap) threaded-2771 1.... 66us : rt_down_read (blocking_notifier_call_chain) ouch! a global rw-semaphore taken in one of the most performance- sensitive codepaths of the kernel. And i dont even have oprofile enabled! All distro kernels have CONFIG_PROFILING enabled, so this scalability problem affects the majority of Linux users. The fix is to enhance blocking_notifier_call_chain() to only take the lock if there appears to be work on the call-chain. With this patch applied i get nicely saturated system, and much higher munmap performance, on SMP systems. And as a bonus this also fixes a similar scalability bottleneck in the thread-exit codepath: profile_task_exit() ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 11:08:03 -08:00
Andrew Morton	bbe1a59b3a	[PATCH] fix "kvm: add vm exit profiling" export profile_hits() on !SMP too. Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 07:52:05 -08:00
Ingo Molnar	07031e14c1	[PATCH] KVM: add VM-exit profiling This adds the profile=kvm boot option, which enables KVM to profile VM exits. Use: "readprofile -m ./System.map \| sort -n" to see the resulting output: [...] 18246 serial_out 148.3415 18945 native_flush_tlb 378.9000 23618 serial_in 212.7748 29279 __spin_unlock_irq 622.9574 43447 native_apic_write 2068.9048 52702 enable_8259A_irq 742.2817 54250 vgacon_scroll 89.3740 67394 ide_inb 6126.7273 79514 copy_page_range 98.1654 84868 do_wp_page 86.6000 140266 pit_read 783.6089 151436 ide_outb 25239.3333 152668 native_io_delay 21809.7143 174783 mask_and_ack_8259A 783.7803 362404 native_set_pte_at 36240.4000 1688747 total 0.5009 Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:21 -08:00
Gautham R Shenoy	b282b6f8a8	[PATCH] Change cpu_up and co from __devinit to __cpuinit Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n with CONFIG_RELOCATABLE = y generates the following modpost warnings WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up' This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are defined as __devinit AND __cpu_up calls some __cpuinit functions. Since __cpuinit would map to __init with this kind of a configuration, we get a .text refering .init.data warning. This patch solves the problem by converting all of __cpu_up, _cpu_up and cpu_up from __devinit to __cpuinit. The approach is justified since the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or are of __init type. Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would land up in .init section. Tested on a i386 SMP machine running linux-2.6.20-rc3-mm1. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:20 -08:00
Nathan Lynch	e5e5673f82	[PATCH] sched: tasks cannot run on cpus onlined after boot Commit `5c1e176781` ("sched: force /sbin/init off isolated cpus") sets init's cpus_allowed to a subset of cpu_online_map at boot time, which means that tasks won't be scheduled on cpus that are added to the system later. Make init's cpus_allowed a subset of cpu_possible_map instead. This should still preserve the behavior that Nick's change intended. Thanks to Giuliano Pochini for reporting this and testing the fix: http://ozlabs.org/pipermail/linuxppc-dev/2006-December/029397.html Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:20 -08:00
Vivek Goyal	343cde51b3	[PATCH] x86-64: Make noirqdebug_setup function non init to fix modpost warning o noirqdebug_setup() is __init but it is being called by quirk_intel_irqbalance() which if of type __devinit. If CONFIG_HOTPLUG=y, quirk_intel_irqbalance() is put into text section and it is wrong to call a function in __init section. o MODPOST flags this on i386 if CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.text:noirqdebug_setup from .text between 'quirk_intel_irqbalance' (at offset 0xc010969e) and 'i8237A_suspend' o Make noirqdebug_setup() non-init. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-01-11 01:52:44 +01:00
Linus Torvalds	d0abc451a6	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: [PATCH] Driver core: Fix prefix driver links in /sys/module by bus-name	2007-01-06 00:10:55 -08:00
Ingo Molnar	a75acf850c	[PATCH] profiling: fix sched profiling typo Fix sched profiling typo, introduced by the sleep profiling patch. This bug caused profile=sched to not work. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:22 -08:00
Rafael J. Wysocki	7bf2368742	[PATCH] swsusp: Do not fail if resume device is not set In the kernels later than 2.6.19 there is a regression that makes swsusp fail if the resume device is not explicitly specified. It can be fixed by adding an additional parameter to mm/swapfile.c:swap_type_of() allowing us to pass the (struct block_device *) corresponding to the first available swap back to the caller. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:22 -08:00
Ard van Breemen	a416aba637	[PATCH] kernelparams: detect if and which parameter parsing enabled irq's The parsing of some kernel parameters seem to enable irq's at a stage that irq's are not supposed to be enabled (Particularly the ide kernel parameters). Having irq's enabled before the irq controller is initialized might lead to a kernel panic. This patch only detects this behaviour and warns about wich parameter caused it. [akpm@osdl.org: cleanups] Signed-off-by: Ard van Breemen <ard@telegraafnet.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-05 23:55:21 -08:00
Kay Sievers	7f422e2e84	[PATCH] Driver core: Fix prefix driver links in /sys/module by bus-name Modules may have drivers with the same name on different buses. This patch fixes this problem. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-01-05 12:33:38 -08:00
Oleg Nesterov	241ceee0b4	[PATCH] restore ->pdeath_signal behaviour Commit `b2b2cbc4b2` introduced a user- visible change: ->pdeath_signal is sent only when the entire thread group exits. While this change is imho good, it may break things. So restore the old behaviour for now. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> To: Albert Cahalan <acahalan@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Linus Torvalds <torvalds@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Qi Yong <qiyong@fc-cn.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-31 14:41:18 -08:00
Andrew Morton	755cd90029	[PATCH] lockdep: printk warning fix kernel/lockdep.c: In function `lookup_chain_cache': kernel/lockdep.c:1339: warning: long long unsigned int format, u64 arg (arg 2) kernel/lockdep.c:1344: warning: long long unsigned int format, u64 arg (arg 2) Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:43 -08:00
Andrew Morton	089e34b600	[PATCH] cpuset procfs warning fix fs/proc/base.c:1869: warning: initialization discards qualifiers from pointer target type fs/proc/base.c:2150: warning: initialization discards qualifiers from pointer target type Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:43 -08:00
Akinobu Mita	a3f99f8ba8	[PATCH] module: fix mod_sysfs_setup() return value mod_sysfs_setup() doesn't return error when kobject_add_dir() failed. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:41 -08:00
Ingo Molnar	9414232fa0	[PATCH] sched: fix cond_resched_softirq() offset Remove the __resched_legal() check: it is conceptually broken. The biggest problem it had is that it can mask buggy cond_resched() calls. A cond_resched() call is only legal if we are not in an atomic context, with two narrow exceptions: - if the system is booting - a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set But __resched_legal() hid this and just silently returned whenever these primitives were called from invalid contexts. (Same goes for cond_resched_locked() and cond_resched_softirq()). Furthermore, the __legal_resched(0) call was buggy in that it caused unnecessarily long softirq latencies via cond_resched_softirq(). (which is only called from softirq-off sections, hence the code did nothing.) The fix is to resurrect the efficiency of the might_sleep checks and to only allow the narrow exceptions. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:56:41 -08:00
Ingo Molnar	e4e6bdbb42	[PATCH] rcu: rcutorture suspend fix Fix suspend hang: rcutorture threads need to be nofreeze. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:55:55 -08:00
Ingo Molnar	e1d9fd2e3d	[PATCH] suspend: fix suspend on single-CPU systems Clark Williams reported that suspend doesnt work on his laptop on 2.6.20-rc1-rt kernels. The bug was introduced by the following cleanup commit: commit `112cecb2cc` Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Wed Dec 6 20:34:31 2006 -0800 [PATCH] suspend: don't change cpus_allowed for task initiating the suspend because with this change 'error' is not initialized to 0 anymore, if there are no other online CPUs. (i.e. if the system is single-CPU). the fix is the initialize it to 0. The really weird thing is that my version of gcc does not warn about this non-initialized variable situation ... (also fix the kernel printk in the error branch, it was missing a newline) Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-23 13:59:33 -08:00
Linus Torvalds	18ed1c0513	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits) ACPI: replace kmalloc+memset with kzalloc ACPI: Add support for acpi_load_table/acpi_unload_table_id fbdev: update after backlight argument change ACPI: video: Add dev argument for backlight_device_register ACPI: Implement acpi_video_get_next_level() ACPI: Kconfig - depend on PM rather than selecting it ACPI: fix NULL check in drivers/acpi/osl.c ACPI: make drivers/acpi/ec.c:ec_ecdt static ACPI: prevent processor module from loading on failures ACPI: fix single linked list manipulation ACPI: ibm_acpi: allow clean removal ACPI: fix git automerge failure ACPI: ibm_acpi: respond to workqueue update ACPI: dock: add uevent to indicate change in device status ACPI: ec: Lindent once again ACPI: ec: Change #define to enums there possible. ACPI: ec: Style changes. ACPI: ec: Acquire Global Lock under EC mutex. ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead. ACPI: ec: Rename gpe_bit to gpe ...	2006-12-22 18:46:56 -08:00
Eric W. Biederman	b2b2cbc4b2	[PATCH] Fix reparenting to the same thread group. (take 2) This patch fixes the case when we reparent to a different thread in the same thread group. This modifies the code so that we do not send signals and do not change the signal to send to SIGCHLD unless we have change the thread group of our parents. It also suppresses sending pdeath_sig in this cas as well since the result of geppid doesn't change. Thanks to Oleg for spotting my bug of only fixing this for non-ptraced tasks. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Andrew Morton <akpm@osdl.org> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Coywolf Qi Hunt <qiyong@fc-cn.com> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 09:03:41 -08:00
Andrew Morton	192636ad90	[PATCH] relay: remove inlining text data bss dec hex filename before: 4036 44 0 4080 ff0 kernel/relay.o after: 3727 44 0 3771 ebb kernel/relay.o Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Tom Zanussi <zanussi@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:51 -08:00
Vadim Lobanov	01b2d93ca4	[PATCH] fdtable: Provide free_fdtable() wrapper Christoph Hellwig has expressed concerns that the recent fdtable changes expose the details of the RCU methodology used to release no-longer-used fdtable structures to the rest of the kernel. The trivial patch below addresses these concerns by introducing the appropriate free_fdtable() calls, which simply wrap the release RCU usage. Since free_fdtable() is a one-liner, it makes sense to promote it to an inline helper. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:50 -08:00
Andrew Morton	5b149bcc23	[PATCH] schedule_timeout(): improve warning message Kyle is hitting this warning, and we don't have a clue what it's caused by. Add the obligatory dump_stack(). Cc: kyle <kylewong@southa.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:49 -08:00
Akinobu Mita	3e1fbd12c9	[PATCH] audit: fix kstrdup() error check kstrdup() returns NULL on error. Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:49 -08:00
Thomas Gleixner	9d7ac8be4b	[PATCH] genirq: fix irq flow handler uninstall The sanity check for no_irq_chip in __set_irq_hander() is unconditional on both install and uninstall of an handler. This triggers false warnings and replaces no_irq_chip by dummy_irq_chip in the uninstall case. Check only, when a real handler is installed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:48 -08:00
Tim Chen	67af63a6ab	[PATCH] sched: remove __cpuinitdata anotation to cpu_isolated_map The structure cpu_isolated_map is used not only during initialization. Multi-core scheduler configuration changes and exclusive cpusets use this during run time. During setting of sched_mc_power_savings policy, this structure is accessed to update sched_domains. Signed-off-by: Tim Chen <tim.c.chen@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00
Adrian Bunk	99eea6a105	[PATCH] make kernel/printk.c:ignore_loglevel_setup() static Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00
Randy Dunlap	af9997e426	[PATCH] fix kernel-doc warnings in 2.6.20-rc1 Fix kernel-doc warnings in 2.6.20-rc1. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:47 -08:00
Mark Fasheh	ba0084048a	[PATCH] Conditionally check expected_preempt_count in __resched_legal() Commit `2d7d253548` ("fix cond_resched() fix") introduced an 'expected_preempt_count' parameter to __resched_legal() to fix a bug where it was returning a false negative when called from cond_resched_lock() and preemption was enabled. Unfortunately this broke things for when preemption is disabled. preempt_count() will always return zero, thus failing the check against any value of expected_preempt_count not equal to zero. cond_resched_lock() for example, passes an expected_preempt_count value of 1. So fix the fix for the cond_resched() fix by skipping the check of preempt_count() against expected_preempt_count when preemption is disabled. Credit should go to Sunil Mushran for spotting the bug during testing. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-22 08:55:46 -08:00
Ingo Molnar	9bfb18392e	[PATCH] workqueue: fix schedule_on_each_cpu() fix the schedule_on_each_cpu() implementation: __queue_work() is now stricter, hence set the work-pending bit before passing in the new work. (found in the -rt tree, using Peter Zijlstra's files-lock scalability patchset) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 00:20:01 -08:00
Peter Williams	bc947631d1	[PATCH] sched: improve efficiency of sched_fork() Problem: sched_fork() has always called scheduler_tick() in some (unlikely) circumstances in order to update the current task in light of those circumstances. It has always been the case that the work done by scheduler_tick() was more than was required to handle the problem in hand but no harm was done except for the waste of a few CPU cycles. However, the splitting of scheduler_tick() into two procedures in 2.6.20-rc1 enables the wasted cycles to be saved as the new procedure task_running_tick() does all the work that is required to rectify the problem being handled. Solution: Replace the call to scheduler_tick() in sched_fork() with a call to task_running_tick(). Signed-off-by: Peter Williams <pwil3058@bigpond.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 00:11:51 -08:00
Geert Uytterhoeven	b039db8eea	[PATCH] __set_irq_handler bogus space __set_irq_handler: Kill a bogus space Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 00:08:27 -08:00
Len Brown	9774f33841	merge linus into test branch	2006-12-20 02:53:13 -05:00
Linus Torvalds	a08727bae7	Make workqueue bit operations work on "atomic_long_t" On architectures where the atomicity of the bit operations is handled by external means (ie a separate spinlock to protect concurrent accesses), just doing a direct assignment on the workqueue data field (as done by commit `4594bf159f`) can cause the assignment to be lost due to lack of serialization with the bitops on the same word. So we need to serialize the assignment with the locks on those architectures (notably older ARM chips, PA-RISC and sparc32). So rather than using an "unsigned long", let's use "atomic_long_t", which already has a safe assignment operation (atomic_long_set()) on such architectures. This requires that the atomic operations use the same atomicity locks as the bit operations do, but that is largely the case anyway. Sparc32 will probably need fixing. Architectures (including modern ARM with LL/SC) that implement sane atomic operations for SMP won't see any of this matter. Cc: Russell King <rmk+lkml@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: David Miller <davem@davemloft.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Linux Arch Maintainers <linux-arch@vger.kernel.org> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-16 09:53:50 -08:00
Len Brown	cfee47f99b	Pull bugfix into test branch Conflicts: kernel/power/disk.c	2006-12-16 01:01:18 -05:00
Linus Torvalds	d1526e2cda	Remove stack unwinder for now It has caused more problems than it ever really solved, and is apparently not getting cleaned up and fixed. We can put it back when it's stable and isn't likely to make warning or bug events worse. In the meantime, enable frame pointers for more readable stack traces. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-15 08:47:51 -08:00
David Brownell	f89bce3d9a	Driver core: deprecate PM_LEGACY, default it to N Deprecate the old "legacy" PM API, and more importantly default it to "n". Virtually nothing in-tree uses it any more. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-12-13 15:38:46 -08:00
Kay Sievers	1f71740ab9	Driver core: show "initstate" of module Show the initialization state(live, coming, going) of the module: $ cat /sys/module/usbcore/initstate live Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-12-13 15:38:45 -08:00
Ralf Baechle	ec8c0446b6	[PATCH] Optimize D-cache alias handling on fork Virtually index, physically tagged cache architectures can get away without cache flushing when forking. This patch adds a new cache flushing function flush_cache_dup_mm(struct mm_struct *) which for the moment I've implemented to do the same thing on all architectures except on MIPS where it's a no-op. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:27:08 -08:00
Eric Dumazet	cd7175edf9	[PATCH] Optimize calc_load() calc_load() is called by timer interrupt to update avenrun[]. It currently calls nr_active() at each timer tick (HZ per second), while the update of avenrun[] is done only once every 5 seconds. (LOAD_FREQ=5 Hz) nr_active() is quite expensive on SMP machines, since it has to sum up nr_running and nr_uninterruptible of all online CPUS, bringing foreign dirty cache lines. This patch is an optimization of calc_load() so that nr_active() is called only if we need it. The use of unlikely() is welcome since the condition is true only once every 5*HZ time. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:55 -08:00
Robert P. J. Day	cd86128088	[PATCH] Fix numerous kcalloc() calls, convert to kzalloc() All kcalloc() calls of the form "kcalloc(1,...)" are converted to the equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect ordering of the first two arguments are fixed. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:52 -08:00
Ingo Molnar	74c383f140	[PATCH] lockdep: fix possible races while disabling lock-debugging Jarek Poplawski noticed that lockdep global state could be accessed in a racy way if one CPU did a lockdep assert (shutting lockdep down), while the other CPU would try to do something that changes its global state. This patch fixes those races and cleans up lockdep's internal locking by adding a graph_lock()/graph_unlock()/debug_locks_off_graph_unlock helpers. (Also note that as we all know the Linux kernel is, by definition, bug-free and perfect, so this code never triggers, so these fixes are highly theoretical. I wrote this patch for aesthetic reasons alone.) [akpm@osdl.org: build fix] [jarkao2@o2.pl: build fix's refix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	3117df0453	[PATCH] lockdep: print irq-trace info on asserts When we print an assert due to scheduling-in-atomic bugs, and if lockdep is enabled, then the IRQ tracing information of lockdep can be printed to pinpoint the code location that disabled interrupts. This saved me quite a bit of debugging time in cases where the backtrace did not identify the irq-disabling site well enough. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	27c3b23226	[PATCH] lockdep: use chain hash on CONFIG_DEBUG_LOCKDEP too CONFIG_DEBUG_LOCKDEP is unacceptably slow because it does not utilize the chain-hash. Turn the chain-hash back on in this case too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	33e94e960b	[PATCH] lockdep: clean up VERY_VERBOSE define Cleanup: the VERY_VERBOSE define was unnecessarily dependent on #ifdef VERBOSE - while the VERBOSE switch is 0 or 1 (always defined). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	23d95a03d6	[PATCH] lockdep: improve lockdep_reset() Clear all the chains during lockdep_reset(). This fixes some locking-selftest false positives i saw on -rt. (never saw those on mainline though, but it could happen.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	81fc685a89	[PATCH] lockdep: improve verbose messages Make verbose lockdep messages (off by default) more informative by printing out the hash chain key. (this patch was what helped me catch the earlier lockdep hash-collision bug) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	a664089741	[PATCH] lockdep: filter off by default Fix typo in the class_filter() function. (filtering is not used by default so this only affects lockdep-internal debugging cases) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Ingo Molnar	5d6f647fc6	[PATCH] debug: add sysrq_always_enabled boot option Most distributions enable sysrq support but set it to 0 by default. Add a sysrq_always_enabled boot option to always-enable sysrq keys. Useful for debugging - without having to modify the disribution's config files (which might not be possible if the kernel is on a live CD, etc.). Also, while at it, clean up the sysrq interfaces. [bunk@stusta.de: make sysrq_always_enabled_setup() static] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:50 -08:00
Rafael J. Wysocki	8a102eed9c	[PATCH] PM: Fix SMP races in the freezer Currently, to tell a task that it should go to the refrigerator, we set the PF_FREEZE flag for it and send a fake signal to it. Unfortunately there are two SMP-related problems with this approach. First, a task running on another CPU may be updating its flags while the freezer attempts to set PF_FREEZE for it and this may leave the task's flags in an inconsistent state. Second, there is a potential race between freeze_process() and refrigerator() in which freeze_process() running on one CPU is reading a task's PF_FREEZE flag while refrigerator() running on another CPU has just set PF_FROZEN for the same task and attempts to reset PF_FREEZE for it. If the refrigerator wins the race, freeze_process() will state that PF_FREEZE hasn't been set for the task and will set it unnecessarily, so the task will go to the refrigerator once again after it's been thawed. To solve first of these problems we need to stop using PF_FREEZE to tell tasks that they should go to the refrigerator. Instead, we can introduce a special TIF_* flag and use it for this purpose, since it is allowed to change the other tasks' TIF_* flags and there are special calls for it. To avoid the freeze_process()-refrigerator() race we can make freeze_process() to always check the task's PF_FROZEN flag after it's read its "freeze" flag. We should also make sure that refrigerator() will always reset the task's "freeze" flag after it's set PF_FROZEN for it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@muc.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:49 -08:00
Rafael J. Wysocki	3df494a32b	[PATCH] PM: Fix freezing of stopped tasks Currently, if a task is stopped (ie. it's in the TASK_STOPPED state), it is considered by the freezer as unfreezeable. However, there may be a race between the freezer and the delivery of the continuation signal to the task resulting in the task running after we have finished freezing the other tasks. This, in turn, may lead to undesirable effects up to and including data corruption. To prevent this from happening we first need to make the freezer consider stopped tasks as freezeable. For this purpose we need to make freezeable() stop returning 0 for these tasks and we need to force them to enter the refrigerator. However, if there's no continuation signal in the meantime, the stopped tasks should remain stopped after all processes have been thawed, so we need to send an additional SIGSTOP to each of them before waking it up. Also, a stopped task that has just been woken up should first check if there's a freezing request for it and go to the refrigerator if that's the case. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:49 -08:00
Paul Jackson	02a0e53d82	[PATCH] cpuset: rework cpuset_zone_allowed api Elaborate the API for calling cpuset_zone_allowed(), so that users have to explicitly choose between the two variants: cpuset_zone_allowed_hardwall() cpuset_zone_allowed_softwall() Until now, whether or not you got the hardwall flavor depended solely on whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask argument. If you didn't specify __GFP_HARDWALL, you implicitly got the softwall version. Unfortunately, this meant that users would end up with the softwall version without thinking about it. Since only the softwall version might sleep, this led to bugs with possible sleeping in interrupt context on more than one occassion. The hardwall version requires that the current tasks mems_allowed allows the node of the specified zone (or that you're in interrupt or that __GFP_THISNODE is set or that you're on a one cpuset system.) The softwall version, depending on the gfp_mask, might allow a node if it was allowed in the nearest enclusing cpuset marked mem_exclusive (which requires taking the cpuset lock 'callback_mutex' to evaluate.) This patch removes the cpuset_zone_allowed() call, and forces the caller to explicitly choose between the hardwall and the softwall case. If the caller wants the gfp_mask to determine this choice, they should (1) be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the cpuset_zone_allowed_softwall() routine. This adds another 100 or 200 bytes to the kernel text space, due to the few lines of nearly duplicate code at the top of both cpuset_zone_allowed_* routines. It should save a few instructions executed for the calls that turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to set (before the call) then check (within the call) the __GFP_HARDWALL flag. For the most critical call, from get_page_from_freelist(), the same instructions are executed as before -- the old cpuset_zone_allowed() routine it used to call is the same code as the cpuset_zone_allowed_softwall() routine that it calls now. Not a perfect win, but seems worth it, to reduce this chance of hitting a sleeping with irq off complaint again. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:49 -08:00
Eric W. Biederman	5f8442edfb	[PATCH] Revert "[PATCH] identifier to nsproxy" This reverts commit `373beb35cd`. No one is using this identifier yet. The purpose of this identifier is to export nsproxy to user space which is wrong. nsproxy is an internal implementation optimization, which should keep our fork times from getting slower as we increase the number of global namespaces you don't have to share. Adding a global identifier like this is inappropriate because it makes namespaces inherently non-recursive, greatly limiting what we can do with them in the future. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-13 09:05:47 -08:00
Daniel Walker	f5f1a24a2c	[PATCH] clocksource: small cleanup Mostly changing alignment. Just some general cleanup. [akpm@osdl.org: build fix] Signed-off-by: Daniel Walker <dwalker@mvista.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:22 -08:00
Daniel Walker	2b0137001d	[PATCH] clocksource: add usage of CONFIG_SYSFS Simply adds some ifdefs to remove clocksoure sysfs code when CONFIG_SYSFS isn't turn on. Signed-off-by: Daniel Walker <dwalker@mvista.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:22 -08:00
Arjan van de Ven	4c36a5dec2	[PATCH] round_jiffies infrastructure Introduce a round_jiffies() function as well as a round_jiffies_relative() function. These functions round a jiffies value to the next whole second. The primary purpose of this rounding is to cause all "we don't care exactly when" timers to happen at the same jiffy. This avoids multiple timers firing within the second for no real reason; with dynamic ticks these extra timers cause wakeups from deep sleep CPU sleep states and thus waste power. The exact wakeup moment is skewed by the cpu number, to avoid all cpus from waking up at the exact same time (and hitting the same lock/cachelines there) [akpm@osdl.org: fix variable type] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:22 -08:00
Vadim Lobanov	4fd45812cb	[PATCH] fdtable: Remove the free_files field An fdtable can either be embedded inside a files_struct or standalone (after being expanded). When an fdtable is being discarded after all RCU references to it have expired, we must either free it directly, in the standalone case, or free the files_struct it is contained within, in the embedded case. Currently the free_files field controls this behavior, but we can get rid of it entirely, as all the necessary information is already recorded. We can distinguish embedded and standalone fdtables using max_fds, and if it is embedded we can divine the relevant files_struct using container_of(). Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:22 -08:00
Vadim Lobanov	bbea9f6966	[PATCH] fdtable: Make fdarray and fdsets equal in size Currently, each fdtable supports three dynamically-sized arrays of data: the fdarray and two fdsets. The code allows the number of fds supported by the fdarray (fdtable->max_fds) to differ from the number of fds supported by each of the fdsets (fdtable->max_fdset). In practice, it is wasteful for these two sizes to differ: whenever we hit a limit on the smaller-capacity structure, we will reallocate the entire fdtable and all the dynamic arrays within it, so any delta in the memory used by the larger-capacity structure will never be touched at all. Rather than hogging this excess, we shouldn't even allocate it in the first place, and keep the capacities of the fdarray and the fdsets equal. This patch removes fdtable->max_fdset. As an added bonus, most of the supporting code becomes simpler. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:22 -08:00
Vadim Lobanov	f3d19c90fb	[PATCH] fdtable: Delete pointless code in dup_fd() The dup_fd() function creates a new files_struct and fdtable embedded inside that files_struct, and then possibly expands the fdtable using expand_files(). The out_release error path is invoked when expand_files() returns an error code. However, when this attempt to expand fails, the fdtable is left in its original embedded form, so it is pointless to try to free the associated fdarray and fdsets. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:21 -08:00
Miguel Ojeda Sandonis	33859f7f97	[PATCH] kernel/sched.c: whitespace cleanups [akpm@osdl.org: additional cleanups] Signed-off-by: Miguel Ojeda Sandonis <maxextreme@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:20 -08:00
Chen, Kenneth W	62ab616d54	[PATCH] sched: optimize activate_task for RT task RT task does not participate in interactiveness priority and thus shouldn't be bothered with timestamp and p->sleep_type manipulation when task is being put on run queue. Bypass all of the them with a single if (rt_task) test. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:43 -08:00
Chen, Kenneth W	06066714f6	[PATCH] sched: remove lb_stopbalance counter Remove scheduler stats lb_stopbalance counter. This counter can be calculated by: lb_balanced - lb_nobusyg - lb_nobusyq. There is no need to create gazillion counters while we can derive the value. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:43 -08:00
Siddha, Suresh B	783609c6cb	[PATCH] sched: decrease number of load balances Currently at a particular domain, each cpu in the sched group will do a load balance at the frequency of balance_interval. More the cores and threads, more the cpus will be in each sched group at SMP and NUMA domain. And we endup spending quite a bit of time doing load balancing in those domains. Fix this by making only one cpu(first idle cpu or first cpu in the group if all the cpus are busy) in the sched group do the load balance at that particular sched domain and this load will slowly percolate down to the other cpus with in that group(when they do load balancing at lower domains). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:43 -08:00
Mike Galbraith	b18ec80396	[PATCH] sched: improve migration accuracy Co-opt rq->timestamp_last_tick to maintain a cache_hot_time evaluation reference timestamp at both tick and sched times to prevent said reference, formerly rq->timestamp_last_tick, from being behind task->last_ran at evaluation time, and to move said reference closer to current time on the remote processor, intent being to improve cache hot evaluation and timestamp adjustment accuracy for task migration. Fix minor sched_time double accounting error which occurs when a task passing through schedule() does not schedule off, and takes the next timer tick. [kenneth.w.chen@intel.com: cleanup] Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ken Chen <kenneth.w.chen@intel.com> Cc: Don Mullis <dwm@meer.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:43 -08:00
Christoph Lameter	08c183f31b	[PATCH] sched: add option to serialize load balancing Large sched domains can be very expensive to scan. Add an option SD_SERIALIZE to the sched domain flags. If that flag is set then we make sure that no other such domain is being balanced. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:43 -08:00
Christoph Lameter	1bd77f2da5	[PATCH] sched: call tasklet less frequently Trigger softirq less frequently We trigger the softirq before this patch using offset of sd->interval. However, if the queue is busy then it is sufficient to schedule the softirq with sd->interval * busy_factor. So we modify the calculation of the next time to balance by taking the interval added to last_balance again. This is only the right value if the idle/busy situation continues as is. There are two potential trouble spots: - If the queue was idle and now gets busy then we call rebalance early. However, that is not a problem because we will then use the longer interval for the next period. - If the queue was busy and becomes idle then we potentially wait too long before rebalancing. However, when the task goes idle then idle_balance is called. We add another calculation of the next balance time based on sd->interval in idle_balance so that we will rebalance soon. V2->V3: - Calculate rebalance time based on current jiffies and not based on the jiffies at the last time we load balanced. We no longer rely on staggering and therefore we can affort to do this now. V3->V4: - Use functions to do jiffy comparisons. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:43 -08:00
Christoph Lameter	c9819f4593	[PATCH] sched: use softirq for load balancing Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced softirq. We calculate the earliest time for each layer of sched domains to be rescanned (this is the rescan time for idle) and use the earliest of those to schedule the softirq via a new field "next_balance" added to struct rq. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Christoph Lameter	e418e1c2bf	[PATCH] sched: move idle status calculation into rebalance_tick() Perform the idle state determination in rebalance_tick. If we separate balancing from sched_tick then we also need to determine the idle state in rebalance_tick. V2->V3 Remove useless idlle != 0 check. Checking nr_running seems to be sufficient. Thanks Suresh. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Christoph Lameter	7835b98bc6	[PATCH] sched: extract load calculation from rebalance_tick A load calculation is always done in rebalance_tick() in addition to the real load balancing activities that only take place when certain jiffie counts have been reached. Move that processing into a separate function and call it directly from scheduler_tick(). Also extract the time slice handling from scheduler_tick and put it into a separate function. Then we can clean up scheduler_tick significantly. It will no longer have any gotos. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Christoph Lameter	fe2eea3faf	[PATCH] sched: disable interrupts for locking in load_balance() Interrupts must be disabled for request queue locks if we want to run load_balance() with interrupts enabled. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Christoph Lameter	4211a9a2e9	[PATCH] sched: remove staggering of load balancing Timer interrupts already are staggered. We do not need an additional layer of time staggering for short load balancing actions that take a reasonably small portion of the time slice. For load balancing on large sched_domains we will add a serialization later that avoids concurrent load balance operations and thus has the same effect as load staggering. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Christoph Lameter	571f6d2fb0	[PATCH] sched: avoid taking rq lock in wake_priority_sleeper Avoid taking the request queue lock in wake_priority_sleeper if there are no running processes. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Kirill Korotaev	054b9108e0	[PATCH] move_task_off_dead_cpu() should be called with disabled ints move_task_off_dead_cpu() requires interrupts to be disabled, while migrate_dead() calls it with enabled interrupts. Added appropriate comments to functions and added BUG_ON(!irqs_disabled()) into double_rq_lock() and double_lock_balance() which are the origin sources of such bugs. Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Siddha, Suresh B	6711cab43e	[PATCH] ched domain: move sched group allocations to percpu area Move the sched group allocations to percpu area. This will minimize cross node memory references and also cleans up the sched groups allocation for allnodes sched domain. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Robert P. J. Day	cc2a73b5ca	[PATCH] sched.c: correct comment for this_rq_lock() Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:42 -08:00
Andrew Morton	4a7864ca63	[PATCH] io-accounting: via taskstats Deliver IO accounting via taskstats. Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00

... 2 3 4 5 6 ...

2046 Commits