linux-sg2042/kernel
Paul Jackson 181b648036 [PATCH] cpuset: fix obscure attach_task vs exiting race
Fix obscure race condition in kernel/cpuset.c attach_task() code.

There is basically zero chance of anyone accidentally being harmed by this
race.

It requires a special 'micro-stress' load and a special timing loop hacks
in the kernel to hit in less than an hour, and even then you'd have to hit
it hundreds or thousands of times, followed by some unusual and senseless
cpuset configuration requests, including removing the top cpuset, to cause
any visibly harm affects.

One could, with perhaps a few days or weeks of such effort, get the
reference count on the top cpuset below zero, and manage to crash the
kernel by asking to remove the top cpuset.

I found it by code inspection.

The race was introduced when 'the_top_cpuset_hack' was introduced, and one
piece of code was not updated.  An old check for a possibly null task
cpuset pointer needed to be changed to a check for a task marked
PF_EXITING.  The pointer can't be null anymore, thanks to
the_top_cpuset_hack (documented in kernel/cpuset.c).  But the task could
have gone into PF_EXITING state after it was found in the task_list scan.

If a task is PF_EXITING in this code, it is possible that its task->cpuset
pointer is pointing to the top cpuset due to the_top_cpuset_hack, rather
than because the top_cpuset was that tasks last valid cpuset.  In that
case, the wrong cpuset reference counter would be decremented.

The fix is trivial.  Instead of failing the system call if the tasks cpuset
pointer is null here, fail it if the task is in PF_EXITING state.

The code for 'the_top_cpuset_hack' that changes an exiting tasks cpuset to
the top_cpuset is done without locking, so could happen at anytime.  But it
is done during the exit handling, after the PF_EXITING flag is set.  So if
we verify that a task is still not PF_EXITING after we copy out its cpuset
pointer (into 'oldcs', below), we know that 'oldcs' is not one of these
hack references to the top_cpuset.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:25 -07:00
..
irq [PATCH] irq: remove a extra line 2006-09-29 09:18:07 -07:00
power Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2006-09-26 11:49:46 -07:00
time [PATCH] time: rename clocksource functions 2006-06-26 09:58:21 -07:00
.gitignore gitignore: ignore more generated files 2006-01-03 11:35:26 +01:00
Kconfig.hz [PATCH] i386: Selectable Frequency of the Timer Interrupt 2005-06-23 09:45:10 -07:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
Makefile [PATCH] per-task-delay-accounting: taskstats interface 2006-07-14 21:53:56 -07:00
acct.c [PATCH] audit/accounting: tty locking 2006-09-29 09:18:25 -07:00
audit.c [PATCH] selinux: rename selinux_ctxid_to_string 2006-09-26 08:48:52 -07:00
audit.h [PATCH] audit: AUDIT_PERM support 2006-09-11 13:32:30 -04:00
auditfilter.c [PATCH] selinux: rename selinux_ctxid_to_string 2006-09-26 08:48:52 -07:00
auditsc.c [PATCH] audit/accounting: tty locking 2006-09-29 09:18:25 -07:00
capability.c [PATCH] pidspace: is_init() 2006-09-29 09:18:12 -07:00
compat.c [PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode 2006-09-29 09:18:15 -07:00
configs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
cpu.c [PATCH] Disable CPU hotplug during suspend 2006-09-26 08:48:59 -07:00
cpuset.c [PATCH] cpuset: fix obscure attach_task vs exiting race 2006-09-29 09:18:25 -07:00
delayacct.c [PATCH] task delay accounting fixes 2006-09-01 11:39:08 -07:00
dma.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
exec_domain.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
exit.c [PATCH] introduce TASK_DEAD state 2006-09-29 09:18:21 -07:00
extable.c [PATCH] symbol_put_addr() locks kernel 2006-05-15 11:20:55 -07:00
fork.c [PATCH] copy_process: cosmetic ->ioprio tweak 2006-09-29 09:18:18 -07:00
futex.c [PATCH] sys_get_robust_list(): don't take tasklist_lock 2006-09-29 09:18:18 -07:00
futex_compat.c [PATCH] futex: Apply recent futex fixes to futex_compat 2006-08-06 08:57:49 -07:00
hrtimer.c [PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode 2006-09-29 09:18:15 -07:00
itimer.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
kallsyms.c [PATCH] null-terminate over-long /proc/kallsyms symbols 2006-07-14 21:53:52 -07:00
kexec.c [PATCH] kexec warning fix 2006-09-29 09:18:15 -07:00
kfifo.c [PATCH] memory ordering in __kfifo primitives 2006-09-29 09:18:13 -07:00
kmod.c [PATCH] Fix ____call_usermodehelper errors being silently ignored 2006-09-29 09:18:16 -07:00
kprobes.c [PATCH] IA64: kprobe invalidate icache of jump buffer 2006-07-31 13:28:38 -07:00
ksysfs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
kthread.c [PATCH] remove kernel/kthread.c:kthread_stop_sem() 2006-07-14 21:53:52 -07:00
lockdep.c [PATCH] lockdep core: improve the lock-chain-hash 2006-09-29 09:18:25 -07:00
lockdep_internals.h [PATCH] lockdep: double the number of stack-trace entries 2006-09-13 07:32:14 -07:00
lockdep_proc.c [PATCH] lockdep: procfs 2006-07-03 15:27:04 -07:00
module.c [PATCH] /sys/modules: allow full length section names 2006-09-29 09:18:23 -07:00
mutex-debug.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
mutex.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex.h [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
panic.c [PATCH] Add the __stack_chk_fail() function 2006-09-26 10:52:39 +02:00
params.c [PATCH] module_subsys: initialize earlier 2006-09-29 09:18:08 -07:00
pid.c [PATCH] pid: remove temporary debug code in attach_pid 2006-09-27 08:26:19 -07:00
posix-cpu-timers.c [PATCH] posix-timers: Fix the flags handling in posix_cpu_nsleep() 2006-09-29 09:18:15 -07:00
posix-timers.c [PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode 2006-09-29 09:18:15 -07:00
printk.c [PATCH] PM: make it possible to disable console suspending 2006-09-26 08:49:03 -07:00
profile.c [PATCH] Profiling: require buffer allocation on the correct node 2006-09-26 08:48:50 -07:00
ptrace.c [PATCH] pidspace: is_init() 2006-09-29 09:18:12 -07:00
rcupdate.c [PATCH] rcu_do_batch: make ->qlen decrement irq safe 2006-09-13 07:32:14 -07:00
rcutorture.c [PATCH] rcu: add lock annotations to rcu{,_bh}_torture_read_{lock,unlock} 2006-09-29 09:18:08 -07:00
relay.c [PATCH] kernel-doc for relay interface 2006-09-29 09:18:06 -07:00
resource.c Resources: insert identical resources above existing resources 2006-09-26 17:43:52 -07:00
rtmutex-debug.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rtmutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex-tester.c [PATCH] Add try_to_freeze() to rt-test kthreads 2006-07-14 21:53:53 -07:00
rtmutex.c [PATCH] clean up and remove some extra spinlocks from rtmutex 2006-09-29 09:18:09 -07:00
rtmutex.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex_common.h [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support 2006-06-27 17:32:47 -07:00
rwsem.c [PATCH] lockdep: prove rwsem locking correctness 2006-07-03 15:27:04 -07:00
sched.c [PATCH] introduce TASK_DEAD state 2006-09-29 09:18:21 -07:00
seccomp.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
signal.c [PATCH] __dequeue_signal() cleanup 2006-09-29 09:18:15 -07:00
softirq.c [PATCH] check return value of cpu_callback 2006-09-29 09:18:14 -07:00
softlockup.c [PATCH] check return value of cpu_callback 2006-09-29 09:18:14 -07:00
spinlock.c [PATCH] remove generic__raw_read_trylock() 2006-09-29 09:18:03 -07:00
stacktrace.c [PATCH] lockdep: stacktrace subsystem, core 2006-07-03 15:27:02 -07:00
stop_machine.c [PATCH] stop_machine.c copyright 2006-09-29 09:18:24 -07:00
sys.c [PATCH] kill extraneous printk in kernel_restart() 2006-09-29 09:18:16 -07:00
sys_ni.c [PATCH] sys_move_pages: 32bit support (i386, x86_64) 2006-06-23 07:42:53 -07:00
sysctl.c [PATCH] pidspace: is_init() 2006-09-29 09:18:12 -07:00
taskstats.c [NETLINK]: Extend netlink messaging interface 2006-09-22 14:53:43 -07:00
time.c [PATCH] Time: Introduce arch generic time accessors 2006-06-26 09:58:20 -07:00
timer.c [PATCH] simplify update_times (avoid jiffies/jiffies_64 aliasing problem) 2006-09-29 09:18:15 -07:00
uid16.c [PATCH] Add more prevent_tail_call() 2006-04-19 16:27:18 -07:00
unwind.c [PATCH] unwind: fix unused variable warning when !CONFIG_MODULES 2006-09-29 09:18:11 -07:00
user.c [PATCH] selinux: add hooks for key subsystem 2006-06-22 15:05:55 -07:00
wait.c [PATCH] uninline init_waitqueue_head() 2006-07-10 13:24:25 -07:00
workqueue.c [PATCH] workqueue: remove lock_cpu_hotplug() 2006-08-14 12:54:29 -07:00