Commit Graph

9696 Commits

Author SHA1 Message Date
Randy Dunlap eed34d0fc5 [PATCH] kernel-doc for kernel/dma.c
Add kernel-doc function headers in kernel/dma.c and use it in DocBook.

Clean up kernel-doc in mca_dma.h (the colon (':') represents a
section header).

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:41 -07:00
Franck Bui-Huu ffc5089196 [PATCH] Create kallsyms_lookup_size_offset()
Some uses of kallsyms_lookup() do not need to find out the name of a symbol
and its module's name it belongs.  This is specially true in arch specific
code, which needs to unwind the stack to show the back trace during oops
(mips is an example).  In this specific case, we just need to retreive the
function's size and the offset of the active intruction inside it.

Adds a new entry "kallsyms_lookup_size_offset()" This new entry does
exactly the same as kallsyms_lookup() but does not require any buffers to
store any names.

It returns 0 if it fails otherwise 1.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:41 -07:00
David Howells afefdbb28a [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system.  They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example.  The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.

Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.

This patch:

Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.

The stat functions then returns the full 64-bit inode number where
available and where possible.  If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.

Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.

Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.

Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.

It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.

[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:40 -07:00
Andrew Morton 1d32849b14 [PATCH] pid.h cleanup
Make the pid.h macros look less revolting in an 80-col window.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03 08:03:40 -07:00
Paul Mundt fac99d9746 sh: Fixup __raw_read_trylock().
generic__raw_read_trylock() was broken, fix up the __raw_read_trylock()
implementation for something sensible. Taken from m32r, which has the
same use cases.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-10-03 14:13:09 +09:00
Paul Mundt 2914d4da17 sh: Kill off remaining config.h references.
A few of these managed to sneak back in, get rid of them once
and for all.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-10-03 13:19:02 +09:00
Paul Mundt 3e6c999de9 sh: Initial gitignore list
Ignore build-time generated files.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-10-03 13:16:15 +09:00
Paul Mundt 711fa80968 sh: build fixes for defconfigs.
Get all of the defconfigs building again.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-10-03 13:14:04 +09:00
Paul Mundt 059fbd6a5e sh: Kill off more dead headers.
Some old rtc and io headers were left hanging around, kill them off..

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2006-10-03 13:12:38 +09:00
Linus Torvalds b65d04a785 Merge master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
* master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] improve machzwd detection
  [WATCHDOG] use ENOTTY instead of ENOIOCTLCMD in ioctl()
  [WATCHDOG] s3c24XX nowayout
  [WATCHDOG] pnx4008: add cpu_relax()
  [WATCHDOG] pnx4008_wdt.c - spinlock fixes.
  [WATCHDOG] pnx4008_wdt.c - remove patch
  [WATCHDOG] pnx4008_wdt.c - nowayout patch
  [WATCHDOG] pnx4008: add watchdog support
  [WATCHDOG] i8xx_tco remove pci_find_device.
  [WATCHDOG] alim remove pci_find_device
2006-10-02 14:48:07 -07:00
David S. Miller 36d046bbbd [SPARC64]: Do not include compat.h from asm-sparc64/signal.h any more.
It's not needed, now that all of that stuff is now in
asm/compat_signal.h, and it breaks the build too :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-02 14:30:45 -07:00
David S. Miller 14cc6abada [SPARC64]: Move signal compat bits to new header file.
Create asm-sparc64/compat_signal.h and stuff things there.

This avoids the "linux/compat.h includes asm/signal.h but
asm/signal.h needs compat_sigset_t which isn't defined yet"
problems introduced recently.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-02 14:24:18 -07:00
Linus Torvalds 0235497f7a Add prototype for sigset_from_compat()
Duh.  I screwed up editing David Howells patch in commit
3f2e05e90e, and the actual declaration for
the sigset_from_compat() function went missing. My bad.

Olaf Hering saved the day and noticed that I'm a moron.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 14:05:20 -07:00
Vitaly Wool 9325fa3615 [WATCHDOG] pnx4008: add watchdog support
Add watchdog support for Philips PNX4008 ARM board inlined.

Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2006-10-02 23:02:37 +02:00
Linus Torvalds 3e04767a46 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [MTD] Cleanup of 'ioremap balanced with iounmap for drivers/mtd subsystem'
  [MTD] fix nftl_write warning
  [MTD] fix printk warning
  [MTD ONENAND] Check OneNAND lock scheme & all block unlock command support
  [MTD ONENAND] Remove unused MTD_ONENAND_SYNC_READ configuration
  [MTD ONENAND] Fix OneNAND probe
  [MTD NAND] Provide prototype for newly-exported nand_wait_ready()
  [MTD] Remove #ifndef __KERNEL__ hack in <mtd/mtd-abi.h>
  [MTD NAND] Allow override of page read and write functions.
  [MTD NAND] Allocate chip->buffers separately to allow it to be overridden
  [MTD NAND] Split nand_scan() into two parts; allow board driver to intervene
  [MTD NAND] Export nand_wait_ready() for use by board drivers
2006-10-02 08:22:17 -07:00
Linus Torvalds a12f66fccf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (35 commits)
  Input: wistron - add support for Acer TravelMate 2424NWXCi
  Input: wistron - fix setting up special buttons
  Input: add KEY_BLUETOOTH and KEY_WLAN definitions
  Input: add new BUS_VIRTUAL bus type
  Input: add driver for stowaway serial keyboards
  Input: make input_register_handler() return error codes
  Input: remove cruft that was needed for transition to sysfs
  Input: fix input module refcounting
  Input: constify input core
  Input: libps2 - rearrange exports
  Input: atkbd - support Microsoft Natural Elite Pro keyboards
  Input: i8042 - disable MUX mode on Toshiba Equium A110
  Input: i8042 - get rid of polling timer
  Input: send key up events at disconnect
  Input: constify psmouse driver
  Input: i8042 - add Amoi to the MUX blacklist
  Input: logips2pp - add sugnature 56 (Cordless MouseMan Wheel), cleanup
  Input: add driver for Touchwin serial touchscreens
  Input: add driver for Touchright serial touchscreens
  Input: add driver for Penmount serial touchscreens
  ...
2006-10-02 08:20:33 -07:00
Linus Torvalds 12dce6263d Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Remove unused galileo-boars header files
  [MIPS] Rename SERIAL_PORT_DEFNS for EV64120
  [MIPS] Add UART IRQ number for EV64120
  [MIPS] Remove excite_flash.c
  [MIPS] Update i8259 resources.
  [MIPS] Make unwind_stack() can dig into interrupted context
  [MIPS] Stacktrace build-fix and improvement
  [MIPS] QEMU: Add support for little endian mips
  [MIPS] Remove __flush_icache_page
  [MIPS] lockdep: update defconfigs
  [MIPS] lockdep: Add STACKTRACE_SUPPORT and enable LOCKDEP_SUPPORT
  [MIPS] lockdep: fix TRACE_IRQFLAGS_SUPPORT
2006-10-02 08:18:43 -07:00
David Howells 3f2e05e90e [PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h
Revert Andrew Morton's patch to temporarily hack around the lack of a
declaration of sigset_t in linux/compat.h to make the block-disablement
patches build on IA64.  This got accidentally pushed to Linus and should
be fixed in a different manner.

Also make linux/compat.h #include asm/signal.h to gain a definition of
sigset_t so that it can externally declare sigset_from_compat().

This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
ppc64 and run-tested on frv.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 08:03:31 -07:00
Cedric Le Goater 9ec52099e4 [PATCH] replace cad_pid by a struct pid
There are a few places in the kernel where the init task is signaled.  The
ctrl+alt+del sequence is one them.  It kills a task, usually init, using a
cached pid (cad_pid).

This patch replaces the pid_t by a struct pid to avoid pid wrap around
problem.  The struct pid is initialized at boot time in init() and can be
modified through systctl with

	/proc/sys/kernel/cad_pid

[ I haven't found any distro using it ? ]

It also introduces a small helper routine kill_cad_pid() which is used
where it seemed ok to use cad_pid instead of pid 1.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Oleg Nesterov 1a657f78dc [PATCH] introduce get_task_pid() to fix unsafe get_pid()
proc_pid_make_inode:

	ei->pid = get_pid(task_pid(task));

I think this is not safe.  get_pid() can be preempted after checking "pid
!= NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:25 -07:00
Haavard Skinnemoen c5f2420a06 [PATCH] AVR32: Implement kernel_execve
Move execve() into arch/avr32/kernel/sys_avr32.c, rename it to
kernel_execve() and return the syscall return value directly without
setting errno.

This also gets rid of the __KERNEL_SYSCALLS__ stuff from unistd.h and
expands #ifdef __KERNEL__ to cover everything in unistd.h except the
__NR_foo definitions.

Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:24 -07:00
Arnd Bergmann 135ab6ec8f [PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references
The last in-kernel user of errno is gone, so we should remove the definition
and everything referring to it.  This also removes the now-unused lib/execve.c
file that was introduced earlier.

Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
kernel.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:23 -07:00
Arnd Bergmann 3db03b4afb [PATCH] rename the provided execve functions to kernel_execve
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly.  Rename these to kernel_execve to
get the right semantics there.  Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.

[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:23 -07:00
Kirill Korotaev 73ea41302b [PATCH] IPC namespace - utils
This patch adds basic IPC namespace functionality to
IPC utils:
- init_ipc_ns
- copy/clone/unshare/free IPC ns
- /proc preparations

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:22 -07:00
Kirill Korotaev 25b21cb2f6 [PATCH] IPC namespace core
This patch set allows to unshare IPCs and have a private set of IPC objects
(sem, shm, msg) inside namespace.  Basically, it is another building block of
containers functionality.

This patch implements core IPC namespace changes:
- ipc_namespace structure
- new config option CONFIG_IPC_NS
- adds CLONE_NEWIPC flag
- unshare support

[clg@fr.ibm.com: small fix for unshare of ipc namespace]
[akpm@osdl.org: build fix]
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:22 -07:00
Serge E. Hallyn 071df104f8 [PATCH] namespaces: utsname: implement CLONE_NEWUTS flag
Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare.

[clg@fr.ibm.com: IPC unshare fix]
[bunk@stusta.de: cleanup]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:22 -07:00
Serge E. Hallyn bf47fdcda6 [PATCH] namespaces: utsname: remove system_utsname
The system_utsname isn't needed now that kernel/sysctl.c is fixed.
Nuke it.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn 4865ecf131 [PATCH] namespaces: utsname: implement utsname namespaces
This patch defines the uts namespace and some manipulators.
Adds the uts namespace to task_struct, and initializes a
system-wide init namespace.

It leaves a #define for system_utsname so sysctl will compile.
This define will be removed in a separate patch.

[akpm@osdl.org: build fix, cleanup]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn 96b644bdec [PATCH] namespaces: utsname: use init_utsname when appropriate
In some places, particularly drivers and __init code, the init utsns is the
appropriate one to use.  This patch replaces those with a the init_utsname
helper.

Changes: Removed several uses of init_utsname().  Hope I picked all the
	right ones in net/ipv4/ipconfig.c.  These are now changed to
	utsname() (the per-process namespace utsname) in the previous
	patch (2/7)

[akpm@osdl.org: CIFS fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn e9ff3990f0 [PATCH] namespaces: utsname: switch to using uts namespaces
Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn 0bdd7aab7f [PATCH] namespaces: utsname: introduce temporary helpers
Define utsname() and init_utsname() which return &system_utsname.  Users of
system_utsname will be changed to use these helpers, after which
system_utsname will disappear.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:21 -07:00
Serge E. Hallyn 1651e14e28 [PATCH] namespaces: incorporate fs namespace into nsproxy
This moves the mount namespace into the nsproxy.  The mount namespace count
now refers to the number of nsproxies point to it, rather than the number of
tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Serge E. Hallyn ab516013ad [PATCH] namespaces: add nsproxy
This patch adds a nsproxy structure to the task struct.  Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.

The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.

[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Peter Zijlstra 12fd352038 [PATCH] nfsd: lockdep annotation
while doing a kernel make modules_install install over an NFS mount.

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9550 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9550:
   #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa57>] __lock_acquire+0x77a/0x9a3
   [<c012af4a>] lock_acquire+0x60/0x80
   [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034c845>] mutex_lock+0x1c/0x1f
   [<c0162edc>] vfs_unlink+0x34/0x8a
   [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
   [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033e84d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  nfsd/9580 is trying to acquire lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  but task is already holding lock:
   (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  other info that might help us debug this:
  2 locks held by nfsd/9580:
   #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
   #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f

  stack backtrace:
   [<c0103508>] show_trace_log_lvl+0x58/0x152
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb
  DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
  Leftover inexact backtrace:
   [<c0103b8b>] show_trace+0xd/0x10
   [<c0103c2f>] dump_stack+0x19/0x1b
   [<c012aa63>] __lock_acquire+0x77a/0x9a3
   [<c012af56>] lock_acquire+0x60/0x80
   [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
   [<c034cc1d>] mutex_lock+0x1c/0x1f
   [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
   [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
   [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
   [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
   [<c033ec1d>] svc_process+0x3a5/0x5ed
   [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
   [<c0101005>] kernel_thread_helper+0x5/0xb

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Neil Brown <neilb@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks bfd241600a [PATCH] knfsd: make rpc threads pools numa aware
Actually implement multiple pools.  On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool.  Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e.  the NIC interrupt CPU).  Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.

This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.

Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:20 -07:00
Greg Banks a74554429e [PATCH] knfsd: add svc_set_num_threads
Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv.  Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code.  Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks 9a24ab5749 [PATCH] knfsd: add svc_get
add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks 3262c816a3 [PATCH] knfsd: split svc_serv into pools
Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv.  The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
 This is a major step towards making the NFS server NUMA-friendly.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks 5685f0fa1c [PATCH] knfsd: convert sk_reserved to atomic_t
Convert the svc_sock->sk_reserved variable from an int protected by
svc_serv->sv_lock, to an atomic.  This reduces (by 1) the number of places we
need to take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks 1a68d952af [PATCH] knfsd: use new lock for svc_sock deferred list
Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
instead of svc_serv->sv_lock.  Using the more fine-grained lock reduces the
number of places we need to take the svc_serv lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks c45c357d7d [PATCH] knfsd: convert sk_inuse to atomic_t
Convert the svc_sock->sk_inuse counter from an int protected by
svc_serv->sv_lock, to an atomic.  This reduces the number of places we need to
take the (effectively global) svc_serv->sv_lock.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
Greg Banks 36bdfc8bae [PATCH] knfsd: move tempsock aging to a timer
Following are 11 patches from Greg Banks which combine to make knfsd more
Numa-aware.  They reduce hitting on 'global' data structures, and create some
data-structures that can be node-local.

knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attach to the node that received
the interrupt.

The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.

Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == nmber of clients.  Number of NUMA nodes == N/2

N	Throughput, MiB/s	CPU usage, % (max=N*100)
	Before	After		Before	After
	---	------	----		-----	-----
	4	312	435		350	228
	6	500	656		501	418
	8	562	804		690	589

This patch:

Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes.  This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.

[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:19 -07:00
NeilBrown 6fb2b47fa1 [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown b41b66d63c [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'
Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist.  This will cause the nfs server to listen on that socket.

To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown 80212d59e3 [PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports
This file will list all ports that nfsd has open.
Default when TCP enabled will be
   ipv4 udp 0.0.0.0 2049
   ipv4 tcp 0.0.0.0 2049

Later, the list of ports will be settable.

'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
non-mainline patches which created 'ports' with different semantics.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:18 -07:00
NeilBrown 6658d3a7bb [PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions
We have an array 'nfsd_version' which lists the available versions of nfsd,
and 'nfsd_versions' (poor choice there :-() which lists the currently active
versions.

Then we have a bitmap - nfsd_versbits which says which versions are wanted.
The bits in this bitset cause content to be copied from nfsd_version to
nfsd_versions when nfsd starts.

This patch removes nfsd_versbits and moves information directly from
nfsd_version to nfsd_versions when requests for version changes arrive.

Note that this doesn't make it possible to change versions while the server is
running.  This is because serv->sv_xdrsize is calculated when a service is
created, and used when threads are created, and xdrsize depends on the active
versions.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
NeilBrown 24e36663c3 [PATCH] knfsd: be more selective in which sockets lockd listens on
Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.

However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.

So:
 - add an option to lockd_up saying which protocol is needed
 - Always open sockets for which an explicit port was given, otherwise
   only open a socket of the type required
 - Change nfsd to do one lockd_up per socket rather than one per thread.

This
 - removes the dependancy on CONFIG_NFSD_TCP
 - means that lockd may open sockets other than at startup
 - means that lockd will *not* listen on UDP if the only
   mounts are TCP mount (and nfsd hasn't started).

The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
NeilBrown bc591ccff2 [PATCH] knfsd: add a callback for when last rpc thread finishes
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more.  So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
Greg Banks 0f532f3861 [PATCH] cpumask: add highest_possible_node_id
cpumask: add highest_possible_node_id(), analogous to
highest_possible_processor_id().

[pj@sgi.com: fix typo]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:17 -07:00
bibo,mao 99219a3fbc [PATCH] kretprobe spinlock deadlock patch
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock.  This patch moves kfree function out scope of
kretprobe_lock.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Ananth N Mavinakayanahalli b3f827cb0f [PATCH] Add regs_return_value() helper
Add the regs_return_value() macro to extract the return value in an
architecture agnostic manner, given the pt_regs.

Other architecture maintainers may want to add similar helpers.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Ananth N Mavinakayanahalli 412998cf6b [PATCH] kprobes: handle symbol resolution when <module:.symbol> is specified
kallsyms_lookup_name() allows for <module:symbol> style specification for
looking up symbol addresses.  Handle the case where the user specifies
<module:.symbol> on powerpc, given that 64-bit powerpc uses function
descriptors.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Ananth N Mavinakayanahalli 3a872d89ba [PATCH] Kprobes: Make kprobe modules more portable
In an effort to make kprobe modules more portable, here is a patch that:

o Introduces the "symbol_name" field to struct kprobe.
  The symbol->address resolution now happens in the kernel in an
  architecture agnostic manner. 64-bit powerpc users no longer have
  to specify the ".symbols"
o Introduces the "offset" field to struct kprobe to allow a user to
  specify an offset into a symbol.
o The legacy mechanism of specifying the kprobe.addr is still supported.
  However, if both the kprobe.addr and kprobe.symbol_name are specified,
  probe registration fails with an -EINVAL.
o The symbol resolution code uses kallsyms_lookup_name(). So
  CONFIG_KPROBES now depends on CONFIG_KALLSYMS
o Apparantly kprobe modules were the only legitimate out-of-tree user of
  the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
  happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
o Modify tcp_probe.c that uses the kprobe interface so as to make it
  work on multiple platforms (in its earlier form, the code wouldn't
  work, say, on powerpc)

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:16 -07:00
Eric W. Biederman 2425c08b37 [PATCH] usb: fixup usb so it uses struct pid
The problem with remembering a user space process by its pid is that it is
possible that the process will exit, pid wrap around will occur.
Converting to a struct pid avoid that problem, and paves the way for
implementing a pid namespace.

Also since usb is the only user of kill_proc_info_as_uid rename
kill_proc_info_as_uid to kill_pid_info_as_uid and have the new version take
a struct pid.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu 3fbc964864 [PATCH] Define struct pspace
Define a per-container pid space object.  And create one instance of this
object, init_pspace, to define the entire pid space.  Subsequent patches
will provide/use interfaces to create/destroy pid spaces.

Its a subset/rework of Eric Biederman's patch
http://lkml.org/lkml/2006/2/6/285 .

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu aa5a6662f9 [PATCH] Move pidmap to pspace.h
Move struct pidmap and PIDMAP_ENTRIES to a new file, include/linux/pspace.h
where it will be used in subsequent patches to define pid spaces.

Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285

[akpm@osdl.org: cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Oleg Nesterov d387cae075 [PATCH] pid: simplify pid iterators
I think it is hardly possible to read the current do_each_task_pid().  The
new version is much simpler and makes the code smaller.

Only the do_each_task_pid change is tested, the do_each_pid_task isn't.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:15 -07:00
Jeff Dike b68e31d0eb [PATCH] const struct tty_operations
As part of an SMP cleanliness pass over UML, I consted a bunch of
structures in order to not have to document their locking.  One of these
structures was a struct tty_operations.  In order to const it in UML
without introducing compiler complaints, the declaration of
tty_set_operations needs to be changed, and then all of its callers need to
be fixed.

This patch declares all struct tty_operations in the tree as const.  In all
cases, they are static and used only as input to tty_set_operations.  As an
extra check, I ran an i386 allyesconfig build which produced no extra
warnings.

53 drivers are affected.  I checked the history of a bunch of them, and in
most cases, there have been only a handful of maintenance changes in the
last six months.  serial_core.c was the busiest one that I looked at.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman 609d7fa956 [PATCH] file: modify struct fown_struct to use a struct pid
File handles can be requested to send sigio and sigurg to processes.  By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman bde0d2c98b [PATCH] vt: Make vt_pid a struct pid (making it pid wrap around safe).
I took a good hard look at the locking and it appears the locking on vt_pid
is the console semaphore.  Every modified path is called under the console
semaphore except reset_vc when it is called from fn_SAK or do_SAK both of
which appear to be in interrupt context.  In addition I need to be careful
because in the presence of an oops the console_sem may be arbitrarily
dropped.

Which leads me to conclude the current locking is inadequate for my needs.

Given the weird cases we could hit because of oops printing instead of
introducing an extra spin lock to protect the data and keep the pid to
signal and the signal to send in sync, I have opted to use xchg on just the
struct pid * pointer instead.

Due to console_sem we will stay in sync between vt_pid and vt_mode except
for a small window during a SAK, or oops handling.  SAK handling should
kill any user space process that care, and oops handling we are broken
anyway.  Besides the worst that can happen is that I try to send the wrong
signal.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:14 -07:00
Eric W. Biederman 81af8d67d4 [PATCH] vt: rework the console spawning variables
This is such a rare path it took me a while to figure out how to test
this after soring out the locking.

This patch does several things.
- The variables used are moved into a structure and declared in vt_kern.h
- A spinlock is added so we don't have SMP races updating the values.
- Instead of raw pid_t value a struct_pid is used to guard against
  pid wrap around issues, if the daemon to spawn a new console dies.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman 5feb8f5f84 [PATCH] pid: implement pid_nr
As we stop storing pid_t's and move to storing struct pid *.  We need a way to
get the pid_t from the struct pid to report to user space what we have stored.

Having a clean well defined way to do this is especially important as we move
to multiple pid spaces as may need to report a different value to the caller
depending on which pid space the caller is in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman c4b92fc112 [PATCH] pid: implement signal functions that take a struct pid *
Currently the signal functions all either take a task or a pid_t argument.
This patch implements variants that take a struct pid *.  After all of the
users have been update it is my intention to remove the variants that take a
pid_t as using pid_t can be more work (an extra hash table lookup) and
difficult to get right in the presence of multiple pid namespaces.

There are two kinds of functions introduced in this patch.  The are the
general use functions kill_pgrp and kill_pid which take a priv argument that
is ultimately used to create the appropriate siginfo information, Then there
are _kill_pgrp_info, kill_pgrp_info, kill_pid_info the internal implementation
helpers that take an explicit siginfo.

The distinction is made because filling out an explcit siginfo is tricky, and
will be even more tricky when pid namespaces are introduced.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman 558cb32548 [PATCH] pid: add do_each_pid_task
To avoid pid rollover confusion the kernel needs to work with struct pid *
instead of pid_t.  Currently there is not an iterator that walks through all
of the tasks of a given pid type starting with a struct pid.  This prevents us
replacing some pid_t instances with struct pid.  So this patch adds
do_each_pid_task which walks through the set of task for a given pid type
starting with a struct pid.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman 22c935f47c [PATCH] pid: implement access helpers for a tacks various process groups
In the last round of cleaning up the pid hash table a more general struct pid
was introduced, that can be referenced counted.

With the more general struct pid most if not all places where we store a pid_t
we can now store a struct pid * and remove the need for a hash table lookup,
and avoid any possible problems with pid roll over.

Looking forward to the pid namespaces struct pid * gives us an absolute form a
pid so we can compare and use them without caring which pid namespace we are
in.

This patchset introduces the infrastructure needed to use struct pid instead
of pid_t, and then it goes on to convert two different kernel users that
currently store a pid_t value.

There are a lot more places to go but this is enough to get the basic idea.

Before we can merge a pid namespace patch all of the kernel pid_t users need
to be examined.  Those that deal with user space processes need to be
converted to using a struct pid *.  Those that deal with kernel processes need
to converted to using the kthread api.  A rare few that only use their current
processes pid values get to be left alone.

This patch:

task_session returns the struct pid of a tasks session.
task_pgrp    returns the struct pid of a tasks process group.
task_tgid    returns the struct pid of a tasks thread group.
task_pid     returns the struct pid of a tasks process id.

These can be used to avoid unnecessary hash table lookups, and to implement
safe pid comparisions in the face of a pid namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman 20cdc894c4 [PATCH] proc: modify proc_pident_lookup to be completely table driven
Currently proc_pident_lookup gets the names and types from a table and then
has a huge switch statement to get the inode and file operations it needs.
That is silly and is becoming increasingly hard to maintain so I just put all
of the information in the table.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:13 -07:00
Eric W. Biederman 0804ef4b0d [PATCH] proc: readdir race fix (take 3)
The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls.  For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.

This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.

Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in on system call.

This patch implements the normal directory semantics for proc, that
guarantee that a directory entry that is neither created nor destroyed
while reading the directory entry will be returned.  For directory that are
either created or destroyed during the readdir you may or may not see them.
 Furthermore you may seek to a directory offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires, and
more importantly that user space expects.  Plus it is a simple semantic to
implement reliable service.  It is just a matter of calling readdir a
second time if you are wondering if something new has show up.

These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
are only 40 cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines we have to
look at to scan a linked list, and the worst case of having to scan the
entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less code than
what we are doing now.

Also another very subtle bug in this area has been fixed.  It is possible
to catch a task in the middle of de_thread where a thread is assuming the
thread of it's thread group leader.  This patch carefully handles that case
so if we hit it we don't fail to return the pid, that is undergoing the
de_thread dance.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

[oleg@tv-sign.ru: fix it]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:12 -07:00
Randy Dunlap 2bc2d61a96 [PATCH] list module taint flags in Oops/panic
When listing loaded modules during an oops or panic, also list each
module's Tainted flags if non-zero (P: Proprietary or F: Forced load only).

If a module is did not taint the kernel, it is just listed like
	usbcore
but if it did taint the kernel, it is listed like
	wizmodem(PF)

Example:
[ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[ 3260.121729]  [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8
[ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0
[ 3260.121748] Oops: 0002 [1] SMP
[ 3260.121753] CPU 1
[ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp
[ 3260.121785] Pid: 5556, comm: bash Tainted: P      2.6.18-git10 #1

[Alternatively, I can look into listing tainted flags with 'lsmod',
but that won't help in oopsen/panics so much.]

[akpm@osdl.org: cleanup]
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:12 -07:00
Steve Wise 322acc96d4 [PATCH] LIB: add gen_pool_destroy()
Modules using the genpool allocator need to be able to destroy the data
structure when unloading.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Dean Nelson <dcn@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02 07:57:12 -07:00
Richard Purdie d14b272bc6 [ARM] 3848/1: pxafb: Add option of fixing video modes and spitz QVGA mode support
Add the ability to have pxafb use only certain fixed video modes
(selected on a per platform basis). This is useful on production
hardware such as the Zaurus cxx00 models where the valid modes are
known in advance and any other modes could result in hardware damage.

Following this, add support for the cxx00 QVGA mode. Mode information
is passed to the lcd_power call to allowing the panel drivers to
configure the display hardware accordingly (corgi_lcd already contains
the functionality for the cxx00 panel).

This mirrors the setup already used by w100fb.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-02 13:33:37 +01:00
Li Yang 5e98082358 [POWERPC] Fix rheap alignment problem
Honor alignment parameter in the rheap allocator.  This is needed by
qe_lib.
Remove compile warning.

Signed-off-by: Pantelis Antoniou <pantelis@embeddedalley.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Acked-by: Kumar Galak <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-10-02 20:27:47 +10:00
Kim Phillips 7a69af63e7 [POWERPC] Add powerpc get/set_rtc_time interface to new generic rtc class
Add powerpc get/set_rtc_time interface to new generic rtc class. This
abstracts rtc chip specific code from the platform code for rtc-over-i2c
platforms.  Specific RTC chip support is now configured under
Device Drivers -> Real Time Clock. Setting time of day from the RTC
on startup is also configurable.

this time without the potentially platform breaking initcall.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-10-02 17:48:47 +10:00
Yoichi Yuasa 04b314b2c3 [MIPS] Remove unused galileo-boars header files
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:17:00 +01:00
Yoichi Yuasa b772da30b4 [MIPS] Rename SERIAL_PORT_DEFNS for EV64120
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:16:59 +01:00
Yoichi Yuasa 998ec2901a [MIPS] Add UART IRQ number for EV64120
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:16:59 +01:00
Atsushi Nemoto 1924600cdb [MIPS] Make unwind_stack() can dig into interrupted context
If the PC was ret_from_irq or ret_from_exception, there will be no
more normal stackframe.  Instead of stopping the unwinding, use PC and
RA saved by an exception handler to continue unwinding into the
interrupted context.  This also simplifies the CONFIG_STACKTRACE code.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:16:59 +01:00
Atsushi Nemoto c59a0f15be [MIPS] Remove __flush_icache_page
__flash_icache_page is unused, so kill it.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:16:58 +01:00
Atsushi Nemoto 1df0f0ff7e [MIPS] lockdep: Add STACKTRACE_SUPPORT and enable LOCKDEP_SUPPORT
Implement stacktrace interface by using unwind_stack() and enable lockdep
support in Kconfig.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:16:57 +01:00
Atsushi Nemoto eae6c0da9d [MIPS] lockdep: fix TRACE_IRQFLAGS_SUPPORT
In handle_sys and its variants, we must reload some registers which
might be clobbered by trace_hardirqs_on().
Also we must make sure trace_hardirqs_on() called in kernel level (not
exception level).

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2006-10-01 23:16:57 +01:00
Simon Tatham d8d64d6b29 [SERIAL] Magic SysRq SAK does nothing on serial consoles
Make sysrq-K work on serial console by passing in the tty.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-01 20:03:20 +01:00
David Woodhouse 8a84fc15ae Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Manually resolve conflict in include/mtd/Kbuild

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-10-01 17:55:53 +01:00
Russell King a6b93a9085 [SERIAL] Fix oops when removing suspended serial port
A serial card might have been removed when the system is resumed.
This results in a suspended port being shut down, which results in
the ports shutdown method being called twice in a row.  This causes
BUGs.  Avoid this by tracking the suspended state separately from
the initialised state.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-10-01 17:17:40 +01:00
Zachary Amsden 5a73fdc5ea [PATCH] Some config.h removals
During tracking down a PAE compile failure, I found that config.h was being
included in a bunch of places in i386 code.  It is no longer necessary, so
drop it.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Zachary Amsden 789e6ac0a7 [PATCH] paravirt: update pte hook
Add a pte_update_hook which notifies about pte changes that have been made
without using the set_pte / clear_pte interfaces.  This allows shadow mode
hypervisors which do not trap on page table access to maintain synchronized
shadows.

It also turns out, there was one pte update in PAE mode that wasn't using any
accessor interface at all for setting NX protection.  Considering it is PAE
specific, and the accessor is i386 specific, I didn't want to add a generic
encapsulation of this behavior yet.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Zachary Amsden a93cb055a2 [PATCH] paravirt: remove set pte atomic
Now that ptep_establish has a definition in PAE i386 3-level paging code, the
only paging model which is insane enough to have multi-word hardware PTEs
which are not efficient to set atomically, we can remove the ghost of
set_pte_atomic from other architectures which falesly duplicated it, and
remove all knowledge of it from the generic pgtable code.

set_pte_atomic is now a private pte operator which is specific to i386

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Zachary Amsden d6d861e3c9 [PATCH] paravirt: optimize ptep establish for pae
The ptep_establish macro is only used on user-level PTEs, for P->P mapping
changes.  Since these always happen under protection of the pagetable lock,
the strong synchronization of a 64-bit cmpxchg is not needed, in fact, not
even a lock prefix needs to be used.  We can simply instead clear the P-bit,
followed by a normal set.  The write ordering is still important to avoid the
possibility of the TLB snooping a partially written PTE and getting a bad
mapping installed.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Zachary Amsden 23002d88be [PATCH] paravirt: kpte flush
Create a new PTE function which combines clearing a kernel PTE with the
subsequent flush.  This allows the two to be easily combined into a single
hypercall or paravirt-op.  More subtly, reverse the order of the flush for
kmap_atomic.  Instead of flushing on establishing a mapping, flush on clearing
a mapping.  This eliminates the possibility of leaving stale kmap entries
which may still have valid TLB mappings.  This is required for direct mode
hypervisors, which need to reprotect all mappings of a given page when
changing the page type from a normal page to a protected page (such as a page
table or descriptor table page).  But it also provides some nicer semantics
for real hardware, by providing extra debug-proofing against using stale
mappings, as well as ensuring that no stale mappings exist when changing the
cacheability attributes of a page, which could lead to cache conflicts when
two different types of mappings exist for the same page.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Zachary Amsden 25e4df5bae [PATCH] paravirt: combine flush accessed dirty.patch
Remove ptep_test_and_clear_{dirty|young} from i386, and instead use the
dominating functions, ptep_clear_flush_{dirty|young}.  This allows the TLB
page flush to be contained in the same macro, and allows for an eager
optimization - if reading the PTE initially returned dirty/accessed, we can
assume the fact that no subsequent update to the PTE which cleared accessed /
dirty has occurred, as the only way A/D bits can change without holding the
page table lock is if a remote processor clears them.  This eliminates an
extra branch which came from the generic version of the code, as we know that
no other CPU could have cleared the A/D bit, so the flush will always be
needed.

We still export these two defines, even though we do not actually define
the macros in the i386 code:

 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY

The reason for this is that the only use of these functions is within the
generic clear_flush functions, and we want a strong guarantee that there
are no other users of these functions, so we want to prevent the generic
code from defining them for us.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Zachary Amsden 6606c3e0da [PATCH] paravirt: lazy mmu mode hooks.patch
Implement lazy MMU update hooks which are SMP safe for both direct and shadow
page tables.  The idea is that PTE updates and page invalidations while in
lazy mode can be batched into a single hypercall.  We use this in VMI for
shadow page table synchronization, and it is a win.  It also can be used by
PPC and for direct page tables on Xen.

For SMP, the enter / leave must happen under protection of the page table
locks for page tables which are being modified.  This is because otherwise,
you end up with stale state in the batched hypercall, which other CPUs can
race ahead of.  Doing this under the protection of the locks guarantees the
synchronization is correct, and also means that spurious faults which are
generated during this window by remote CPUs are properly handled, as the page
fault handler must re-check the PTE under protection of the same lock.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Zachary Amsden 9888a1cae3 [PATCH] paravirt: pte clear not present
Change pte_clear_full to a more appropriately named pte_clear_not_present,
allowing optimizations when not-present mapping changes need not be reflected
in the hardware TLB for protected page table modes.  There is also another
case that can use it in the fremap code.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Andi Kleen e239ca5405 [PATCH] Create call_usermodehelper_pipe()
A new member in the ever growing family of call_usermode* functions is
born.  The new call_usermodehelper_pipe() function allows to pipe data to
the stdin of the called user mode progam and behaves otherwise like the
normal call_usermodehelp() (except that it always waits for the child to
finish)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Andi Kleen d6cbd281d1 [PATCH] Some cleanup in the pipe code
Split the big and hard to read do_pipe function into smaller pieces.

This creates new create_write_pipe/free_write_pipe/create_read_pipe
functions.  These functions are made global so that they can be used by
other parts of the kernel.

The resulting code is more generic and easier to read and has cleaner error
handling and less gotos.

[akpm@osdl.org: cleanup]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:33 -07:00
Haavard Skinnemoen 74588d8ba3 [PATCH] Generic ioremap_page_range: implementation
This patch adds a generic implementation of ioremap_page_range() in
lib/ioremap.c based on the i386 implementation. It differs from the
i386 version in the following ways:

  * The PTE flags are passed as a pgprot_t argument and must be
    determined up front by the arch-specific code. No additional
    PTE flags are added.
  * Uses set_pte_at() instead of set_pte()

[bunk@stusta.de: warning fix]
]dhowells@redhat.com: nommu build fix]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-m32r@ml.linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:31 -07:00
Fernando Vazquez dc2bc768a0 [PATCH] stack overflow safe kdump: safe_smp_processor_id()
This is a the first of a series of patch-sets aiming at making kdump more
robust against stack overflows.

This patch set does the following:

* Add safe_smp_processor_id function to i386 architecture (this function was
  inspired by the x86_64 function of the same name).

* Substitute "smp_processor_id" with the stack overflow-safe
  "safe_smp_processor_id" in the reboot path to the second kernel.

This patch:

On the event of a stack overflow critical data that usually resides at the
bottom of the stack is likely to be stomped and, consequently, its use should
be avoided.

In particular, in the i386 and IA64 architectures the macro smp_processor_id
ultimately makes use of the "cpu" member of struct thread_info which resides
at the bottom of the stack.  x86_64, on the other hand, is not affected by
this problem because it benefits from the use of the PDA infrastructure.

To circumvent this problem I suggest implementing "safe_smp_processor_id()"
(it already exists in x86_64) for i386 and IA64 and use it as a replacement
for smp_processor_id in the reboot path to the dump capture kernel.  This is a
possible implementation for i386.

Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Looks-reasonable-to: Andi Kleen <ak@muc.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen ce71ec3684 [PATCH] r/o bind mounts: monitor zeroing of i_nlink
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation.  We need to catch these in addition to the
decrement operations.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen d8c76e6f45 [PATCH] r/o bind mount prepwork: inc_nlink() helper
This is mostly included for parity with dec_nlink(), where we will have some
more hooks.  This one should stay pretty darn straightforward for now.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen 9a53c3a783 [PATCH] r/o bind mounts: unlink: monitor i_nlink
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.

We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.

So, add a little helper function to do the decrements.  We'll tie into it in a
bit to note when i_nlink hits zero.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Jay Lan db5fed26b2 [PATCH] csa accounting taskstats update
ChangeLog:
   Feedbacks from Andrew Morton:
   - define TS_COMM_LEN to 32
   - change acct_stimexpd field of task_struct to be of
     cputime_t, which is to be used to save the tsk->stime
     of last timer interrupt update.
   - a new Documentation/accounting/taskstats-struct.txt
     to describe fields of taskstats struct.

   Feedback from Balbir Singh:
   - keep the stime of a task to be zero when both stime
     and utime are zero as recoreded in task_struct.

   Misc:
   - convert accumulated RSS/VM from platform dependent
     pages-ticks to MBytes-usecs in the kernel

Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Jay Lan 8f0ab51479 [PATCH] csa: convert CONFIG tag for extended accounting routines
There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.

Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00
Jay Lan 9acc185351 [PATCH] csa: Extended system accounting over taskstats
Add extended system accounting handling over taskstats interface.  A
CONFIG_TASK_XACCT flag is created to enable the extended accounting code.

Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:29 -07:00