Commit Graph

33413 Commits

Author SHA1 Message Date
ASANO Masahiro a634904a7d VFS: add lookup hint for network file systems
I'm trying to speeding up mkdir(2) for network file systems.  A typical
mkdir(2) calls two inode_operations: lookup and mkdir.  The lookup
operation would fail with ENOENT in common case.  I think it is unnecessary
because the subsequent mkdir operation can check it.  In case of creat(2),
lookup operation is called with the LOOKUP_CREATE flag, so individual
filesystem can omit real lookup.  e.g.  nfs_lookup().

Here is a sample patch which uses LOOKUP_CREATE and O_EXCL on mkdir,
symlink and mknod.  This uses the gadget for creat(2).

And here is the result of a benchmark on NFSv3.
  mkdir(2) 10,000 times:
    original  50.5 sec
    patched   29.0 sec

Signed-off-by: ASANO Masahiro <masano@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from fab7bf44449b29f9d5572a5dd8adcf7c91d5bf0f commit)
2006-08-24 15:49:14 -04:00
Nikita Danilov ddeff520f0 NFS: Fix a potential deadlock in nfs_release_page
nfs_wb_page() waits on request completion and, as a result, is not safe to be
called from nfs_release_page() invoked by VM scanner as part of GFP_NOFS
allocation. Fix possible deadlock by analyzing gfp mask and refusing to
release page if __GFP_FS is not set.

Signed-off-by: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 374d969debfb290bafcb41d28918dc6f7e43ce31 commit)
2006-08-24 15:48:46 -04:00
Greg Kroah-Hartman ef7d1b244f Merge gregkh@master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-08-18 11:02:52 -07:00
Greg Kroah-Hartman ed0da6fc9d Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2006-08-18 09:20:04 -07:00
Herbert Xu 78eb887733 [BRIDGE]: Disable SG/GSO if TX checksum is off
When the bridge recomputes features, it does not maintain the
constraint that SG/GSO must be off if TX checksum is off.
This patch adds that constraint.

On a completely unrelated note, I've also added TSO6 and TSO_ECN
feature bits if GSO is enabled on the underlying device through
the new NETIF_F_GSO_SOFTWARE macro.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 18:22:32 -07:00
Patrick McHardy 8311731afc [NETFILTER]: ip_tables: fix table locking in ipt_do_table
table->private might change because of ruleset changes, don't use it without
holding the lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 18:13:53 -07:00
Patrick McHardy d205dc4079 [NETFILTER]: ctnetlink: fix deadlock in table dumping
ip_conntrack_put must not be called while holding ip_conntrack_lock
since destroy_conntrack takes it again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 18:12:38 -07:00
Jon Loeliger 9e8a9bc2d2 [POWERPC] Fix the mpc8641_hpcn.dts file.
Add 'linux,phandle' entry to i8259@4d0 node.

Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-18 10:08:37 +10:00
Jon Loeliger 5315862045 [POWERPC] Offer PCI as a CONFIG choice for PPC_86xx.
Also fix 80-column run-over.

Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-18 10:08:36 +10:00
Jon Loeliger 707ba16f0f [POWERPC] Add MPC8641 HPCN Device Tree Source file.
As per list discussion, let's add device tree source files
under powerpc/boot/dts.  If nothing else, it is a starting point.

Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-18 10:02:45 +10:00
Jon Loeliger f583165f6a [POWERPC] Convert to mac-address for ethernet MAC address data.
Also accept "local-mac-address".  However the old "address"
is now obsolete, but accepted for backwards compatibility.
It should be removed after all device trees have been
converted to use "mac-address".

Signed-off-by: Jon Loeliger <jdl@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-18 09:50:16 +10:00
Alexey Kuznetsov 6e8fcbf640 [IPV4]: severe locking bug in fib_semantics.c
Found in 2.4 by Yixin Pan <yxpan@hotmail.com>.

> When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock) =
> is used in fib_release_info() instead of write_lock_bh(&fib_info_lock).  =
> Is the following case possible: a BH interrupts fib_release_info() while =
> holding the write lock, and calls ip_check_fib_default() which calls =
> read_lock(&fib_info_lock), and spin forever.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:44:46 -07:00
David L Stevens acd6e00b8e [MCAST]: Fix filter leak on device removal.
This fixes source filter leakage when a device is removed and a
process leaves the group thereafter.

This also includes corresponding fixes for IPv6 multicast source
filters on device removal.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:57 -07:00
David S. Miller c7fa9d189e [NET]: Disallow whitespace in network device names.
It causes way too much trouble and confusion in userspace.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:56 -07:00
Panagiotis Issaris d4274b51a5 [PPP]: handle kmalloc failures and convert to using kzalloc
The PPP code contains two kmalloc()s followed by memset()s without
handling a possible memory allocation failure.  (Suggested by Joe
Perches).

And furthermore, conversions from kmalloc+memset to kzalloc.

[akpm@osdl.org: fix error-path leak]
[akpm@osdl.org: cleanups]
[paulus@samba.org: don't add useless printk and cardmap_destroy calls]

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:55 -07:00
Ralf Hildebrandt c0956bd251 [PKT_SCHED] cls_u32: Fix typo.
Signed-off-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:54 -07:00
Kevin Hilman b9c6e3e966 [ATM]: Compile error on ARM
atm_proc_exit() is declared as __exit, and thus in .exit.text.  On
some architectures (ARM) .exit.text is discarded at compile time, and
since atm_proc_exit() is called by some other __init functions, it
results in a link error.

Signed-off-by: Kevin Hilman <khilman@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:53 -07:00
Michael Chan 932f3772cf [BNX2]: Convert to netdev_alloc_skb()
Convert dev_alloc_skb() to netdev_alloc_skb() and increase default
rx ring size to 255. The old ring size of 100 was too small.

Update version to 1.4.44.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:52 -07:00
Michael Chan 2f8af120a1 [BNX2]: Fix tx race condition.
Fix a subtle race condition between bnx2_start_xmit() and bnx2_tx_int()
similar to the one in tg3 discovered by Herbert Xu:

CPU0					CPU1
bnx2_start_xmit()
	if (tx_ring_full) {
		tx_lock
					bnx2_tx()
						if (!netif_queue_stopped)
		netif_stop_queue()
		if (!tx_ring_full)
						update_tx_ring
			netif_wake_queue()
		tx_unlock
	}

Even though tx_ring is updated before the if statement in bnx2_tx_int() in
program order, it can be re-ordered by the CPU as shown above.  This
scenario can cause the tx queue to be stopped forever if bnx2_tx_int() has
just freed up the entire tx_ring.  The possibility of this happening
should be very rare though.

The following changes are made, very much identical to the tg3 fix:

1. Add memory barrier to fix the above race condition.

2. Eliminate the private tx_lock altogether and rely solely on
netif_tx_lock.  This eliminates one spinlock in bnx2_start_xmit()
when the ring is full.

3. Because of 2, use netif_tx_lock in bnx2_tx_int() before calling
netif_wake_queue().

4. Add memory barrier to bnx2_tx_avail().

5. Add bp->tx_wake_thresh which is set to half the tx ring size.

6. Check for the full wake queue condition before getting
netif_tx_lock in tg3_tx().  This reduces the number of unnecessary
spinlocks when the tx ring is full in a steady-state condition.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:51 -07:00
Jan "Yenya" Kasprzak fb33f82568 [NET]: Terminology in ip-sysctl.txt
this minor patch fixes the description of net.ipv4.tcp_mem sysctl
in ip-sysctl.txt - the headline names the values "min, pressure, max",
while the description uses the "low, pressure, high" values.
Both tcp_rmem and tcp_wmem descriptions use the "min, pressure, max"
values, so I have changed the tcp_mem to match this and not vice versa.

Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:50 -07:00
Michal Ruzicka bb699cbca0 [IPV4]: Possible leak of multicast source filter sctructure
There is a leak of a socket's multicast source filter list structure
on closing a socket with a multicast source filter set on an interface
that does not exist any more.

Signed-off-by: Michal Ruzicka <michal.ruzicka@comstar.cz>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:49 -07:00
Ingo Molnar 640c41c77a [IPV6] lockdep: annotate __icmpv6_socket
Split off __icmpv6_socket's sk->sk_dst_lock class, because it gets
used from softirqs, which is safe for __icmpv6_sockets (because they
never get directly used via userspace syscalls), but unsafe for normal
sockets.

Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:48 -07:00
Andrew Morton deb47c66e1 [NETFILTER]: xt_physdev build fix
It needs netfilter_bridge.h for brnf_deferred_hooks

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:47 -07:00
Suresh Siddha 8557511250 [NET]: Fix potential stack overflow in net/core/utils.c
On High end systems (1024 or so cpus) this can potentially cause stack
overflow.  Fix the stack usage.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:47 -07:00
David S. Miller 7ea49ed73c [VLAN]: Make sure bonding packet drop checks get done in hwaccel RX path.
Since __vlan_hwaccel_rx() is essentially bypassing the
netif_receive_skb() call that would have occurred if we did the VLAN
decapsulation in software, we are missing the skb_bond() call and the
assosciated checks it does.

Export those checks via an inline function, skb_bond_should_drop(),
and use this in __vlan_hwaccel_rx().

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17 16:29:46 -07:00
Olof Johansson 9a936a2e05 [POWERPC] powerpc: Clear HID0 attention enable on PPC970 at boot time
Clear HID0[en_attn] at CPU init time on PPC970.  Closes CVE-2006-4093.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-18 07:23:29 +10:00
Benjamin Herrenschmidt e5c14ce118 [POWERPC] Fix irq radix tree remapping typo
The code for using the radix tree for reverse mapping of interrupts has
a typo that causes it to create incorrect mappings if the software and
hardware numbers happen to be different. This would, among others, cause
the IDE interrupt to fail on js20's. This fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-17 16:41:11 +10:00
Ananth N Mavinakayanahalli 83db3dde26 [POWERPC] kprobes: Fix possible system crash during out-of-line single-stepping
- On archs that have no-exec support, we vmalloc() a executable scratch
area of PAGE_SIZE and divide it up into an array of slots of maximum
instruction size for that arch
- On a kprobe registration, the original instruction is copied to the
first available free slot, so if multiple kprobes are registered, chances
are, they get contiguous slots
- On POWER4, due to not having coherent icaches, we could hit a situation
where a probe that is registered on one processor, is hit immediately on
another. This second processor could have fetched the stream of text from
the out-of-line single-stepping area *before* the probe registration
completed, possibly due to an earlier (and a different) kprobe hit and
hence would see stale data at the slot.

Executing such an arbitrary instruction lead to a problem as reported
in LTC bugzilla 23555.

The correct solution is to call flush_icache_range() as soon as the
instruction is copied for out-of-line single-stepping, so the correct
instruction is seen on all processors.

Thanks to Will Schmidt who tracked this down.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-17 16:41:10 +10:00
Michael Ellerman b6f35b4966 [POWERPC] Make crash.c work on 32-bit and 64-bit
To compile kexec on 32-bit we need a few more bits and pieces. Rather
than add empty definitions, we can make crash.c work on 32-bit, with
only a couple of kludges.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-17 16:41:10 +10:00
Michael Ellerman 47585d8f5d [POWERPC] Move some kexec logic into machine_kexec.c
We're missing a few functions for kexec to compile on 32-bit. There's
nothing really 64-bit specific about the 64-bit versions, so make them
generic rather than adding empty definitions for 32-bit.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-17 16:41:10 +10:00
Will Schmidt 90bdde362c [POWERPC] update {g5,iseries,pseries}_defconfigs
Updating the defconfigs for iseries, pseries, and G5.   Sticking with
the defaults, with the following exceptions:  I've turned off HW_RANDOM
for all three configs.   For G5, I've enabled SND_AOA and friends as
modules; this includes the FABRIC_LAYOUT, ONYX, TAS, TOONIE and
SOUNDBUS* config options.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-17 16:41:10 +10:00
David Wilder eac8392f95 [POWERPC] Make secondary CPUs call into kdump on reset exception
In the case of a system hang, the user will invoke soft-reset to
initiate the kdump boot.  If xmon is enabled, the CPU(s) enter into the
xmon debugger.   Unfortunately, the secondary CPU(s) will return to the
hung state when they exit from the debugger (returned from die() ->
system_reset_exception()).  This causes a problem in kdump since the
hung CPU(s) will not respond to the IPI sent from kdump.  This patch
fixes the issue by calling crash_kexec_secondary() directly from
system_reset_exception() without returning to the previous state.  These
secondary CPUs wait 5ms until the kdump boot is started by the primary
CPU.   In the case we exited from the debugger to "recover" (command 'x'
in xmon) the primary and the secondary CPUs will all return from die()
-> system_reset_exception() ->crash_kexec_secondary() wait 5ms, then
return to the previous state.  A kdump boot is not started in this case.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-17 16:41:10 +10:00
Greg Kroah-Hartman 774bd8613d Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-2.6.18 2006-08-16 12:41:16 -07:00
Sam Ravnborg c9eca0b910 kbuild: correct assingment to CFLAGS with CROSS_COMPILE
Some architectures change $CC in arch/$(ARCH)/Makefile
mips is one example.

That have impact on what options are supported by gcc so move all
$(call cc-option, ...) after include of arch specific Makefile.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-08-16 21:14:08 +02:00
Greg Kroah-Hartman 223ddcea89 Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 2006-08-16 08:51:04 -07:00
Heiko Carstens 3e03a2fcb2 [S390] kernel page table allocation.
Don't waste DMA capable pages for identity mapping page tables.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-08-16 13:49:37 +02:00
Peter Oberparleiter b18a60e7c2 [S390] inaccessible PAV alias devices on LPAR.
In some situations PAV alias devices on LPAR are not accessible.
The initialization procedure required to enable access to PAV alias
devices has to be performed per storage server subsystem and not
only once per storage server.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-08-16 13:49:33 +02:00
Heiko Carstens 2f6c55fc31 [S390] dasd slab cache alignment.
The dasd_page_cache should return page addresses and therefore the
cache must be created with an alignment of PAGE_SIZE.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2006-08-16 13:49:27 +02:00
Greg Kroah-Hartman ca412cc992 Merge gregkh@master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog 2006-08-16 01:38:39 -07:00
Hans de Goede e0e9263271 [PATCH] PATCH: 1 line 2.6.18 bugfix: modpost-64bit-fix.patch
There is a small but annoying bug in scripts/mod/file2alias.c which causes
it to generate invalid aliases for input devices on 64 bit archs. This causes
joydev.ko to not be automaticly loaded when inserting a joystick, resulting in
a non working joystick (for the average user).

In scripts/mod/file2alias.c is the following code for generating the input
aliases:
static void do_input(char *alias,
                     kernel_ulong_t *arr, unsigned int min, unsigned int max)
{
        unsigned int i;

        for (i = min; i < max; i++)
                if (arr[i / BITS_PER_LONG] & (1 << (i%BITS_PER_LONG)))
                        sprintf(alias + strlen(alias), "%X,*", i);
}

On 32 bits systems, this correctly generates "0,*" for the first alias, "8,*"
for the second etc.

However on 64 bits it generates: "0,*20,*" resp "8,*28,*" Notice how it adds 20
+ first entry (hex) ! to the list of hex codes, which is 32 more then the first
entry, thus is because the bit test above wraps at 32 bits instead of 64.

scripts/mod/file2alias.c, line 379 reads:
                if (arr[i / BITS_PER_LONG] & (1 << (i%BITS_PER_LONG)))
That should be:
                if (arr[i / BITS_PER_LONG] & (1L << (i%BITS_PER_LONG)))

Notice the added 'L' after the 1, otherwise that is an 32 bit int instead of a
64 bit long, and when that int gets shifted >= 32 times, appearantly the number
by which to shift is wrapped at 5 bits ( % 32) causing it to test a bit 32 bits
too low.

The patch below makes the nescesarry 1 char change :)

Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-15 12:53:09 -07:00
Greg Kroah-Hartman 80914d97aa Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 2006-08-15 12:31:36 -07:00
Matt LaPlante 2621e2a155 [WATCHDOG] Kconfig typos fix.
Three typos in drivers/char/watchdog/Kconfig...

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2006-08-15 11:17:22 +02:00
Trond Myklebust 74361cb682 [PATCH] fcntl(F_SETSIG) fix
fcntl(F_SETSIG) no longer works on leases because
lease_release_private_callback() gets called as the lease is copied in
order to initialise it.

The problem is that lease_alloc() performs an unnecessary initialisation,
which sets the lease_manager_ops.  Avoid the problem by allocating the
target lease structure using locks_alloc_lock().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 13:10:59 -07:00
Alexander Zarochentsev 1d7ea7324a [PATCH] fuse: fix error case in fuse_readpages
Don't let fuse_readpages leave the @pages list not empty when exiting
on error.

[akpm@osdl.org: kernel-doc fixes]
Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
Andrew Morton 9b41ea7289 [PATCH] workqueue: remove lock_cpu_hotplug()
Use a private lock instead.  It protects all per-cpu data structures in
workqueue.c, including the workqueues list.

Fix a bug in schedule_on_each_cpu(): it was forgetting to lock down the
per-cpu resources.

Unfixed long-standing bug: if someone unplugs the CPU identified by
`singlethread_cpu' the kernel will get very sick.

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
Michal Januszewski 2b25742556 [PATCH] fbdev: include backlight.h only when __KERNEL__ is defined
linux/backlight.h pulls in header files (eg.  ioport.h) that break
compilation of userspace programs.  To solve the problem, only include
backlight.h in fb.h if compiling kernel stuff.

Signed-off-by: Michal Januszewski <spock@gentoo.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
john stultz e579dcbf23 [PATCH] futex_handle_fault always fails
We found this issue last week w/ the -RT kernel, but it seems the same
issue is in mainline as well.

Basically it is possible for futex_unlock_pi to return without actually
freeing the lock.  This is due to buggy logic in the use of
futex_handle_fault() and its attempt argument in a failure case.

Looking at futex.c the logic is as follows:

1) In futex_unlock_pi() we start w/ ret=0 and we go down to the first
   futex_atomic_cmpxchg_inatomic(), where we find uval==-EFAULT.  We then
   jump to the pi_faulted label.

2) From pi_faulted: We increment attempt, unlock the sem and hit the
   retry label.

3) From the retry label, with ret still zero, we again hit EFAULT on the
   first futex_atomic_cmpxchg_inatomic(), and again goto the pi_faulted
   label.

4) Again from pi_faulted: we increment attempt and enter the
   conditional, where we call futex_handle_fault.

5) futex_handle_fault fails, and we goto the out_unlock_release_sem
   label.

6) From out_unlock_release_sem we return, and since ret is still zero,
   we return without error, while never actually unlocking the lock.

Issue #1: at the first futex_atomic_cmpxchg_inatomic() we should probably
be setting ret=-EFAULT before jumping to pi_faulted: However in our case
this doesn't really affect anything, as the glibc we're using ignores the
error value from futex_unlock_pi().

Issue #2: Look at futex_handle_fault(), its first conditional will return
-EFAULT if attempt is >= 2.  However, from the "if(attempt++)
futex_handle_fault(attempt)" logic above, we'll *never* call
futex_handle_fault when attempt is less then two.  So we never get a chance
to even try to fault the page in.

The following patch addresses these two issues by 1) Always setting ret to
-EFAULT if futex_handle_fault fails, and 2) Removing the = in
futex_handle_fault's (attempt >= 2) check.

I'm really not sure this is the right fix, but wanted to bring it up so
folks knew the issue is alive and well in the current -git tree.  From
looking at the git logs the logic was first introduced (then later copied
to other places) in the following commit almost a year ago:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4732efbeb997189d9f9b04708dc26bf8613ed721;hp=5b039e681b8c5f30aac9cc04385cc94be45d0823

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
Kirill Korotaev 6997a6faaa [PATCH] sys_getppid oopses on debug kernel
sys_getppid() optimization can access a freed memory.  On kernels with
DEBUG_SLAB turned ON, this results in Oops.  As Dave Hansen noted, this
optimization is also unsafe for memory hotplug.

So this patch always takes the lock to be safe.

[oleg@tv-sign.ru: simplifications]
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
Horms 012c437d03 [PATCH] Change panic_on_oops message to "Fatal exception"
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path.  However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception.  On his suggestion, this
patch changes the message to simply "Fatal exception".  A suitable oops
message should already have been displayed.

Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00
Michal Miroslaw 485311a23c [PATCH] dm: BUG/OOPS fix
Fix BUG I tripped on while testing failover and multipathing.

BUG shows up on error path in multipath_ctr() when parse_priority_group()
fails after returning at least once without error.  The fix is to
initialize m->ti early - just after alloc()ing it.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c027c3d2
*pde = 00000000
Oops: 0000 [#3]
Modules linked in: qla2xxx ext3 jbd mbcache sg ide_cd cdrom floppy
CPU:    0
EIP:    0060:[<c027c3d2>]    Not tainted VLI
EFLAGS: 00010202   (2.6.17.3 #1)
EIP is at dm_put_device+0xf/0x3b
eax: 00000001   ebx: ee4fcac0   ecx: 00000000   edx: ee4fcac0
esi: ee4fc4e0   edi: ee4fc4e0   ebp: 00000000   esp: c5db3e78
ds: 007b   es: 007b   ss: 0068
Process multipathd (pid: 15912, threadinfo=c5db2000 task=ef485a90)
Stack: ec4eda40 c02816bd ee4fc4c0 00000000 f7e89498 f883e0bc c02816f6 f7e89480
       f7e8948c c0281801 ffffffea f7e89480 f883e080 c0281ffe 00000001 00000000
       00000004 dfe9cab8 f7a693c0 f883e080 f883e0c0 ca4b99c0 c027c6ee 01400000
Call Trace:
 <c02816bd> free_pgpaths+0x31/0x45  <c02816f6> free_priority_group+0x25/0x2e
 <c0281801> free_multipath+0x35/0x67  <c0281ffe> multipath_ctr+0x123/0x12d
 <c027c6ee> dm_table_add_target+0x11e/0x18b  <c027e5b4> populate_table+0x8a/0xaf
 <c027e62b> table_load+0x52/0xf9  <c027ec23> ctl_ioctl+0xca/0xfc
 <c027e5d9> table_load+0x0/0xf9  <c0152146> do_ioctl+0x3e/0x43
 <c0152360> vfs_ioctl+0x16c/0x178  <c01523b4> sys_ioctl+0x48/0x60
 <c01029b3> syscall_call+0x7/0xb
Code: 97 f0 00 00 00 89 c1 83 c9 01 80 e2 01 0f 44 c1 88 43 14 8b 04 24 59 5b 5e 5f 5d c3 53 89 c1 89 d3 ff 4a 08 0f 94 c0 84 c0 74 2a <8b> 01 8b 10 89 d8 e8 f6 fb ff ff 8b 03 8b 53 04 89 50 04 89 02
EIP: [<c027c3d2>] dm_put_device+0xf/0x3b SS:ESP 0068:c5db3e78

Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-14 12:54:29 -07:00