If you have an array with a write-intent-bitmap, and you remove a device, then
re-add it, a full recovery isn't needed. We detect a re-add by looking at
saved_raid_disk. For raid1, it doesn't matter which disk it was, only whether
or not it was an active device. The old code being removed set a value of
'mirror' which was then ignored, so it can go. The changed code performs the
correct check.
For raid6, if there are two missing devices, make sure we chose the right slot
on --re-add rather than always the first slot.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If an array is created using set_array_info, default_bitmap_offset isn't set
properly meaning that an internal bitmap cannot be hot-added until the array
is stopped and re-assembled.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When doing a recovery, we need to know whether the array will still be
degraded after the recovery has finished, so we can know whether bits can be
clearred yet or not. This patch performs the required check.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bitmap_unplug actually writes data (bits) to storage, so we shouldn't be
holding a spinlock...
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
raid10 has two different layouts. One uses near-copies (so multiple
copies of a block are at the same or similar offsets of different
devices) and the other uses far-copies (so multiple copies of a block
are stored a greatly different offsets on different devices). The point
of far-copies is that it allows the first section (normally first half)
to be layed out in normal raid0 style, and thus provide raid0 sequential
read performance.
Unfortunately, the read balancing in raid10 makes some poor decisions
for far-copies arrays and you don't get the desired performance. So
turn off that bad bit of read_balance for far-copies arrays.
With this patch, read speed of an 'f2' array is comparable with a raid0
with the same number of devices, though write speed is ofcourse still
very slow.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory - th system does an OOM kill, even
when there is still a lot of swap free.
The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.
Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.
This patch resolves the problem Zwane ran into.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assign the appropriate dentry operations to the dentry. Fixes memory leak.
Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the cpuset_fork() call below the write_unlock_irq call in
kernel/fork.c copy_process().
Since the cpuset-dual-semaphore-locking-overhaul.patch, the cpuset_fork()
routine acquires task_lock(), so cannot be called while holding the
tasklist_lock for write.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I believe this patch is required to fix breakage in the asynch reclaim
watermark logic introduced by this patch:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7fb1d9fca5c6e3b06773b69165a73f3fb786b8ee
Just some background of the watermark logic in case it isn't clear...
Basically what we have is this:
--- pages_high
|
| (a)
|
--- pages_low
|
| (b)
|
--- pages_min
|
| (c)
|
--- 0
Now when pages_low is reached, we want to kick asynch reclaim, which gives us
an interval of "b" before we must start synch reclaim, and gives kswapd an
interval of "a" before it need go back to sleep.
When pages_min is reached, normal allocators must enter synch reclaim, but
PF_MEMALLOC, ALLOC_HARDER, and ALLOC_HIGH (ie. atomic allocations, recursive
allocations, etc.) get access to varying amounts of the reserve "c".
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch corrects the return value for the EXT3_IOC_GROUP_ADD in case it
fails due to the presence of multiple resizers at the filesystem.
The problem is a little bit more serious than a wrong return value in this
case, since the clause err=0 in the exit_journal path will lead to a call
to update_backups which in turns causes a NULL pointer dereference.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is for supporting IDE interface for M3A-2170(Mappi-III) board.
Signed-off-by: Mamoru Sakugawa <sakugawa@linux-m32r.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a deadlock problem of the m32r SMP kernel.
In the m32r kernel, sys_tas() system call is provided as a test-and-set
function for userspace, for backward compatibility.
In some multi-threading application program, deadlocks were rarely caused
at sys_tas() funcion. Such a deadlock was caused due to a collision of
__pthread_lock() and __pthread_unlock() operations.
The "tas" syscall is repeatedly called by pthread_mutex_lock() to get a
lock, while a lock variable's value is not 0. On the other hand,
pthead_mutex_unlock() sets the lock variable to 0 for unlocking.
In the previous implementation of sys_tas() routine, there was a
possibility that a unlock operation was ignored in the following case:
- Assume a lock variable (*addr) was equal to 1 before sys_tas() execution.
- __pthread_unlock() operation is executed by the other processor
and the lock variable (*addr) is set to 0, between a read operation
("oldval = *addr;") and the following write operation ("*addr = 1;")
during a execution of sys_tas().
In this case, the following write operation ("*addr = 1;") overwrites the
__pthread_unlock() result, and sys_tas() fails to get a lock in the next
turn and after that.
According to the attatched patch, sys_tas() returns 0 value in the next
turn and deadlocks never happen.
Signed-off-by: Hitoshi Yamamoto <Yamamoto.Hitoshi@ap.MitsubishiElectric.co.jp>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tracked this down on an Ultra Enterprise 3000. It's a 6-way machine. Odd
thing about this machine (and it's good for finding bugs like this) is that
the CPU id's are not 0 based. For instance, on my machine the CPU's are
6/7/10/11/14/15.
This caused some NULL pointer dereference in kernel/workqueue.c because for
single_threaded workqueue's, it hardcoded the cpu to 0.
I changed the 0's to any_online_cpu(cpu_online_mask), which cpumask.h
claims is "First cpu in mask". So this fits the same usage.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
genalloc improperly stores the sizes of freed chunks, allocates overlapping
memory regions, and oopses after its in-band data is overwritten.
Signed-off-by: Chris Humbert <mahadri-kernel@drigon.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I now see another overflow in reiserfs that should lead to data corruptions
with files that are bigger than 4G under certain circumstances when using
mmap.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove bogus usage of test/set_bit() from fbcon rotation code and just
manipulate the bits directly. This fixes an oops on powerpc among others
and should be faster. Seems to work fine on the G5 here.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch implements a bunch of small changes to the FRV arch to
make it work again.
It deals with the following problems:
(1) SEM_DEBUG should be SEMAPHORE_DEBUG.
(2) The argument list to pcibios_penalize_isa_irq() has changed.
(3) CONFIG_HIGHMEM can't be used directly in #if as it may not be defined.
(4) page->private is no longer directly accessible.
(5) linux/hardirq.h assumes asm/hardirq.h will include linux/irq.h
(6) The IDE MMIO access functions are given pointers, not integers, and so
get type casting errors.
(7) __pa() is passed an explicit u64 type in drivers/char/mem.c, but that
can't be cast directly to a pointer on a 32-bit platform.
(8) SEMAPHORE_DEBUG should not be contingent on WAITQUEUE_DEBUG as that no
longer exists.
(9) PREEMPT_ACTIVE is too low a value.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't do that - it does GFP_KERNEL allocations, for a start.
(Reported by Guillaume Thouvenin <guillaume.thouvenin@bull.net>)
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So don't define it as extern in the header file.
drivers/base/memory.c:28: error: static declaration of 'memory_sysdev_class' follows non-static declaration
include/linux/memory.h:88: error: previous declaration of 'memory_sysdev_class' was here
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are some callers in cpufreq hotplug notify path that the lowest
function calls lock_cpu_hotplug(). The lock is already held during
cpu_up() and cpu_down() calls when the notify calls are broadcast to
registered clients.
Ideally if possible, we could disable_preempt() at the highest caller and
make sure we dont sleep in the path down in cpufreq->driver_target() calls
but the calls are so intertwined and cumbersome to cleanup.
Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.
- Removed export of cpucontrol semaphore and made it static.
- removed explicit uses of up/down with lock_cpu_hotplug()
so we can keep track of the the callers in same thread context and
just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug()
temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we dont have any hang situations.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
LD .tmp_vmlinux1
mm/built-in.o(.text+0x100d6): In function `copy_page_range':
: undefined reference to `__pud_alloc'
mm/built-in.o(.text+0x1010b): In function `copy_page_range':
: undefined reference to `__pmd_alloc'
mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
: undefined reference to `__pud_alloc'
fs/built-in.o(.text+0xc930): In function `install_arg_page':
: undefined reference to `__pud_alloc'
make: *** [.tmp_vmlinux1] Error 1
Those missing references in mm/memory.c arise from this code in
include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
__PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:
/*
* The following ifdef needed to get the 4level-fixup.h header to work.
* Remove it when 4level-fixup.h has been removed.
*/
#if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
{
return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
NULL: pud_offset(pgd, address);
}
static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
{
return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
NULL: pmd_offset(pud, address);
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
With my configuration the pgd_none and pud_none routines are inlines
returning a constant 0. Apparently the old compiler avoids generating
calls to __pud_alloc and __pmd_alloc but still lists them as undefined
references in the module's symbol table.
I don't know which change caused this problem. I think it was added
somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
several 2.6.14-rc kernels without difficulty. However I can't point to an
individual culprit.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Sparc update from David in the next commit.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Coverity checker spotted this obvious use-after-release bug caused
by a wrong order of the cleanups.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In __rpc_purge_upcall (net/sunrpc/rpc_pipe.c), the newer code to clean up
the in_upcall list has a typo.
Thanks to Vince Busam <vbusam@google.com> for spotting this!
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In cases where the server has gone insane, nfs_update_inode() may end
up calling nfs_invalidate_inode(), which again calls stuff that takes
the inode->i_lock that we're already holding.
In addition, given the sort of things we have in NFS these days that
need to be cleaned up on inode release, I'm not sure we should ever
be calling make_bad_inode().
Fix up spinlock recursion, and limit nfs_invalidate_inode() to clearing
the caches, and marking the inode as being stale.
Thanks to Steve Dickson <SteveD@redhat.com> for spotting this.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When caching locks due to holding a file delegation, we must always
check against local locks before sending anything to the server.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Blah. The patch [0] I recently sent fixing errors with
in_hugepage_area() and prepare_hugepage_range() for powerpc itself has
an off-by-one bug. Furthermore, the related functions
touches_hugepage_*_range() and within_hugepage_*_range() are also
buggy. Some of the bugs, like those addressed in [0] originated with
commit 7d24f0b8a5 where we tweaked the
semantics of where hugepages are allowed. Other bugs have been there
essentially forever, and are due to the undefined behaviour of '<<'
with shift counts greater than the type width (LOW_ESID_MASK could
return non-zero for high ranges with the right congruences).
The good news is that I now have a testsuite which should pick up
things like this if they creep in again.
[0] "powerpc-fix-for-hugepage-areas-straddling-4gb-boundary"
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With the removal of include/asm-powerpc, we no longer need
arch/powerpc/include/asm for the 64 bit build. We also do not need
-Iarch/powerpc for the 64 bit build either.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
its queue of IO completion callbacks, thus creating the deadlock between
umount and xfslogd. Breaking the loop solves the problem.
SGI-PV: 943821
SGI-Modid: xfs-linux-melb:xfs-kern:202363a
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
This code fixes a tiny problem with the recent fbcon rotation changes:
fb_prepare_logo doesn't check the return value of fb_find_logo and that
causes a crash for my while booting.
Obvious & working & tested fix is here.
Signed-off-by: Jasper Spaans <jasper@vs19.net>
Acked-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A fix for a locking bug which is triggered when a client tries to lock with
flag DMA_QUIESCENT (typically the X server), but gets interrupted by a signal.
The locking IOCTL should then return an error, but if DMA_QUIESCENT succeeds
it returns 0, and the client falsely thinks it has the lock. In addition
The client waits for DMA_QUISCENT and possibly DMA_READY without having the lock.
From: Thomas Hellstrom
Signed-off-by: Dave Airlie <airlied@linux.ie>
remove redundant include
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>