OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
NeilBrown	ae3c20ccf8	[PATCH] md: fix some small races in bitmap plugging in raid5 The comment gives more details, but I didn't quite have the sequencing write, so there was room for races to leave bits unset in the on-disk bitmap for short periods of time. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-10 13:24:17 -07:00
NeilBrown	7c785b7a18	[PATCH] md: fix a plug/unplug race in raid5 When a device is unplugged, requests are moved from one or two (depending on whether a bitmap is in use) queues to the main request queue. So whenever requests are put on either of those queues, we should make sure the raid5 array is 'plugged'. However we don't. We currently plug the raid5 queue just before putting requests on queues, so there is room for a race. If something unplugs the queue at just the wrong time, requests will be left on the queue and nothing will want to unplug them. Normally something else will plug and unplug the queue fairly soon, but there is a risk that nothing will. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-10 13:24:16 -07:00
NeilBrown	ff4e8d9a9f	[PATCH] md: fix resync speed calculation for restarted resyncs We introduced 'io_sectors' recently so we could count the sectors that causes io during resync separate from sectors which didn't cause IO - there can be a difference if a bitmap is being used to accelerate resync. However when a speed is reported, we find the number of sectors processed recently by subtracting an oldish io_sectors count from a current 'curr_resync' count. This is wrong because curr_resync counts all sectors, not just io sectors. So, add a field to mddev to store the curren io_sectors separately from curr_resync, and use that in the calculations. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-10 13:24:16 -07:00
NeilBrown	0b8c9de05c	[PATCH] md: delay starting md threads until array is completely setup When an array is started we start one or two threads (two if there is a reshape or recovery that needs to be completed). We currently start these before the array is completely set up and in particular before queue->queuedata is set. If the thread actually starts very quickly on another CPU, we can end up dereferencing queue->queuedata and oops. This patch also makes sure we don't try to start a recovery if a reshape is being restarted. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-10 13:24:16 -07:00
NeilBrown	31b65a0d38	[PATCH] md: set desc_nr correctly for version-1 superblocks This has to be done in ->load_super, not ->validate_super Without this, hot-adding devices to an array doesn't always work right - though there is a work around in mdadm-2.5.2 to make this less of an issue. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-10 13:24:16 -07:00
NeilBrown	f4370781d8	[PATCH] md: possible fix for unplug problem I have reports of a problem with raid5 which turns out to be because the raid5 device gets stuck in a 'plugged' state. This shouldn't be able to happen as 3msec after it gets plugged it should get unplugged. However it happens none-the-less. This patch fixes the problem and is a reasonable thing to do, though it might hurt performance slightly in some cases. Until I can find the real problem, we should probably have this workaround in place. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-10 13:24:16 -07:00
Ingo Molnar	663d440eaa	[PATCH] lockdep: annotate blkdev nesting Teach special (recursive) locking code to the lock validator. Effects on non-lockdep kernels: - the introduction of the following function variants: extern struct block_device open_partition_by_devnum(dev_t, unsigned); extern int blkdev_put_partition(struct block_device ); static int blkdev_get_whole(struct block_device bdev, mode_t mode, unsigned flags); which on non-lockdep are the same as open_by_devnum(), blkdev_put() and blkdev_get(). - a subclass parameter to do_open(). [unused on non-lockdep] - a subclass parameter to __blkdev_put(), which is a new internal function for the main blkdev_put() functions. [parameter unused on non-lockdep kernels, except for two sanity check WARN_ON()s] these functions carry no semantical difference - they only express object dependencies towards the lockdep subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:27:10 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Linus Torvalds	602cada851	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits) [PATCH] devfs: Remove it from the feature_removal.txt file [PATCH] devfs: Last little devfs cleanups throughout the kernel tree. [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree [PATCH] devfs: Remove devfs_remove() function from the kernel tree [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree [PATCH] devfs: Remove devfs support from the sound subsystem [PATCH] devfs: Remove devfs support from the ide subsystem. [PATCH] devfs: Remove devfs support from the serial subsystem [PATCH] devfs: Remove devfs from the init code [PATCH] devfs: Remove devfs from the partition code ...	2006-06-29 14:19:21 -07:00
Adrian Bunk	cfb9e32f2f	[PATCH] drivers/md/raid5.c: remove an unused variable Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-29 10:26:21 -07:00
Greg Kroah-Hartman	890fbae281	[PATCH] devfs: Last little devfs cleanups throughout the kernel tree. Just removes a few unused #defines and fixes some comments due to devfs now being gone. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:09 -07:00
Greg Kroah-Hartman	ce7b0f46bb	[PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed And remove the now unneeded number field. Also fixes all drivers that set these fields. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman	96192ff1a9	[PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed Also fixes all drivers that set this field. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman	ff23eca3e8	[PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree Also fixes up all files that #include it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:08 -07:00
Greg Kroah-Hartman	8ab5e4c15b	[PATCH] devfs: Remove devfs_remove() function from the kernel tree Removes the devfs_remove() function and all callers of it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:07 -07:00
Greg Kroah-Hartman	1a715c5cf9	[PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree Removes the devfs_mk_bdev() function and all callers of it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:06 -07:00
Greg Kroah-Hartman	95dc112a57	[PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree Removes the devfs_mk_dir() function and all callers of it. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-26 12:25:06 -07:00
Adrian Bunk	0538195424	[PATCH] drivers/md/md.c: make code static Make needlessly global code static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:40 -07:00
NeilBrown	f655675b3f	[PATCH] md: Allow the write_mostly flag to be set via sysfs It appears in /sys/mdX/md/dev-YYY/state and can be set or cleared by writing 'writemostly' or '-writemostly' respectively. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:40 -07:00
NeilBrown	a94213b1fa	[PATCH] md: Allow resync_start to be set and queried via sysfs Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:40 -07:00
NeilBrown	d4dbd0250e	[PATCH] md: Allow raid 'layout' to be read and set via sysfs Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	45dc2de1e5	[PATCH] md: Allow rdev state to be set via sysfs The md/dev-XXX/state file can now be written: "faulty" simulates an error on the device "remove" removes the device from the array (if it is not busy) Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	9e653b6342	[PATCH] md: Set/get state of array via sysfs This allows the state of an md/array to be directly controlled via sysfs and adds the ability to stop and array without tearing it down. Array states/settings: clear No devices, no size, no level Equivalent to STOP_ARRAY ioctl inactive May have some settings, but array is not active all IO results in error When written, doesn't tear down array, but just stops it suspended (not supported yet) All IO requests will block. The array can be reconfigured. Writing this, if accepted, will block until array is quiescent readonly no resync can happen. no superblocks get written. write requests fail read-auto like readonly, but behaves like 'clean' on a write request. clean - no pending writes, but otherwise active. When written to inactive array, starts without resync If a write request arrives then if metadata is known, mark 'dirty' and switch to 'active'. if not known, block and switch to write-pending If written to an active array that has pending writes, then fails. active fully active: IO and resync can be happening. When written to inactive array, starts with resync write-pending (not supported yet) clean, but writes are blocked waiting for 'active' to be written. active-idle like active, but no writes have been seen for a while (100msec). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	4254376914	[PATCH] md: Don't write dirty/clean update to spares - leave them alone - record the 'event' count on each individual device (they might sometimes be slightly different now) - add a new value for 'sb_dirty': '3' means that the super block only needs to be updated to record a clean<->dirty transition. - Prefer odd event numbers for dirty states and even numbers for clean states - Using all the above, don't update the superblock on a spare device if the update is just doing a clean-dirty transition. To accomodate this, a transition from dirty back to clean might now decrement the events counter if nothing else has changed. The net effect of this is that spare drives will not see any IO requests during normal running of the array, so they can go to sleep if that is what they want to do. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	07d84d109d	[PATCH] md: Allow re-add to work on array without bitmaps When an array has a bitmap, a device can be removed and re-added and only blocks changes since the removal (as recorded in the bitmap) will be resynced. It should be possible to do a similar thing to arrays without bitmaps. i.e. if a device is removed and re-added and no changes have been made in the interim, then the add should not require a resync. This patch allows that option. This means that when assembling an array one device at a time (e.g. during device discovery) the array can be enabled read-only as soon as enough devices are available, but extra devices can still be added without causing a resync. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	3285edf152	[PATCH] md: Fix bug that stops raid5 resync from happening As data_disks is less than raid_disks, the current test here is obviously wrong. And as the difference is already available in conf->max_degraded, it makes much more sense to use that. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
akpm@osdl.org	b3cc9ec76b	[PATCH] md: Fix Kconfig error RAID5 recently changed to RAID456 Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
Justin Piszcz	4d2554d045	[PATCH] md: md Kconfig speeling feex I was experimenting with Linux SW raid today and found a spelling error when reading the help menus... (and fly spell found more). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	8838832830	[PATCH] md: Calculate correct array size for raid10 in new offset mode The size calculation made assumtion which the new offset mode didn't follow. This gets the size right in all cases. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:39 -07:00
NeilBrown	ce25c31bdd	[PATCH] md: Change md/bitmap file handling to use bmap to file blocks-fix Fix problems with new bmap based access to bitmap files. 1/ When not using a file based bitmap, attach a NULL list of buffers to each page so the common free_buffer routine can cope. 2/ Use submit_bh to read as well as write, rather than vfs_read. This makes read and write more symetric. 3/ sync the file before reading, to ensure that the page cache has no dirty pages that might get written out later. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	d785a06a0b	[PATCH] md/bitmap: change md/bitmap file handling to use bmap to file blocks If md is asked to store a bitmap in a file, it tries to hold onto the page cache pages for that file, manipulate them directly, and call a cocktail of operations to write the file out. I don't believe this is a supportable approach. This patch changes the approach to use the same approach as swap files. i.e. bmap is used to enumerate all the block address of parts of the file and we write directly to those blocks of the device. swapfile only uses parts of the file that provide a full pages at contiguous addresses. We don't have that luxury so we have to cope with pages that are non-contiguous in storage. To handle this we attach buffers to each page, and store the addresses in those buffers. With this approach the pagecache may contain data which is inconsistent with what is on disk. To alleviate the problems this can cause, md invalidates the pagecache when releasing the file. If the file is to be examined while the array is active (a non-critical but occasionally useful function), O_DIRECT io must be used. And new version of mdadm will have support for this. This approach simplifies a lot of code: - we no longer need to keep a list of pages which we need to wait for, as the b_endio function can keep track of how many outstanding writes there are. This saves a mempool. - -EAGAIN returns from write_page are no longer possible (not sure if they ever were actually). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	acc55e2201	[PATCH] md/bitmap: tidy up i_writecount handling in md/bitmap md/bitmap modifies i_writecount of a bitmap file to make sure that no-one else writes to it. The reverting of the change is sometimes done twice, and there is one error path where it is omitted. This patch tidies that up. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	0cdd02cabd	[PATCH] md/bitmap: remove dead code from md/bitmap bitmap_active is never called, and the BITMAP_ACTIVE flag is never users or tested, so discard them both. Also remove some out-of-date 'todo' comments. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	a647e4bc5c	[PATCH] md/bitmap: remove unnecessary page reference manipulations from md/bitmap code md/bitmap gets a collection of pages representing the bitmap when it initialises the bitmap, and puts all the references when discarding the bitmap. It also occasionally takes extra references without any good reason, and sometimes drops them ... though it doesn't always drop them, which can result in a memory leak. This patch removes the unnecessary 'get_page' calls, and the corresponding 'put_page' calls. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	e16b68b6e4	[PATCH] md/bitmap: use set_bit etc for bitmap page attributes In particular, this means that we use 4 bits per page instead of a whole unsigned long. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	ec7a3197f4	[PATCH] md/bitmap: cleaner separation of page attribute handlers in md/bitmap md/bitmap has some attributes per-page. Handling of these attributes in largely abstracted in set_page_attr and clear_page_attr. However get_page_attr exposes the format used to store them. So prior to changing that format, introduce test_page_attr instead of get_page_attr, and make appropriate usage changes. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	0b79ccf0cd	[PATCH] md/bitmap: remove bitmap writeback daemon md/bitmap currently has a separate thread to wait for writes to the bitmap file to complete (as we cannot get a callback on that action). However this isn't needed as bitmap_unplug is called from process context and waits for the writeback thread to do it's work. The same result can be achieved by doing the waiting directly in bitmap_unplug. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	d7375ab324	[PATCH] md/bitmap: fix online removal of file-backed bitmaps When "mdadm --grow /dev/mdX --bitmap=none" is used to remove a filebacked bitmap, the bitmap was disconnected from the array, but the file wasn't closed (until the array was stopped). The file also wasn't closed if adding the bitmap file failed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:38 -07:00
NeilBrown	52c03291a8	[PATCH] md: split reshape portion of raid5 sync_request into a separate function ... as raid5 sync_request is WAY too big. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
Adrian Bunk	5e56341d02	[PATCH] md: make md_print_devices() static This patch makes the needlessly global md_print_devices() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	c93983bf51	[PATCH] md: support stripe/offset mode in raid10 The "industry standard" DDF format allows for a stripe/offset layout where data is duplicated on different stripes. e.g. A B C D D A B C E F G H H E F G (columns are drives, rows are stripes, LETTERS are chunks of data). This is similar to raid10's 'far' mode, but not quite the same. So enhance 'far' mode with a 'far/offset' option which follows the layout of DDFs stripe/offset. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	7c7546ccf6	[PATCH] md: allow a linear array to have drives added while active Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	5fd6c1dce0	[PATCH] md: allow checkpoint of recovery with version-1 superblock For a while we have had checkpointing of resync. The version-1 superblock allows recovery to be checkpointed as well, and this patch implements that. Due to early carelessness we need to add a feature flag to signal that the recovery_offset field is in use, otherwise older kernels would assume that a partially recovered array is in fact fully recovered. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	a8a55c387d	[PATCH] md: remove nuisance message at shutdown At shutdown, we switch all arrays to read-only, which creates a message for every instantiated array, even those which aren't actually active. So remove the message for non-active arrays. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	16a53ecc35	[PATCH] md: merge raid5 and raid6 code There is a lot of commonality between raid5.c and raid6main.c. This patches merges both into one module called raid456. This saves a lot of code, and paves the way for online raid5->raid6 migrations. There is still duplication, e.g. between handle_stripe5 and handle_stripe6. This will probably be cleaned up later. Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	16f17b39f3	[PATCH] md: increase the delay before marking metadata clean, and make it configurable When a md array has been idle (no writes) for 20msecs it is marked as 'clean'. This delay turns out to be too short for some real workloads. So increase it to 200msec (the time to update the metadata should be a tiny fraction of that) and make it sysfs-configurable. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:37 -07:00
NeilBrown	9443a1d1f7	[PATCH] md: remove useless ioctl warning This warning was slightly useful back in 2.2 days, but is more an annoyance now. It makes it awkward to add new ioctls (that we we are likely to do that in the current climate, but it is possible). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
NeilBrown	8932c2e0dc	[PATCH] md: remove arbitrary limit on chunk size The largest chunk size the code can support without substantial surgery is 2^30 bytes, so make that the limit instead of an arbitrary 4Meg. Some day, the 'chunksize' should change to a sector-shift instead of a byte-count. Then no limit would be needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
NeilBrown	c70810b327	[PATCH] md: reformat code in raid1_end_write_request to avoid goto A recent change made this goto unnecessary, so reformat the code to make it clearer what is happening. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
Alasdair G Kergon	72d9486169	[PATCH] dm: improve error message consistency Tidy device-mapper error messages to include context information automatically. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
Alasdair G Kergon	5c6bd75d06	[PATCH] dm: prevent removal if open If you misuse the device-mapper interface (or there's a bug in your userspace tools) it's possible to end up with 'unlinked' mapped devices that cannot be removed until you reboot (along with uninterruptible processes). This patch prevents you from removing a device that is still open. It introduces dm_lock_for_deletion() which is called when a device is about to be removed to ensure that nothing has it open and nothing further can open it. It uses a private open_count for this which also lets us remove one of the problematic bdget_disk() calls elsewhere. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
David Teigland	c2ade42dd3	[PATCH] dm: create error table Add a library function dm_create_error_table() to create a table that rejects any I/O sent to a device with EIO. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
Alasdair G Kergon	17b2f66f2a	[PATCH] dm: add exports Move definitions of core device-mapper functions for manipulating mapped devices and their tables to <linux/device-mapper.h> advertising their availability for use elsewhere in the kernel. Protect the contents of device-mapper.h with ifdef __KERNEL__. And throw in a few formatting clean-ups and extra comments. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
Alasdair G Kergon	2b06cfff12	[PATCH] dm: consolidate creation functions Merge dm_create() and dm_create_with_minor() by introducing the special value DM_ANY_MINOR to request the allocation of the next available minor number. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
David Teigland	814d68629b	[PATCH] dm table split_args: handle no input Return sense if dm_split_args is called with a NULL input parameter. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:36 -07:00
Jonathan Brassow	ce503f59ae	[PATCH] dm kcopyd: error accumulation fix kcopyd should accumulate errors - otherwise I/O failures may be ignored unintentionally. And invert 'success' (used in a future patch), using a more intuitive !(read_err \|\| write_err). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Alasdair G Kergon	8a835f11bc	[PATCH] dm mirror log: sync_count fix When a mirror is reduced in size, clear the part of the bitmap that is no longer used. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Alasdair G Kergon	29121bd0b0	[PATCH] dm mirror log: bitset_size fix Fix the 'sizeof' in the region log bitmap size calculation: it's uint32_t, not unsigned long - this breaks on some archs. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Alasdair G Kergon	b7cca195c4	[PATCH] dm mirror log: refactor context Refactor the code that creates the core and disk log contexts to avoid the repeated allocation of clean_bits introduced by the last patch. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Kevin Corry	702ca6f0be	[PATCH] dm mirror log: sector size fix On-disk logs for dm-mirror devices are currently hard-coded to use 512 byte hard-sector-sizes. This patch fixes dm-log so it will work with devices with non-512-byte hard-sector-sizes. To maintain full compatibility, instead of moving the clean-bits bitset to a bitset, and enlarges the disk-header buffer to encompass both the header and the bitset. The I/O routines for the bitset are removed, and the I/O routines for the disk-header now also read/write the bitset. Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Milan Broz	143535396c	[PATCH] dm table: get_target: fix last index The table is indexed from 0, so an index equal to t->num_targets should be rejected. (There is no code in the current tree that would exercise this bug.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Neil Brown	e4c8b3ba34	[PATCH] dm: mirror sector offset fix The device-mapper core does not perform any remapping of bios before passing them to the targets. If a particular mapping begins part-way into a device, targets obtain the sector relative to the start of the mapping by subtracting ti->begin. The dm-raid1 target didn't do this everywhere: this patch fixes it, taking care to subtract ti->begin exactly once for each bio. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Jeff Mahoney	f0b0411536	[PATCH] dm: fix block device initialisation In alloc_dev(), we register the device with the block layer and then continue to initialize the device. But register_disk() makes the device available to be opened before we have completed initialising it. This patch moves the final bits of the initialization above the disk registration. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Jeff Mahoney	10da4f795f	[PATCH] dm: add module ref counting The reference counting on dm-mod is zero if no mapped devices are open. This is incorrect, and can lead to an oops if the module is unloaded while mapped devices exist. This patch claims a reference to the module whenever a device is created, and drops it again when the device is freed. Devices must be removed before dm-mod is unloaded. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Jeff Mahoney	7ec75f2547	[PATCH] dm: fix mapped device ref counting To avoid races, _minor_lock must be held while changing mapped device reference counts. There are a few paths where a mapped_device pointer is returned before a reference is taken. This patch fixes them. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:35 -07:00
Jeff Mahoney	fba9f90e56	[PATCH] dm: add DMF_FREEING There is a chicken and egg problem between the block layer and dm in which the gendisk associated with a mapping keeps a reference-less pointer to the mapped_device. This patch uses a new flag DMF_FREEING to indicate when the mapped_device is no longer valid. This is checked to prevent any attempt to open the device from succeeding while the device is being destroyed. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:34 -07:00
Jeff Mahoney	f32c10b099	[PATCH] dm: change minor_lock to spinlock While removing a device, another another thread might attempt to resurrect it. This patch replaces the _minor_lock mutex with a spinlock and uses atomic_dec_and_lock() to serialize reference counting in dm_put(). [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:34 -07:00
Jeff Mahoney	62f75c2f32	[PATCH] dm: move idr_pre_get idr_pre_get() can sleep while allocating memory. The next patch will change _minor_lock into a spinlock, so this patch moves idr_pre_get() outside the lock in preparation. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:34 -07:00
Jeff Mahoney	ba61fdd17d	[PATCH] dm: fix idr minor allocation One part of the system can attempt to use a mapped device before another has finished initialising it or while it is being freed. This patch introduces a place holder value, MINOR_ALLOCED, to mark the minor as allocated but in a state where it can't be used, such as mid-allocation or mid-free. At the end of the initialization, it replaces the place holder with the pointer to the mapped_device, making it available to the rest of the dm subsystem. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:34 -07:00
Alasdair G Kergon	c51c275249	[PATCH] dm snapshot: unify chunk_size Persistent snapshots currently store a private copy of the chunk size. Userspace also supplies the chunk size when loading a snapshot. Ensure consistency by only storing the chunk_size in one place instead of two. Currently the two sizes will differ if the chunk size supplied by userspace does not match the chunk size an existing snapshot actually uses. Amongst other problems, this causes an incorrect 'percentage full' to be reported. The patch ensures consistency by only storing the chunk_size in one place, removing it from struct pstore. Some initialisation is delayed until the correct chunk_size is known. If read_header() discovers that the wrong chunk size was supplied, the 'area' buffer (which the header already got read into) is reinitialised to the correct size. [akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled] Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:34 -07:00
Akinobu Mita	179e09172a	[PATCH] drivers: use list_move() This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B) under drivers/. Acked-by: Corey Minyard <minyard@mvista.com> Cc: Ben Collins <bcollins@debian.org> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Alasdair Kergon <dm-devel@redhat.com> Cc: Gerd Knorr <kraxel@bytesex.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Pavlic <fpavlic@de.ibm.com> Acked-by: Matthew Wilcox <matthew@wil.cx> Cc: Andrew Vasquez <linux-driver@qlogic.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-26 09:58:18 -07:00
Adrian Bunk	a5d6839b75	[PATCH] drivers/md/raid6algos.c: fix a NULL dereference This patch fixes a NULL dereference spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-23 07:43:08 -07:00
NeilBrown	c331eb04b9	[PATCH] md: Fix badness in sysfs_notify caused by md_new_event From: NeilBrown <neilb@suse.de> If an error is reported by a drive in a RAID array (which is done via bi_end_io - in interrupt context), we call md_error and md_new_event which calls sysfs_notify. However sysfs_notify grabs a mutex and so cannot be called in interrupt context. This patch just creates a variant of md_new_event which avoids the sysfs call, and uses that. A better fix for later is to arrange for the event to be called from user-context. Note: avoiding the sysfs call isn't a problem as an error will not, by itself, modify the sync_action attribute. (We do still need to wake_up(&md_event_waiters) as an error by itself will modify /proc/mdstat). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-31 16:27:11 -07:00
Neil Brown	c71d48877e	[PATCH] Unlock md devices when stopping them on reboot. otherwise we get nasty messages about locks not being released. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-26 11:52:11 -07:00
NeilBrown	5c4c33318d	[PATCH] md: fix possible oops when starting a raid0 array This loop that sets up the hash_table has problems. Careful examination will show that the last time through, everything but the first line is pointless. This is because all it does is change 'cur' and 'size' and neither of these are used after the loop. This should ring warning bells... That last time through the loop, size += conf->strip_zone[cur].size can index off the end of the strip_zone array. Depending on what it finds there, it might exit the loop cleanly, or it might spin going further and further beyond the array until it hits an unmapped address. This patch rearranges the code so that the last, pointless, iteration of the loop never happens. i.e. the one statement of the last loop that is needed is moved the the end of the previous loop - or to before the loop starts - and the loop counter starts from 1 instead of 0. Cc: "Don Dupuis" <dondster@gmail.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-23 10:35:31 -07:00
NeilBrown	2adc7d47c4	[PATCH] md: Fix inverted test for 'repair' directive. We should be able to write 'repair' to /sys/block/mdX/md/sync_action, however due to and inverted test, that always given EINVAL. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-21 12:59:17 -07:00
NeilBrown	5e7dd2ab6b	[PATCH] md: Fix 'rdev->nr_pending' count when retrying barrier requests When retrying a failed BIO_RW_BARRIER request, we need to keep the reference in ->nr_pending over the whole retry. Currently, we only hold the reference if the failed request is the last one to finish - which is silly, because it would normally be the first to finish. So move the rdev_dec_pending call up into the didn't-fail branch. As the rdev isn't used in the later code, calling rdev_dec_pending earlier doesn't hurt. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:42 -07:00
NeilBrown	62de608da0	[PATCH] md: Improve detection of lack of barrier support in raid1 Move the test for 'do barrier work' down a bit so that if the first write to a raid1 is a BIO_RW_BARRIER write, the checking done by superblock writes will cause the right thing to happen. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:42 -07:00
NeilBrown	bea2771871	[PATCH] md: Change ENOTSUPP to EOPNOTSUPP Because that is what you get if a BIO_RW_BARRIER isn't supported! Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:42 -07:00
NeilBrown	e0a33270ed	[PATCH] md: Fixed refcounting/locking when attempting read error correction in raid10 We need to hold a reference to rdevs while reading and writing to attempt to correct read errors. This reference must be taken under an rcu lock. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:42 -07:00
NeilBrown	df30d0f4ca	[PATCH] md: Avoid oops when attempting to fix read errors on raid10 We should add to the counter for the rdev after checking if the rdev is NULL!!! Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-05-01 18:17:42 -07:00
Ingo Molnar	5dc5cf7dd2	[PATCH] md: locking fix - fix mddev_lock() usage bugs in md_attr_show() and md_attr_store(). [they did not anticipate the possibility of getting a signal] - remove mddev_lock_uninterruptible() [unused] Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-20 07:54:04 -07:00
NeilBrown	4508a7a734	[PATCH] sysfs: Allow sysfs attribute files to be pollable It works like this: Open the file Read all the contents. Call poll requesting POLLERR or POLLPRI (so select/exceptfds works) When poll returns, close the file and go to top of loop. or lseek to start of file and go back to the 'read'. Events are signaled by an object manager calling sysfs_notify(kobj, dir, attr); If the dir is non-NULL, it is used to find a subdirectory which contains the attribute (presumably created by sysfs_create_group). This has a cost of one int per attribute, one wait_queuehead per kobject, one int per open file. The name "sysfs_notify" may be confused with the inotify functionality. Maybe it would be nice to support inotify for sysfs attributes as well? This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action to be pollable Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-04-14 11:41:24 -07:00
NeilBrown	6f91fe88e4	[PATCH] md: make sure 64bit fields in version-1 metadata are 64-bit aligned reshape_position is a 64bit field that was not 64bit aligned. So swap with new_level. NOTE: this is a user-visible change. However: - The bad code has not appeared in a released kernel - This code is still marked 'experimental' - This only affects version-1 superblock, which are not in wide use - These field are only used (rather than simply reported) by user-space tools in extemely rare circumstances : after a reshape crashes in the first second of the reshape process. So I believe that, at this stage, the change is safe. Especially if people heed the 'help' message on use mdadm-2.4.1. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-04-11 06:18:30 -07:00
Eric Sesterhenn	b638548384	BUG_ON() Conversion in md/raid10.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-04-02 13:34:29 +02:00
Eric Sesterhenn	43dab9bbe9	BUG_ON() Conversion in md/raid6main.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-04-02 13:33:30 +02:00
Eric Sesterhenn	78bafebd46	BUG_ON() Conversion in md/raid5.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-04-02 13:31:42 +02:00
Eric Sesterhenn	9e77c485f7	BUG_ON() Conversion in md/raid1.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-04-01 01:08:49 +02:00
Eric Sesterhenn	543cb2a451	BUG_ON() Conversion in md/dm-target.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-04-01 01:08:12 +02:00
NeilBrown	ec350a7fc1	[PATCH] md: Raid-6 did not create sysfs entries for stripe cache Signed-off-by: Brad Campbell <brad@wasp.net.au> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-31 12:19:01 -08:00
NeilBrown	926ce2d8a7	[PATCH] md: Remove some code that can sleep from under a spinlock And remove the comments that were put in inplace of a fix too.... Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-31 12:19:01 -08:00
NeilBrown	6b1117d505	[PATCH] md: Don't clear bits in bitmap when writing to one device fails during recovery Currently a device failure during recovery leaves bits set in the bitmap. This normally isn't a problem as the offending device will be rejected because of errors. However if device re-adding is being used with non-persistent bitmaps, this can be a problem. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-31 12:19:01 -08:00
NeilBrown	df5b89b323	[PATCH] md: Convert reconfig_sem to reconfig_mutex ... being careful that mutex_trylock is inverted wrt down_trylock Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:03 -08:00
Arjan van de Ven	48c9c27b8b	[PATCH] sem2mutex: drivers/md Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:03 -08:00
NeilBrown	2f889129de	[PATCH] md: Restore 'remaining' count when retrying an write operation When retrying a write due to barrier failure, we don't reset 'remaining', so it goes negative and never hits 0 again. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:03 -08:00
NeilBrown	8ddeeae51f	[PATCH] md: Fix md grow/size code to correctly find the maximum available space An md array can be asked to change the amount of each device that it is using, and in particular can be asked to use the maximum available space. This currently only works if the first device is not larger than the rest. As 'size' gets changed and so 'fit' becomes wrong. So check if a 'fit' is required early and don't corrupt it. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:03 -08:00
NeilBrown	f6344757a9	[PATCH] md: Remove bi_end_io call out from under a spinlock raid5 overloads bi_phys_segments to count the number of blocks that the request was broken in to so that it knows when the bio is completely handled. Accessing this must always be done under a spinlock. In one case we also call bi_end_io under that spinlock, which probably isn't ideal as bi_end_io could be expensive (even though it isn't allowed to sleep). So we reducde the range of the spinlock to just accessing bi_phys_segments. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:03 -08:00
NeilBrown	b3b46be38a	[PATCH] md: Remove some stray semi-colons after functions called in macro.. wait_event_lock_irq puts a ';' after its usage of the 4th arg, so we don't need to. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:02 -08:00
NeilBrown	df8e7f7639	[PATCH] md: Improve comments about locking situation in raid5 make_request Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:02 -08:00
NeilBrown	e464eafdb4	[PATCH] md: Support suspending of IO to regions of an md array This allows user-space to access data safely. This is needed for raid5 reshape as user-space needs to take a backup of the first few stripes before allowing reshape to commence. It will also be useful in cluster-aware raid1 configurations so that all cluster members can leave a section of the array untouched while a resync/recovery happens. A 'start' and 'end' of the suspended range are written to 2 sysfs attributes. Note that only one range can be suspended at a time. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:02 -08:00
NeilBrown	16484bf596	[PATCH] md: Make 'reshape' a possible sync_action action This allows reshape to be triggerred via sysfs (which is the only way to start it happening). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:02 -08:00
NeilBrown	63c70c4f3a	[PATCH] md: Split reshape handler in check_reshape and start_reshape check_reshape checks validity and does things that can be done instantly - like adding devices to raid1. start_reshape initiates a restriping process to convert the whole array. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:02 -08:00
NeilBrown	b578d55fdd	[PATCH] md: Only checkpoint expansion progress occasionally Instead of checkpointing at each stripe, only checkpoint when a new write would overwrite uncheckpointed data. Block any write to the uncheckpointed area. Arbitrarily checkpoint at least every 3Meg. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:02 -08:00
NeilBrown	f67055780c	[PATCH] md: Checkpoint and allow restart of raid5 reshape We allow the superblock to record an 'old' and a 'new' geometry, and a position where any conversion is up to. The geometry allows for changing chunksize, layout and level as well as number of devices. When using verion-0.90 superblock, we convert the version to 0.91 while the conversion is happening so that an old kernel will refuse the assemble the array. For version-1, we use a feature bit for the same effect. When starting an array we check for an incomplete reshape and restart the reshape process if needed. If the reshape stopped at an awkward time (like when updating the first stripe) we refuse to assemble the array, and let user-space worry about it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	292695531a	[PATCH] md: Final stages of raid5 expand code This patch adds raid5_reshape and end_reshape which will start and finish the reshape processes. raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set, to discourage accidental use. Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry. and Make sure that you have backups, just in case. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	ccfcc3c10b	[PATCH] md: Core of raid5 resize process This patch provides the core of the resize/expand process. sync_request notices if a 'reshape' is happening and acts accordingly. It allocated new stripe_heads for the next chunk-wide-stripe in the target geometry, marking them STRIPE_EXPANDING. Then it finds which stripe heads in the old geometry can provide data needed by these and marks them STRIPE_EXPAND_SOURCE. This causes stripe_handle to read all blocks on those stripes. Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are needed are copied into the corresponding STRIPE_EXPANDING stripe_head. Once a STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY and then is written out and released. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	7ecaa1e6a1	[PATCH] md: Infrastructure to allow normal IO to continue while array is expanding We need to allow that different stripes are of different effective sizes, and use the appropriate size. Also, when a stripe is being expanded, we must block any IO attempts until the stripe is stable again. Key elements in this change are: - each stripe_head gets a 'disk' field which is part of the key, thus there can sometimes be two stripe heads of the same area of the array, but covering different numbers of devices. One of these will be marked STRIPE_EXPANDING and so won't accept new requests. - conf->expand_progress tracks how the expansion is progressing and is used to determine whether the target part of the array has been expanded yet or not. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	ad01c9e375	[PATCH] md: Allow stripes to be expanded in preparation for expanding an array Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache data structure. This requires allocating new stripes in a new kmem_cache. If this succeeds, we copy cache pages over and release the old stripes and kmem_cache. We then allocate new pages. If that fails, we leave the stripe cache at it's new size. It isn't worth the effort to shrink it back again. Unfortuanately this means we need two kmem_cache names as we, for a short period of time, we have two kmem_caches. So they are raid5/%s and raid5/%s-alt Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	b55e6bfcd2	[PATCH] md: Split disks array out of raid5 conf structure so it is easier to grow The remainder of this batch implements raid5 reshaping. Currently the only shape change that is supported is added a device, but it is envisioned that changing the chunksize and layout will also be supported, as well as changing the level (e.g. 1->5, 5->6). The reshape process naturally has to move all of the data in the array, and so should be used with caution. It is believed to work, and some testing does support this, but wider testing would be great for increasing my confidence. You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth. This is because mdadm need to take a copy of a 'critical section' at the start of the array incase there is a crash at an awkward moment. On restart, mdadm will restore the critical section and allow reshape to continue. I hope to release a 2.4-pre by early next week - it still needs a little more polishing. This patch: Previously the array of disk information was included in the raid5 'conf' structure which was allocated to an appropriate size. This makes it awkward to change the size of that array. So we split it off into a separate kmalloced array which will require a little extra indexing, but is much easier to grow. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	4588b42e9d	[PATCH] md: Update status_resync to handle LARGE devices status_resync - used by /proc/mdstat to report the status of a resync, assumes that device sizes will always fit into an 'unsigned long' This is no longer the case... Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:01 -08:00
NeilBrown	1be7892fff	[PATCH] md: Fix the 'failed' count for version-0 superblocks We are counting failed devices twice, once of the device that is failed, and once for the hole that has been left in the array. Remove the former so 'failed' matches 'missing'. Storing these counts in the superblock is a bit silly anyway.... Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
NeilBrown	c5a10f62c5	[PATCH] md: Add '4' to the list of levels for which bitmaps are supported I really should make this a function of the personality.... Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
NeilBrown	89e5c8b5b8	[PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md. This flag should be set for a virtual device iff it is set for all underlying devices. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
Kevin Corry	a22c96c737	[PATCH] dm: remove unnecessary typecast Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
Jun'ichi Nomura	f165921df4	[PATCH] dm/md dependency tree in sysfs: dm to use bd_claim_by_disk Use bd_claim_by_disk. Following symlinks are created if dm-0 maps to sda: /sys/block/dm-0/slaves/sda --> /sys/block/sda /sys/block/sda/holders/dm-0 --> /sys/block/dm-0 Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
Jun'ichi Nomura	5463c7904c	[PATCH] dm/md dependency tree in sysfs: md to use bd_claim_by_disk Use bd_claim_by_disk. Following symlinks are created if md0 is built from sda and sdb /sys/block/md0/slaves/sda --> /sys/block/sda /sys/block/md0/slaves/sdb --> /sys/block/sdb /sys/block/sda/holders/md0 --> /sys/block/md0 /sys/block/sdb/holders/md0 --> /sys/block/md0 Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:45:00 -08:00
Darrick J. Wong	3ac51e741a	[PATCH] dm store geometry Allow drive geometry to be stored with a new DM_DEV_SET_GEOMETRY ioctl. Device-mapper will now respond to HDIO_GETGEO. If the geometry information is not available, zero will be returned for all of the parameters. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
Mike Anderson	1134e5ae79	[PATCH] dm table: store md Store an up-pointer to the owning struct mapped_device in every table when it is created. Access it with: struct mapped_device dm_table_get_md(struct dm_table t) Tables linked to md must be destroyed before the md itself. Signed-off-by: Mike Anderson <andmike@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
Alasdair G Kergon	9ade92a9a5	[PATCH] dm: tidy mdptr Change dm_get_mdptr() to take a struct mapped_device instead of dev_t. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
Mike Anderson	7e51f257e8	[PATCH] dm: store md name The patch stores a printable device number in struct mapped_device for use in warning messages and with a proposed netlink interface. Signed-off-by: Mike Anderson <andmike@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
Jun'ichi Nomura	1ecac7fd74	[PATCH] dm flush queue EINTR If dm_suspend() is cancelled, bios already added to the deferred list need to be submitted. Otherwise they remain 'in limbo' until there's a dm_resume(). Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
Alasdair G Kergon	138728dc96	[PATCH] dm snapshot: fix kcopyd destructor Before removing a snapshot, wait for the completion of any kcopyd jobs using it. Do this by maintaining a count (nr_jobs) of how many outstanding jobs each kcopyd_client has. The snapshot destructor first unregisters the snapshot so that no new kcopyd jobs (created by writes to the origin) will reference that particular snapshot. kcopyd_client_destroy() is now run next to wait for the completion of any outstanding jobs before the snapshot exception structures (that those jobs reference) are freed. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
NeilBrown	969429b504	[PATCH] dm: make sure QUEUE_FLAG_CLUSTER is set properly This flag should be set for a virtual device iff it is set for all underlying devices. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:59 -08:00
Andrew Morton	4ee218cd67	[PATCH] dm: remove SECTOR_FORMAT We don't know what type sector_t has. Sometimes it's unsigned long, sometimes it's unsigned long long. For example on ppc64 it's unsigned long with CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n. The way to handle all of this is to always use unsigned long long and to always typecast the sector_t when printing it. Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:58 -08:00
Jun'ichi Nomura	930d332a23	[PATCH] drivers/md/dm-raid1.c: Fix inconsistent mirroring after interrupted recovery dm-mirror has potential data corruption problem: while on-disk log shows that all disk contents are in-sync, actual contents of the disks are not synchronized. This problem occurs if initial recovery (synching) is interrupted and resumed. Attached patch fixes this problem. Background: rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN (in-sync), which results in the corresponding bit of clean_bits being set. This is harmful if on-disk log is used and the map is removed/suspended before the initial sync is completed. The clean_bits is written down to the on-disk log at the map removal, and, upon resume, it's read and copied to sync_bits. Since the recovery process refers to the sync_bits to find a region to be recovered, the region whose state was changed from RH_NOSYNC to RH_CLEAN is no longer recovered. If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel sometimes ago, the contents of the mirrored disk just corrupt silently. If you have, balanced read may get bogus data from out-of-sync disks. The patch keeps RH_NOSYNC state unchanged. It will be changed to RH_RECOVERING when recovery starts and get reclaimed when the recovery completes. So it doesn't leak the region hash entry. Description: Keep RH_NOSYNC state unchanged when I/O on the region completes. rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN (in-sync), which results in the corresponding bit of clean_bits being set. This is harmful if on-disk log is used and the map is removed/suspended before the initial sync is completed. The clean_bits is written down to the on-disk log at the map removal, and, upon resume, it's read and copied to sync_bits. Since the recovery process refers to the sync_bits to find a region to be recovered, the region whose state was changed from RH_NOSYNC to RH_CLEAN is no longer recovered. If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel sometimes ago, the contents of the mirrored disk just corrupt silently. If you have, balanced read may get bogus data from out-of-sync disks. The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts on the region and get reclaimed when the recovery completes. So it doesn't leak the region hash entry. Alasdair said: I've analysed the relevant part of the state machine and I believe that the patch is correct. (Further work on this code is still needed - this patch has the side-effect of holding onto memory unnecessarily for long periods of time under certain workloads - but better that than corrupting data.) Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:58 -08:00
Alasdair G Kergon	76df1c651b	[PATCH] device-mapper snapshot: fix invalidation When a snapshot becomes invalid, s->valid is set to 0. In this state, a snapshot can no longer be accessed. When s->lock is acquired, before doing anything else, s->valid must be checked to ensure the snapshot remains valid. This patch eliminates some races (that may cause panics) by adding some missing checks. At the same time, some unnecessary levels of indentation are removed and snapshot invalidation is moved into a single function that always generates a device-mapper event. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:58 -08:00
Alasdair G Kergon	b4b610f684	[PATCH] device-mapper snapshot: replace sibling list The siblings "list" is used unsafely at the moment. Firstly, only the element on the list being changed gets locked (via the snapshot lock), not the next and previous elements which have pointers that are also being changed. Secondly, if you have two or more snapshots and write to the same chunk a second time before every snapshot has finished making its private copy of the data, if you're unlucky, _origin_write() could attempt its list_merge() and dereference a 'last' pointer to a pending_exception structure that has just been freed. Analysis reveals that the list is actually only there for reference counting. If 5 pending_exceptions are needed in origin_write, then the 5 are joined together into a 5-element list - without a separate list head because there's nowhere suitable to store it. As the pending_exceptions complete, they are removed from the list one-by-one and any contents of origin_bios get moved across to one of the remaining pending_exceptions on the list. Whichever one is last is detected because list_empty() is then true and the origin_bios get submitted. The fix proposed here uses an alternative reference counting mechanism by choosing one of the pending_exceptions as primary and maintaining an atomic counter there. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:58 -08:00
Alasdair G Kergon	eccf081799	[PATCH] device-mapper snapshot: fix origin_write pending_exception submission Say you have several snapshots of the same origin and then you issue a write to some place in the origin for the first time. Before the device-mapper snapshot target lets the write go through to the underlying device, it needs to make a copy of the data that is about to be overwritten. Each snapshot is independent, so it makes one copy for each snapshot. __origin_write() loops through each snapshot and checks to see whether a copy is needed for that snapshot. (A copy is only needed the first time that data changes.) If a copy is needed, the code allocates a 'pending_exception' structure holding the details. It links these together for all the snapshots, then works its way through this list and submits the copying requests to the kcopyd thread by calling start_copy(). When each request is completed, the original pending_exception structure gets freed in pending_complete(). If you're very unlucky, this structure can get freed before the submission process has finished walking the list. This patch: 1) Creates a new temporary list pe_queue to hold the pending exception structures; 2) Does all the bookkeeping up-front, then walks through the new list safely and calls start_copy() for each pending_exception that needed it; 3) Avoids attempting to add pe->siblings to the list if it's already connected. [NB This does not fix all the races in this code. More patches will follow.] Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:58 -08:00
Linus Torvalds	9ae21d1bb3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment Kconfig help: MTD_JEDECPROBE already supports Intel Remove ugly debugging stuff do_mounts.c: Minor ROOT_DEV comment cleanup BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c BUG_ON() Conversion in mm/mempool.c BUG_ON() Conversion in mm/memory.c BUG_ON() Conversion in kernel/fork.c BUG_ON() Conversion in ipc/sem.c BUG_ON() Conversion in fs/ext2/ BUG_ON() Conversion in fs/hfs/ BUG_ON() Conversion in fs/dcache.c BUG_ON() Conversion in fs/buffer.c BUG_ON() Conversion in input/serio/hp_sdc_mlc.c BUG_ON() Conversion in md/dm-table.c BUG_ON() Conversion in md/dm-path-selector.c BUG_ON() Conversion in drivers/isdn BUG_ON() Conversion in drivers/char BUG_ON() Conversion in drivers/mtd/	2006-03-26 09:41:18 -08:00
Matthew Dobson	93d2341c75	[PATCH] mempool: use mempool_create_slab_pool() Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:00 -08:00
Matthew Dobson	26b6e051bc	[PATCH] mempool: use common mempool kzalloc allocator This patch changes a mempool user, which is basically just a wrapper around kzalloc(), to use the common mempool_kmalloc/kfree, rather than its own wrapper function, removing duplicated code. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:59 -08:00
Matthew Dobson	0eaae62aba	[PATCH] mempool: use common mempool kmalloc allocator This patch changes several mempool users, all of which are basically just wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather than their own wrapper function, removing a bunch of duplicated code. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:59 -08:00
Matthew Dobson	a19b27ce38	[PATCH] mempool: use common mempool page allocator Convert two mempool users that currently use their own mempool-backed page allocators to use the generic mempool page allocator. Also included are 2 trivial whitespace fixes. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:59 -08:00
Ingo Molnar	14cc3e2b63	[PATCH] sem2mutex: misc static one-file mutexes Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jens Axboe <axboe@suse.de> Cc: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:55 -08:00
Eric Sesterhenn	547bc92649	BUG_ON() Conversion in md/dm-table.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-03-26 18:22:50 +02:00
Eric Sesterhenn	c163c7293e	BUG_ON() Conversion in md/dm-path-selector.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-03-26 18:21:58 +02:00
Linus Torvalds	1e8c573933	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits) BUG_ON() Conversion in drivers/video/ BUG_ON() Conversion in drivers/parisc/ BUG_ON() Conversion in drivers/block/ BUG_ON() Conversion in sound/sparc/cs4231.c BUG_ON() Conversion in drivers/s390/block/dasd.c BUG_ON() Conversion in lib/swiotlb.c BUG_ON() Conversion in kernel/cpu.c BUG_ON() Conversion in ipc/msg.c BUG_ON() Conversion in block/elevator.c BUG_ON() Conversion in fs/coda/ BUG_ON() Conversion in fs/binfmt_elf_fdpic.c BUG_ON() Conversion in input/serio/hil_mlc.c BUG_ON() Conversion in md/dm-hw-handler.c BUG_ON() Conversion in md/bitmap.c The comment describing how MS_ASYNC works in msync.c is confusing rcu: undeclared variable used in documentation fix typos "wich" -> "which" typo patch for fs/ufs/super.c Fix simple typos tabify drivers/char/Makefile ...	2006-03-25 08:41:09 -08:00
Adrian Bunk	7e31765550	[PATCH] md/bitmap.c:bitmap_mask_state(): fix inconsequent NULL checking We dereference bitmap both one line above and one line below this check rendering this check quite useless. Spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 08:22:57 -08:00
Eric Sesterhenn	4401d13899	BUG_ON() Conversion in md/dm-hw-handler.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-03-24 18:36:27 +01:00
Eric Sesterhenn	5daf2cf19a	BUG_ON() Conversion in md/bitmap.c this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-03-24 18:35:26 +01:00
Jens Axboe	2056a782f8	[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23 Signed-off-by: Jens Axboe <axboe@suse.de>	2006-03-23 20:00:26 +01:00
Alasdair G Kergon	d2044a94e8	[PATCH] dm: bio split bvec fix The code that handles bios that span table target boundaries by breaking them up into smaller bios will not split an individual struct bio_vec into more than two pieces. Sometimes more than that are required. This patch adds a loop to break the second piece up into as many pieces as are necessary. Cc: "Abhishek Gupta" <abhishekgupt@gmail.com> Cc: Dan Smith <danms@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-22 07:53:55 -08:00
Al Viro	1312f40e11	[PATCH] regularize blk_cleanup_queue() use Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-03-18 18:34:20 -05:00
Kevin Corry	8ba32fde2c	[PATCH] dm stripe: Fix bounds The dm-stripe target currently does not enforce that the size of a stripe device be a multiple of the chunk-size. Under certain conditions, this can lead to I/O requests going off the end of an underlying device. This test-case shows one example. echo "0 100 linear /dev/hdb1 0" \| dmsetup create linear0 echo "0 100 linear /dev/hdb1 100" \| dmsetup create linear1 echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" \| \ dmsetup create stripe0 dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k This will produce the output: dd: writing '/dev/mapper/stripe0': Input/output error 97+0 records in 96+0 records out And in the kernel log will be: attempt to access beyond end of device dm-0: rw=0, want=104, limit=100 The patch will check that the table size is a multiple of the stripe chunk-size when the table is created, which will prevent the above striped device from being created. This should not affect tools like LVM or EVMS, since in all the cases I can think of, striped devices are always created with the sizes being a multiple of the chunk-size. The size of a stripe device must be a multiple of its chunk-size. (akpm: that typecast is quite gratuitous) Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-17 07:51:25 -08:00
NeilBrown	04b857f74c	[PATCH] md: Fix several raid1 bugs which cause a memory leak - wrong test for 'is this a BARRIER bio' - not freeing on all possible paths. - using r1_bio after freeing it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-09 19:47:37 -08:00
Jun'ichi Nomura	63d94e482d	[PATCH] dm: free minor after unlink gendisk Minor number should be freed after del_gendisk(). Otherwise, there could be a window where 2 registered gendisk has same minor number. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-24 14:31:39 -08:00
Jun'ichi Nomura	d9dde59ba0	[PATCH] dm: missing bdput/thaw_bdev at removal Need to unfreeze and release bdev otherwise the bdev inode with inconsistent state is reused later and cause problem. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-24 14:31:39 -08:00
NeilBrown	8ed75463b9	[PATCH] md: Make sure rdev->size gets set for version-1 superblocks Sometimes it doesn't so make the code more like the version-0 code which works. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-03 08:32:00 -08:00
NeilBrown	29fc7e3e70	[PATCH] md: Assorted little md fixes - version-1 superblock + The default_bitmap_offset is in sectors, not bytes. + the 'size' field in the superblock is in sectors, not KB - raid0_run should return a negative number on error, not '1' - raid10_read_balance should not return a valid 'disk' number if ->rdev turned out to be NULL - kmem_cache_destroy doesn't like being passed a NULL. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-03 08:32:00 -08:00
NeilBrown	284ae7cab0	[PATCH] md: Handle overflow of mdu_array_info_t->size better mdu_array_info_t->size is 'int', which isn't big enough for the size (in KB of each component in) some arrays. So rather than a random overflow, set size to -1 when it cannot be set correctly. To update aspect on an array, userspace will sometimes: get_array_info change one field set_array_info in this case, we don't want the '-1' in 'size' to change to size, or look like a size change at all. So test for that in update_array_info. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-02-03 08:31:59 -08:00

1 2 3 4 5 ...

498 Commits