linux-sg2042

Commit Graph

Author	SHA1	Message	Date
K.Tanaka	a07e6ab41b	md: the md RAID10 resync thread could cause a md RAID10 array deadlock This message describes another issue about md RAID10 found by testing the 2.6.24 md RAID10 using the new scsi fault injection framework. Abstract: When a scsi error results in disabling a disk during RAID10 recovery, the resync threads of md RAID10 could stall. In this case, the raid array has already been broken and it may not matter. But I think a stall is not preferable. If it occurs, even shutdown or reboot will fail because of resource busy. The deadlock mechanism: The r10bio_s structure has a "remaining" member to keep track of BIOs yet to be handled when recovering. The "remaining" counter is incremented when building a BIO in sync_request() and is decremented when finishing a BIO in end_sync_write(). If building a BIO fails for some reason in sync_request(), the "remaining" should be decremented if it has already been incremented. I found a case where this decrement is forgotten. This causes a md_do_sync() deadlock because md_do_sync() waits for md_done_sync() called by end_sync_write(), but end_sync_write() never calls md_done_sync() because of the "remaining" counter mismatch. For example, this problem can be reproduced in the following case: Personalities : [raid10] md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F) 3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_] [>....................] recovery = 2.2% (45376/1959808) finish=0.7min speed=45376K/sec In this case, sdf1 is recovering, sdb1 and sde1 are disabled. An additional error with detaching sdd will cause a deadlock. md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F) 3919616 blocks 64K chunks 2 near-copies [4/1] [_U__] [=>...................] recovery = 5.0% (99520/1959808) finish=5.9min speed=5237K/sec 2739 ? S< 0:17 [md0_raid10] 28608 ? D< 0:00 [md0_resync] 28629 pts/1 Ss 0:00 bash 28830 pts/1 R+ 0:00 ps ax 31819 ? D< 0:00 [kjournald] The resync thread keeps working, but actually it is deadlocked. Patch: With this patch, the remaining counter will be decremented if needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	1c830532f6	md: fix possible raid1/raid10 deadlock on read error during resync Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for another possible deadlock in raid1/raid10 error handling. If a read request returns an error while a resync is happening and a resync request is pending, the attempt to fix the error will block until the resync progresses, and the resync will block until the read request completes. Thus a deadlock. This patch fixes the problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
Keld Simonsen	8ed3a19563	md: don't attempt read-balancing for raid10 'far' layouts This patch changes the disk to be read for layout "far > 1" to always be the disk with the lowest block address. Thus the chunks to be read will always be (for a fully functioning array) from the first band of stripes, and the raid will then work as a raid0 consisting of the first band of stripes. Some advantages: The fastest part which is the outer sectors of the disks involved will be used. The outer blocks of a disk may be as much as 100 % faster than the inner blocks. Average seek time will be smaller, as seeks will always be confined to the first part of the disks. Mixed disks with different performance characteristics will work better, as they will work as raid0, the sequential read rate will be number of disks involved times the IO rate of the slowest disk. If a disk is malfunctioning, the first disk which is working, and has the lowest block address for the logical block will be used. Signed-off-by: Keld Simonsen <keld@dkuug.dk> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	27c529bb8e	md: lock access to rdev attributes properly When we access attributes of an rdev (component device on an md array) through sysfs, we really need to lock the array against concurrent changes. We currently do that when we change an attribute, but not when we read an attribute. We need to lock when reading as well else rdev->mddev could become NULL while we are accessing it. So add appropriate locking (mddev_lock) to rdev_attr_show. rdev_size_store requires some extra care as well as it needs to unlock the mddev while scanning other mddevs for overlapping regions. We currently assume that rdev->mddev will still be unchanged after the scan, but that cannot be certain. So take a copy of rdev->mddev for use at the end of the function. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	2515619823	md: make sure a reshape is started when device switches to read-write A resync/reshape/recovery thread will refuse to progress when the array is marked read-only. So whenever we mark it not read-only, it is important to wake up the resync thread. There is one place we didn't do this. The problem manifests if the start_ro module parameter is set, and a raid5 array that is in the middle of a reshape (restripe) is started. The array will initially be semi-read-only (meaning it acts like it is readonly until the first write). So the reshape will not proceed. On the first write, the array will become read-write, but the reshape will not be started, and there is no event which will ever restart that thread. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	d0fae18f1b	md: clean up irregularity with raid autodetect When a raid1 array is stopped, all components currently get added to the list for auto-detection. However we should really only add components that were found by autodetection in the first place. So add a flag to record that information, and use it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	a1801f858e	md: guard against possible bad array geometry in v1 metadata Make sure the data doesn't start before the end of the superblock when the superblock is at the start of the device. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
NeilBrown	8311c29d40	md: reduce CPU wastage on idle md array with a write-intent bitmap On an md array with a write-intent bitmap, a thread wakes up every few seconds and scans the bitmap looking for work to do. If the array is idle, there will be no work to do, but a lot of scanning is done to discover this. So cache the fact that the bitmap is completely clean, and avoid scanning the whole bitmap when the cache is known to be clean. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
NeilBrown	a35e63efa1	md: fix deadlock in md/raid1 and md/raid10 when handling a read error When handling a read error, we freeze the array to stop any other IO while attempting to over-write with correct data. This is done in the raid1d(raid10d) thread and must wait for all submitted IO to complete (except for requests that failed and are sitting in the retry queue - these are counted in ->nr_queue and will stay there during a freeze). However write requests need attention from raid1d as bitmap updates might be required. This can cause a deadlock as raid1 is waiting for requests to finish that themselves need attention from raid1d. So we create a new function 'flush_pending_writes' to give that attention, and call it in freeze_array to be sure that we aren't waiting on raid1d. Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
Adrian Bunk	e03f1a8422	dm-raid1.c: fix NULL dereferences This patch fixes two NULL dereferences introduced by commit `06386bbfd2` and spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-19 15:52:27 -08:00
Jan Blunck	cf28b4863f	d_path: Make d_path() use a struct path d_path() is used on a <dentry,vfsmount> pair. Lets use a struct path to reflect this. [akpm@linux-foundation.org: fix build in mm/memory.c] Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Bryan Wu <bryan.wu@analog.com> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:09 -08:00
Jan Blunck	c32c2f63a9	d_path: Make seq_path() use a struct path argument seq_path() is always called with a dentry and a vfsmount from a struct path. Make seq_path() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:17:08 -08:00
Jan Blunck	1d957f9bf8	Introduce path_put() * Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Jan Blunck	4ac9137858	Embed a struct path into struct nameidata instead of nd->{dentry,mnt} This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforces the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-14 21:13:33 -08:00
Al Viro	39ed7adb17	dm-raid1 breakage on 64bit test_and_set_bit() on address of uint32_t is a Bad Idea(tm)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-13 08:16:34 -08:00
Jonathan Brassow	af195ac82e	dm raid1: report fault status This patch adds extra information to the mirror status output, so that it can be determined which device(s) have failed. For each mirror device, a character is printed indicating the most severe error encountered. The characters are: * A => Alive - No failures * D => Dead - A write failure occurred leaving mirror out-of-sync * S => Sync - A synchronization failure occurred, mirror out-of-sync * R => Read - A read failure occurred, mirror data unaffected This allows userspace to properly reconfigure the mirror set. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:39 +00:00
Jonathan Brassow	06386bbfd2	dm raid1: handle read failures This patch gives the ability to respond-to/record device failures that happen during read operations. It also adds the ability to read from mirror devices that are not the primary if they are in-sync. There are essentially two read paths in mirroring; the direct path and the queued path. When a read request is mapped, if the region is 'in-sync' the direct path is taken; otherwise the queued path is taken. If the direct path is taken, we must record bio information so that if the read fails we can retry it. We then discover the status of a direct read through mirror_end_io. If the read has failed, we will mark the device from which the read was attempted as failed (so we don't try to read from it again), restore the bio and try again. If the queued path is taken, we discover the results of the read from 'read_callback'. If the device failed, we will mark the device as failed and attempt the read again if there is another device where this region is known to be 'in-sync'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:37 +00:00
Jonathan Brassow	b80aa7a0c2	dm raid1: fix EIO after log failure This patch adds the ability to requeue write I/O to core device-mapper when there is a log device failure. If a write to the log produces an error, the pending writes are put on the "failures" list. Since the log is marked as failed, they will stay on the failures list until a suspend happens. Suspends come in two phases, presuspend and postsuspend. We must make sure that all the writes on the failures list are requeued in the presuspend phase (a requirement of dm core). This means that recovery must be complete (because writes may be delayed behind it) and the failures list must be requeued before we return from presuspend. The mechanisms to ensure recovery is complete (or stopped) were already in place, but needed to be moved from postsuspend to presuspend. We rely on 'flush_workqueue' to ensure that the mirror thread is complete and therefore, has requeued all writes in the failures list. Because we are using flush_workqueue, we must ensure that no additional 'queue_work' calls will produce additional I/O that we need to requeue (because once we return from presuspend, we are unable to do anything about it). 'queue_work' is called in response to the following functions: - complete_resync_work = NA, recovery is stopped - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it is ready to recover the region (recovery is stopped) or it needs to clear the region in the log* this doesn't get called while suspending - rh_recovery_end = NA, recovery is stopped - rh_recovery_start = NA, recovery is stopped - write_callback = 1) Writes w/o failures simply call bio_endio -> mirror_end_io -> rh_dec (see rh_dec above) 2) Writes with failures are put on the failures list and queue_work is called write_callbacks don't happen during suspend ** - do_failures = NA, 'queue_work' not called if suspending - add_mirror (initialization) = NA, only done on mirror creation - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue is called. 2) No more I/Os are being issued. 3) Re-attempted READs can still be handled. (Write completions are handled through rh_dec/ write_callback - mentioned above - and do not use queue_bio.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:35 +00:00
Jonathan Brassow	8f0205b798	dm raid1: handle recovery failures This patch adds the calls to 'fail_mirror' if an error occurs during mirror recovery (aka resynchronization). 'fail_mirror' is responsible for recording the type of error by mirror device and ensuring an event gets raised for the purpose of notifying userspace. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:32 +00:00
Jonathan Brassow	72f4b31410	dm raid1: handle write failures This patch gives mirror the ability to handle device failures during normal write operations. The 'write_callback' function is called when a write completes. If all the writes failed or succeeded, we report failure or success respectively. If some of the writes failed, we call fail_mirror; which increments the error count for the device, notes the type of error encountered (DM_RAID1_WRITE_ERROR), and selects a new primary (if necessary). Note that the primary device can never change while the mirror is not in-sync (IOW, while recovery is happening.) This means that the scenario where a failed write changes the primary and gives recovery_complete a chance to misread the primary never happens. The fact that the primary can change has necessitated the change to the default_mirror field. We need to protect against reading garbage while the primary changes. We then add the bio to a new list in the mirror set, 'failures'. For every bio in the 'failures' list, we call a new function, '__bio_mark_nosync', where we mark the region 'not-in-sync' in the log and properly set the region state as, RH_NOSYNC. Userspace must also be notified of the failure. This is done by 'raising an event' (dm_table_event()). If fail_mirror is called in process context the event can be raised right away. If in interrupt context, the event is deferred to the kmirrord thread - which raises the event if 'event_waiting' is set. Backwards compatibility is maintained by ignoring errors if the DM_FEATURES_HANDLE_ERRORS flag is not present. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:29 +00:00
Milan Broz	d74f81f8ad	dm snapshot: combine consecutive exceptions in memory Provided sector_t is 64 bits, reduce the in-memory footprint of the snapshot exception table by the simple method of using unused bits of the chunk number to combine consecutive entries. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:27 +00:00
Brian Wood	4f7f5c675f	dm: stripe enhanced status return This patch adds additional information to the status line. It is added at the end of the returned text so it will not interfere with existing implementations using this data. The addition of this information will allow for a common return interface to match that returned with the dm-raid1.c status line (with Jonathan Brassow's patches). Here is a sample of what is returned with a mirror "status" call: isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core Here's what's returned with this patch for a stripe "status" call: isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA Signed-off-by: Brian Wood <brian.j.wood@intel.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:24 +00:00
Brian Wood	a25eb9446a	dm: stripe trigger event on failure This patch adds the stripe_end_io function to process errors that might occur after an IO operation. As part of this there are a number of enhancements made to record and trigger events: - New atomic variable in struct stripe to record the number of errors each stripe volume device has experienced (could be used later with uevents to report back directly to userspace) - New workqueue/work struct setup to process the trigger_event function - New end_io function. It is here that testing for BIO error conditions takes place. It determines the exact stripe that caused the error, records this in the new atomic variable, and calls the queue_work() function - New trigger_event function to process failure events. This calls dm_table_event() Signed-off-by: Brian Wood <brian.j.wood@intel.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:22 +00:00
Jonathan Brassow	fb8b284806	dm log: auto load modules If the log type is not recognised, attempt to load the module 'dm-log-<type>.ko'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:19 +00:00
Milan Broz	304f3f6a58	dm: move deferred bio flushing to workqueue Add a single-thread workqueue for each mapped device and move flushing of the lists of pushback and deferred bios to this new workqueue. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:17 +00:00
Milan Broz	3a7f6c990a	dm crypt: use async crypto dm-crypt: Use crypto ablkcipher interface Move encrypt/decrypt core to async crypto call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:14 +00:00
Milan Broz	95497a9600	dm crypt: prepare async callback fn dm-crypt: Use crypto ablkcipher interface Prepare callback function for async crypto operation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:12 +00:00
Milan Broz	43d6903482	dm crypt: add completion for async dm-crypt: Use crypto ablkcipher interface Prepare completion for async crypto request. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:09 +00:00
Milan Broz	ddd42edfd8	dm crypt: add async request mempool dm-crypt: Use crypto ablkcipher interface Introduce mempool for async crypto requests. cc->req is used mainly during synchronous operations (to prevent allocation and deallocation of the same object). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:07 +00:00
Milan Broz	01482b7671	dm crypt: extract scatterlist processing dm-crypt: Use crypto ablkcipher interface Move scatterlists to separate dm_crypt_struct and pick out block processing from crypt_convert. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:04 +00:00
Milan Broz	899c95d36c	dm crypt: tidy io ref counting Make io reference counting more obvious. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:11:02 +00:00
Milan Broz	84131db689	dm crypt: introduce crypt_write_io_loop Introduce crypt_write_io_loop(). Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:59 +00:00
Milan Broz	dec1cedf9d	dm crypt: abstract crypt_write_done Process write request in separate function and queue final bio through io workqueue. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:57 +00:00
Milan Broz	0c395b0f8d	dm crypt: store sector mapping in dm_crypt_io Add sector into dm_crypt_io instead of using local variable. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:54 +00:00
Alasdair G Kergon	395b167ca0	dm crypt: move queue functions Reorder kcryptd functions for clarity. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:52 +00:00
Milan Broz	4e4eef64e2	dm crypt: adjust io processing functions Rename functions to follow calling convention. Prepare write io error processing function skeleton. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:49 +00:00
Milan Broz	ee7a491e62	dm crypt: tidy crypt_endio Simplify crypt_endio function. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:46 +00:00
Milan Broz	5742fd7775	dm crypt: move error setting outside crypt_dec_pending Move error code setting outside of crypt_dec_pending function. Use -EIO if crypt_convert_scatterlist() fails. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:43 +00:00
Milan Broz	fcd369daa3	dm crypt: remove unnecessary crypt_context write parm Remove write attribute from convert_context and use bio flag instead. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:41 +00:00
Milan Broz	53017030e2	dm crypt: move convert_context inside dm_crypt_io Move convert_context inside dm_crypt_io. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:38 +00:00
Alasdair G Kergon	009cd09042	dm mpath: add missing static A static declaration missing. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:35 +00:00
Alasdair G Kergon	0149e57fed	dm: targets no longer experimental Drop the EXPERIMENTAL tag from well-established device-mapper targets, so the newer ones stand out better. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:32 +00:00
Milan Broz	46125c1c90	dm: refactor dm_suspend completion wait Move completion wait to separate function Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:30 +00:00
Milan Broz	94d6351e14	dm: split dm_suspend io_lock hold into two Change io_locking to allow processing flush in separate thread. Because we have DMF_BLOCK_IO already set, any possible new ios are queued in dm_requests now. In the case of interrupting previous wait there can be more ios queued (we unlocked io_lock for a while) but this is safe. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:27 +00:00
Milan Broz	73d410c013	dm: tidy dm_suspend Tidy dm_suspend function - change return value logic in dm_suspend - use atomic_read only once. - move DMF_BLOCK_IO clearing into one place Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:25 +00:00
Milan Broz	6d6f10df89	dm: refactor deferred bio_list processing Refactor deferred bio_list processing. - use separate _merge_pushback_list function - move deferred bio list pick up to flush function - use bio_list_pop instead of bio_list_get - simplify noflush flag use No real functional change in this patch. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:22 +00:00
Milan Broz	6ed7ade896	dm: tidy alloc_dev labels Tidy labels in alloc_dev to make later patches more clear. No functional change in this patch. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:19 +00:00
Andrew Morton	a26ffd4aa9	dm ioctl: use uninitialized_var drivers/md/dm-ioctl.c:1405: warning: 'param' may be used uninitialized in this function Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:16 +00:00
Andrew Morton	69a2ce72a4	dm: table use uninitialized_var drivers/md/dm-table.c: In function 'dm_get_device': drivers/md/dm-table.c:478: warning: 'dev' may be used uninitialized in this function Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:14 +00:00
Andrew Morton	e48b9db251	dm snapshot: use uninitialized_var drivers/md/dm-exception-store.c: In function 'persistent_read_metadata': drivers/md/dm-exception-store.c:452: warning: 'new_snapshot' may be used uninitialized in this function Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:11 +00:00
Daniel Walker	e61290a4a2	dm: convert suspend_lock semaphore to mutex Replace semaphore with mutex. Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:08 +00:00
Robert P. J. Day	8defd83084	dm snapshot: use rounddown_pow_of_two Since the source file already includes the log2.h header file, it seems pointless to re-invent the necessary routine. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:06 +00:00
Jun'ichi Nomura	82d601dc07	dm: table remove unused total "total = 0" does nothing. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:10:04 +00:00
Paul Jimenez	afb24528f9	dm: table use list_for_each This patch is some minor janitorish cleanup, using some macros from linux/list.h (already #included via dm.h) to improve readability. Signed-off-by: Paul Jimenez <pj@place.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:09:59 +00:00
Milan Broz	76c072b48e	dm ioctl: move compat code Move compat_ioctl handling into dm-ioctl.c. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:09:56 +00:00
Alasdair G Kergon	27238b2bea	dm ioctl: remove lock_kernel Remove lock_kernel() from the device-mapper ioctls - there should be sufficient internal locking already where required. Also remove some superfluous casts. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:09:53 +00:00
Alasdair G Kergon	b9249e5568	dm: mark function lists static Add a couple of statics. Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:09:51 +00:00
Milan Broz	7e5c1e830b	dm: add missing memory barrier to dm_suspend Add memory barrier to fix atomic_read of pending value. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-02-08 02:09:49 +00:00
NeilBrown	6ed3003c19	md: fix an occasional deadlock in raid5 raid5's 'make_request' function calls generic_make_request on underlying devices and if we run out of stripe heads, it could end up waiting for one of those requests to complete. This is bad as recursive calls to generic_make_request go on a queue and are not even attempted until make_request completes. So: don't make any generic_make_request calls in raid5 make_request until all waiting has been done. We do this by simply setting STRIPE_HANDLE instead of calling handle_stripe(). If we need more stripe_heads, raid5d will get called to process the pending stripe_heads which will call generic_make_request from a This change by itself causes a performance hit. So add a change so that raid5_activate_delayed is only called at unplug time, never in raid5. This seems to bring back the performance numbers. Calling it in raid5d was sometimes too soon... Neil said: How about we queue it for 2.6.25-rc1 and then about when -rc2 comes out, we queue it for 2.6.24.y? Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Tested-by: dean gaudet <dean@arctic.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	73c34431c7	md: change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING. Finish ITERATE_ to for_each conversion. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	d089c6af10	md: change ITERATE_RDEV to rdev_for_each As this is more in line with common practice in the kernel. Also swap the args around to be more like list_for_each. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	29ac4aa3fc	md: change INTERATE_MDDEV to for_each_mddev As this is more consistent with kernel style. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	20a49ff679	md: change a few 'int' to 'size_t' in md As suggested by Andrew Morton. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	177a99b23e	md: fix use-after-free bug when dropping an rdev from an md array Due to possible deadlock issues we need to use a schedule work to kobject_del an 'rdev' object from a different thread. A recent change means that kobject_add no longer gets a reference, and kobject_del doesn't put a reference. Consequently, we need to explicitly hold a reference to ensure that the last reference isn't dropped before the scheduled work gets a chance to call kobject_del. Also, rename delayed_delete to md_delayed_delete so that it is more obvious in a stack trace which code is to blame. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	a17184a911	md: allow an md array to appear with 0 drives if it has external metadata Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:19 -08:00
NeilBrown	ca38805945	md: lock address when changing attributes of component devices Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
NeilBrown	c5d79adba7	md: allow devices to be shared between md arrays Currently, a given device is "claimed" by a particular array so that it cannot be used by other arrays. This is not ideal for DDF and other metadata schemes which have their own partitioning concept. So for externally managed metadata, just claim the device for md in general, require that "offset" and "size" are set properly for each device, and make sure that if a device is included in different arrays then the active sections do not overlap. This involves adding another flag to the rdev which makes it awkward to set "->flags = 0" to clear certain flags. So now clear flags explicitly by name when we want to clear things. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
NeilBrown	1ec4a9398d	md: set and test the ->persistent flag for md devices more consistently If you try to start an array for which the number of raid disks is listed as zero, md will currently try to read metadata off any devices that have been given. This was done because the value of raid_disks is used to signal whether array details have been provided by userspace (raid_disks > 0) or must be read from the devices (raid_disks == 0). However for an array without persistent metadata (or with externally managed metadata) this is the wrong thing to do. So we add a test in do_md_run to give an error if raid_disks is zero for non-persistent arrays. This requires that mddev->persistent is set correctly at this point, which it currently isn't for in-kernel autodetected arrays. So set ->persistent for autodetect arrays, and remove the setting in super_*_validate which is now redundant. Also clear ->persistent when stopping an array so it is consistently zero when starting an array. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
NeilBrown	c620727779	md: allow a maximum extent to be set for resyncing This allows userspace to control resync/reshape progress and synchronise it with other activities, such as shared access in a SAN, or backing up critical sections during a tricky reshape. Writing a number of sectors (which must be a multiple of the chunk size if such is meaningful) causes a resync to pause when it gets to that point. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
NeilBrown	c303da6d71	md: give userspace control over removing failed devices when external metadata in use When a device fails, we must not allow any further writes to the array until the device failure has been recorded in array metadata. When metadata is managed externally, this requires some synchronisation... Allow/require userspace to explicitly remove failed devices from active service in the array by writing 'none' to the 'slot' attribute. If this reduces the number of failed devices to 0, the write block will automatically be lowered. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
NeilBrown	e691063a61	md: support 'external' metadata for md arrays - Add a state flag 'external' to indicate that the metadata is managed externally (by user-space) so important changes need to be left for user-space to handle. Alternates are non-persistent ('none') where there is no stable metadata - after the array is stopped there is no record of its status - and internal which can be version 0.90 or version 1.x These are selected by writing to the 'metadata' attribute. - move the updating of superblocks (sync_sbs) to after we have checked if there are any superblocks or not. - New array state 'write_pending'. This means that the metadata records the array as 'clean', but a write has been requested, so the metadata has to be updated to record a 'dirty' array before the write can continue. This change is reported to md by writing 'active' to the array_state attribute. - tidy up marking of sb_dirty: - don't set sb_dirty when resync finishes as md_check_recovery calls md_update_sb when the sync thread finishes anyway. - Don't set sb_dirty in multipath_run as the array might not be dirty. - don't mark superblock dirty when switching to 'clean' if there is no internal superblock (if external, userspace can choose to update the superblock whenever it chooses to). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
NeilBrown	b47490c9bc	md: Update md bitmap during resync. Currently an md array with a write-intent bitmap does not update that bitmap to reflect successful partial resync. Rather the entire bitmap is updated when the resync completes. This is because there is no guarantee that resync requests will complete in order, and tracking each request individually is unnecessarily burdensome. However there is value in regularly updating the bitmap, so add code to periodically pause while all pending sync requests complete, then update the bitmap. Doing this only every few seconds (the same as the bitmap update time) does not noticeably affect resync performance. [snitzer@gmail.com: export bitmap_cond_end_sync] Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Mike Snitzer" <snitzer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
H. Peter Anvin	66c811e993	md: raid6: clean up the style of raid6test/test.c Clean up the coding style in raid6test/test.c. Break it apart into subfunctions to make the code more readable. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
H. Peter Anvin	98ec302be5	md: raid6: Fix mktable.c Make both mktables.c and its output CodingStyle compliant. Update the copyright notice. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
Oliver Pinter	54212cf405	coding style cleanups for drivers/md/mktables.c Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-06 10:41:18 -08:00
Greg Kroah-Hartman	c10997f657	Kobject: convert drivers/* from kobject_unregister() to kobject_put() There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman	f9cb074bff	Kobject: rename kobject_init_ng() to kobject_init() Now that the old kobject_init() function is gone, rename kobject_init_ng() to kobject_init() to clean up the namespace. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:38 -08:00
Greg Kroah-Hartman	b2d6db5878	Kobject: rename kobject_add_ng() to kobject_add() Now that the old kobject_add() function is gone, rename kobject_add_ng() to kobject_add() to clean up the namespace. Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:38 -08:00
Greg Kroah-Hartman	649316b25b	Kobject: convert drivers/md/md.c to use kobject_init/add_ng() This converts the code to use the new kobject functions, cleaning up the logic in doing so. Cc: Neil Brown <neilb@suse.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:37 -08:00
Kay Sievers	edfaa7c365	Driver core: convert block from raw kobjects to core devices This moves the block devices to /sys/class/block. It will create a flat list of all block devices, with the disks and partitions in one directory. For compatibility /sys/block is created and contains symlinks to the disks. /sys/class/block |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1 |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10 |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5 |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6 |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7 |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8 |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9 `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 /sys/block/ |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:36 -08:00
Greg Kroah-Hartman	3830c62fef	Kobject: change drivers/md/md.c to use kobject_init_and_add Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Neil Brown <neilb@suse.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-01-24 20:40:29 -08:00
Dan Williams	0f94e87cde	md: fix data corruption when a degraded raid5 array is reshaped We currently do not wait for the block from the missing device to be computed from parity before copying data to the new stripe layout. The change in the raid6 code is not technically needed as we don't delay data block recovery in the same way for raid6 yet. But making the change now is safer long-term. This bug exists in 2.6.23 and 2.6.24-rc Cc: <stable@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-01-08 16:10:35 -08:00
Milan Broz	91e1062592	dm crypt: use bio_add_page Fix possible max_phys_segments violation in cloned dm-crypt bio. In write operation dm-crypt needs to allocate new bio request and run crypto operation on this clone. Cloned request has always the same size, but number of physical segments can be increased and violate max_phys_segments restriction. This can lead to data corruption and serious hardware malfunction. This was observed when using XFS over dm-crypt and at least two HBA controller drivers (arcmsr, cciss) recently. Fix it by using bio_add_page() call (which tests for other restrictions too) instead of constructing own biovec. All versions of dm-crypt are affected by this bug. Cc: stable@kernel.org Cc: dm-crypt@saout.de Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-12-20 17:32:13 +00:00
Neil Brown	91212507f9	dm: merge max_hw_sector Make sure dm honours max_hw_sectors of underlying devices We still have no firm testing evidence in support of this patch but believe it may help to resolve some bug reports. - agk Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-12-20 17:32:12 +00:00
Alasdair G Kergon	69267a30be	dm: trigger change uevent on rename Insert a missing KOBJ_CHANGE notification when a device is renamed. Cc: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-12-20 17:32:11 +00:00
Milan Broz	adfe47702c	dm crypt: fix write endio Fix BIO_UPTODATE test for write io. Cc: stable@kernel.org Cc: dm-crypt@saout.de Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-12-20 17:32:10 +00:00
Paul Mundt	d1622e8909	dm mpath: hp requires scsi With CONFIG_SCSI=n __scsi_print_sense() is never linked in. drivers/built-in.o: In function `hp_sw_end_io': dm-mpath-hp-sw.c:(.text+0x914f8): undefined reference to `__scsi_print_sense' Caught with a randconfig on current git. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-12-20 17:32:09 +00:00
Jun'ichi Nomura	512875bd96	dm: table detect io beyond device This patch fixes a panic on shrinking a DM device if there is outstanding I/O to the part of the device that is being removed. (Normally this doesn't happen - a filesystem would be resized first, for example.) The bug is that __clone_and_map() assumes dm_table_find_target() always returns a valid pointer. It may fail if a bio arrives from the block layer but its target sector is no longer included in the DM btree. This patch appends an empty entry to table->targets[] which will be returned by a lookup beyond the end of the device. After calling dm_table_find_target(), __clone_and_map() and target_message() check for this condition using dm_target_is_valid(). Sample test script to trigger oops:	2007-12-20 17:32:08 +00:00
Dan Williams	6c55be8b96	raid5: fix unending write sequence <debug output from Joel's system> handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 check 5: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800ffcffcc0 written 0000000000000000 check 4: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800fdd4e360 written 0000000000000000 check 3: state 0x1 toread 0000000000000000 read 0000000000000000 write 0000000000000000 written 0000000000000000 check 2: state 0x1 toread 0000000000000000 read 0000000000000000 write 0000000000000000 written 0000000000000000 check 1: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800ff517e40 written 0000000000000000 check 0: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800fd4cae60 written 0000000000000000 locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0 for sector 7629696, rmw=0 rcw=0 </debug> These blocks were prepared to be written out, but were never handled in ops_run_biodrain(), so they remain locked forever. The operations flags are all clear which means handle_stripe() thinks nothing else needs to be done. This state suggests that the STRIPE_OP_PREXOR bit was sampled 'set' when it should not have been. This patch cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags. Report from Joel: Resync done. Patch fix this bug. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Joel Bertrand <joel.bertrand@systella.fr> Cc: <stable@kernel.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-14 18:45:39 -08:00
Alan D. Brunelle	2ad8b1ef11	Add UNPLUG traces to all appropriate places Added blk_unplug interface, allowing all invocations of unplugs to result in a generated blktrace UNPLUG. Signed-off-by: Alan D. Brunelle <Alan.Brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-09 13:41:32 +01:00
Neil Brown	def6ae26a9	md: fix misapplied patch in raid5.c commit `4ae3f847e4` ("md: raid5: fix clearing of biofill operations") did not get applied correctly, presumably due to substantial similarities between handle_stripe5 and handle_stripe6. This patch moves the chunk of new code from handle_stripe6 (where it isn't needed (yet)) to handle_stripe5. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Dan Williams" <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-11-05 15:12:32 -08:00
Vasily Averin	5ec140e600	dm: bounce_pfn limit added Device mapper uses its own bounce_pfn that may differ from one on underlying device. In that way dm can build incorrect requests that contain sg elements greater than underlying device is able to handle. This is the cause of slab corruption in i2o layer, occurred on i386 arch when very long direct IO requests are addressed to dm-over-i2o device. Signed-off-by: Vasily Averin <vvs@sw.ru> Cc: <stable@kernel.org> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:25 +01:00
Al Viro	ca5cd877ae	x86 merge fallout: uml Don't undef __i386__/__x86_64__ in uml anymore, make sure that (few) places that required adjusting the ifdefs got those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-29 07:41:32 -07:00
Herbert Xu	68e3f5dd4d	[CRYPTO] users: Fix up scatterlist conversion errors This patch fixes the errors made in the users of the crypto layer during the sg_init_table conversion. It also adds a few conversions that were missing altogether. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-27 00:52:07 -07:00
Jens Axboe	642f149031	SG: Change sg_set_page() to take length and offset argument Most drivers need to set length and offset as well, so may as well fold those three lines into one. Add sg_assign_page() for those two locations that only needed to set the page, where the offset/length is set outside of the function context. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-24 11:20:47 +02:00
Dan Williams	4ae3f847e4	md: raid5: fix clearing of biofill operations ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the 'pending' and 'ack' bits. Since the test_and_ack_op() macro only checks against 'complete' it can get an inconsistent snapshot of pending work. Move the clearing of these bits to handle_stripe5(), under the lock. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Joel Bertrand <joel.bertrand@systella.fr> Signed-off-by: Neil Brown <neilb@suse.de> Cc: Stable <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-23 08:32:06 -07:00
NeilBrown	85bfb4da8c	md: fix an unsigned compare to allow creation of bitmaps with v1.0 metadata As page->index is unsigned, this all becomes an unsigned comparison, which almost always returns an error. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Stable <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-23 08:32:06 -07:00
Jens Axboe	45711f1af6	[SG] Update drivers to use sg helpers Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-22 21:19:53 +02:00
Linus Torvalds	c00046c279	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits) fix do_sys_open() prototype sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake Documentation: Fix typo in SubmitChecklist. Typo: depricated -> deprecated Add missing profile=kvm option to Documentation/kernel-parameters.txt fix typo about TBI in e1000 comment proc.txt: Add /proc/stat field small documentation fixes Fix compiler warning in smount example program from sharedsubtree.txt docs/sysfs: add missing word to sysfs attribute explanation documentation/ext3: grammar fixes Documentation/java.txt: typo and grammar fixes Documentation/filesystems/vfs.txt: typo fix include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros trivial copy_data_pages() tidy up Fix typo in arch/x86/kernel/tsc_32.c file link fix for Pegasus USB net driver help remove unused return within void return function Typo fixes retrun -> return x86 hpet.h: remove broken links ...	2007-10-19 20:36:17 -07:00
Milan Broz	80fd662683	dm crypt: tidy pending Add crypt prefix to dec_pending to avoid confusing it in backtraces with the dm core function of the same name. No functional change here. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-10-20 02:01:28 +01:00

832 Commits