OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Goldwyn Rodrigues	965400eb61	Send RESYNCING while performing resync start/stop When a resync is initiated, RESYNCING message is sent to all active nodes with the range (lo,hi). When the resync is over, a RESYNCING message is sent with (0,0). A high sector value of zero indicates that the resync is over. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 09:59:06 -06:00
Goldwyn Rodrigues	1d7e3e9611	Reload superblock if METADATA_UPDATED is received Re-reads the devices by invalidating the cache. Since we don't write to faulty devices, this is detected using events recorded in the devices. If it is old as compared to the mddev mark it is faulty. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 09:59:06 -06:00
Goldwyn Rodrigues	293467aa1f	metadata_update sends message to other nodes - request to send a message - make changes to superblock - send messages telling everyone that the superblock has changed - other nodes all read the superblock - other nodes all ack the messages - updating node release the "I'm sending a message" resource. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 09:59:06 -06:00
Goldwyn Rodrigues	f9209a3235	bitmap_create returns bitmap pointer This is done to have multiple bitmaps open at the same time. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 09:57:57 -06:00
Goldwyn Rodrigues	96ae923ab6	Gather on-going resync information of other nodes When a node joins, it does not know of other nodes performing resync. So, each node keeps the resync information in it's LVB. When a new node joins, it reads the LVB of each "online" bitmap. [TODO] The new node attempts to get the PW lock on other bitmap, if it is successful, it reads the bitmap and performs the resync (if required) on it's behalf. If the node does not get the PW, it requests CR and reads the LVB for the resync information. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues	cf921cc19c	Add node recovery callbacks DLM offers callbacks when a node fails and the lock remastery is performed: 1. recover_prep: called when DLM discovers a node is down 2. recover_slot: called when DLM identifies the node and recovery can start 3. recover_done: called when all nodes have completed recover_slot recover_slot() and recover_done() are also called when the node joins initially in order to inform the node with its slot number. These slot numbers start from one, so we deduct one to make it start with zero which the cluster-md code uses. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues	ca8895d9bb	Return MD_SB_CLUSTERED if mddev is clustered Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:28:43 -06:00
Goldwyn Rodrigues	c4ce867fda	Introduce md_cluster_info md_cluster_info stores the cluster information in the MD device. The join() is called when mddev detects it is a clustered device. The main responsibilities are: 1. Setup a DLM lockspace 2. Setup all initial locks such as super block locks and bitmap lock (will come later) The leave() clears up the lockspace and all the locks held. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:28:42 -06:00
Goldwyn Rodrigues	edb39c9ded	Introduce md_cluster_operations to handle cluster functions This allows dynamic registering of cluster hooks. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:28:42 -06:00
NeilBrown	6791875e2e	md: make reconfig_mutex optional for writes to md sysfs files. Rather than using mddev_lock() to take the reconfig_mutex when writing to any md sysfs file, we only take mddev_lock() in the particular _store() functions that require it. Admittedly this is most, but it isn't all. This also allows us to remove special-case handling for new_dev_store (in md_attr_store). Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	5c47daf6e7	md: move mddev_lock and related to md.h The one which is not inline (mddev_unlock) gets EXPORTed. This makes the locking available to personality modules so that it doesn't have to be imposed upon them. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	23da422b19	md: use mddev->lock to protect updates to resync_{min,max}. There are interdependencies between these two sysfs attributes and whether a resync is currently running. Rather than depending on reconfig_mutex to ensure no races when testing these interdependencies are met, use the spinlock. This will allow the mutex to be remove from protecting this code in a subsequent patch. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	1b30e66f5a	md: minor cleanup in safe_delay_store. There isn't really much room for races with ->safemode_delay. But as I am trying to clean up any racy code and will soon be removing reconfig_mutex protection from most _store() functions: - only set mddev->safemode_delay once, to ensure no code can see an intermediate value - use safemode_timer to call md_safemode_timeout() rather than calling it directly, to ensure it never races with itself. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	4af1a04176	md: move GET_BITMAP_FILE ioctl out from mddev_lock. It makes more sense to report bitmap_info->file, rather than bitmap->file (the later is only available once the array is active). With that change, use mddev->lock to protect bitmap_info being set to NULL, and we can call get_bitmap_file() without taking the mutex. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	1e594bb24d	md: tidy up set_bitmap_file 1/ delay setting mddev->bitmap_info.file until 'f' looks usable, so we don't have to unset it. 2/ Don't allow bitmap file to be set if bitmap_info.file is already set. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	f4ad3d38d4	md: remove unnecessary 'buf' from get_bitmap_file. 'buf' is only used because d_path fills from the end of the buffer instead of from the start. We don't need a separate buf to handle that, we just need to use memmove() to move the string to the start. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	758bfc8abf	md: remove mddev_lock from rdev_attr_show() No rdev attributes need locking for 'show', though state_show() might benefit from ensuring it sees a consistent set of flags. None even use rdev->mddev, so testing for it isn't really needed and it certainly doesn't need to be held constant. So improve state_show() and remove the locking. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:56 +11:00
NeilBrown	b7b17c9b67	md: remove mddev_lock() from md_attr_show() Most attributes can be read safely without any locking. A race might lead to a slightly out-dated value, but nothing wrong. We already have locking in some places where needed. All that remains is can_clear_show(), behind_writes_used_show() and action_show() which are easily fixed. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:55 +11:00
NeilBrown	f97fcad38f	md: remove need for mddev_lock() in md_seq_show() The only access in md_seq_show that could suffer from races not protected by ->lock is walking the rdev list. This can receive sufficient protection from 'rcu'. So use rdev_for_each_rcu() and get rid of mddev_lock(). Now reading /proc/mdstat will never block in md_seq_show. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:55 +11:00
NeilBrown	36d091f475	md: protect ->pers changes with mddev->lock ->pers is already protected by ->reconfig_mutex, and cannot possibly change when there are threads running or outstanding IO. However there are some places where we access ->pers not in a thread or IO context, and where ->reconfig_mutex is unnecessarily heavy-weight: level_show and md_seq_show(). So protect all changes, and those accesses, with ->lock. This is a step toward taking those accesses out from under reconfig_mutex. [Fixed missing "mddev->pers" -> "pers" conversion, thanks to Dan Carpenter <dan.carpenter@oracle.com>] Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:53 +11:00
NeilBrown	db721d32b7	md: level_store: group all important changes into one place. Gather all the changes that can happen atomically and might be relevant to other code into one place. This will make it easier to refine the locking. Note that this puts quite a few things between mddev_detach() and ->free(). Enabling this was the point of some recent patches. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:53 +11:00
NeilBrown	afa0f557cb	md: rename ->stop to ->free Now that the ->stop function only frees the private data, rename is accordingly. Also pass in the private pointer as an arg rather than using mddev->private. This flexibility will be useful in level_store(). Finally, don't clear ->private. It doesn't make sense to clear it seeing that isn't what we free, and it is no longer necessary to clear ->private (it was some time ago before ->to_remove was introduced). Setting ->to_remove in ->free() is a bit of a wart, but not a big problem at the moment. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:52 +11:00
NeilBrown	5aa61f427e	md: split detach operation out from ->stop. Each md personality has a 'stop' operation which does two things: 1/ it finalizes some aspects of the array to ensure nothing is accessing the ->private data 2/ it frees the ->private data. All the steps in '1' can apply to all arrays and so can be performed in common code. This is useful as in the case where we change the personality which manages an array (in level_store()), it would be helpful to do step 1 early, and step 2 later. So split the 'step 1' functionality out into a new mddev_detach(). Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:52 +11:00
NeilBrown	64590f45dd	md: make merge_bvec_fn more robust in face of personality changes. There is no locking around calls to merge_bvec_fn(), so it is possible that calls which coincide with a level (or personality) change could go wrong. So create a central dispatch point for these functions and use rcu_read_lock(). If the array is suspended, reject any merge that can be rejected. If not, we know it is safe to call the function. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:52 +11:00
NeilBrown	5c675f83c6	md: make ->congested robust against personality changes. There is currently no locking around calls to the 'congested' bdi function. If called at an awkward time while an array is being converted from one level (or personality) to another, there is a tiny chance of running code in an unreferenced module etc. So add a 'congested' function to the md_personality operations structure, and call it with appropriate locking from a central 'mddev_congested'. When the array personality is changing the array will be 'suspended' so no IO is processed. If mddev_congested detects this, it simply reports that the array is congested, which is a safe guess. As mddev_suspend calls synchronize_rcu(), mddev_congested can avoid races by included the whole call inside an rcu_read_lock() region. This require that the congested functions for all subordinate devices can be run under rcu_lock. Fortunately this is the case. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:52 +11:00
NeilBrown	85572d7c75	md: rename mddev->write_lock to mddev->lock This lock is used for (slightly) more than helping with writing superblocks, and it will soon be extended further. So the name is inappropriate. Also, the _irq variant hasn't been needed since 2.6.37 as it is never taking from interrupt or bh context. So: -rename write_lock to lock -document what it protects -remove _irq ... except in md_flush_request() as there is no wait_event_lock() (with no _irq). This can be cleaned up after appropriate changes to wait.h. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-04 08:35:52 +11:00
Linus Torvalds	8fd9589ced	Three fixes for md -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIVAwUAVIzozznsnt1WYoG5AQKEOQ/9G31KYGhe3/Tn+Hhyy7o16+6fjOBSUN2z 6GCGtOuHhom0Q+y4cdeAohw+eW9bJPqc9tRyRrLnVCuAkeymPP1t7fIRLGAuOHq5 QjjX7ZwKK8urmPAFIxRFgJwrJctFEoaCQiZHk4Pcx9YcYm4paJP2liqqQ0khGPcl KGGjWgD3KMAG0moJzNmLMWBZevG3SLCTirPxOcDJuBkQ7KfP0e4UGVn/UYXdgF+f 73yForADVIE/Jq/CXnsOEZa/nPTM7DIsXZAQqqFCnkUiuCvwhgfkowxhwCExx2d6 w3+I69anD1WII1Z27bNDBgnJGR5xvdiAl6NTJ+GY1uMCo3lnV9OYpQrJxAEJP4dC nXILm6SVm7WLfLvVY1lJm3wZndIyWNmTcDxODGyMZhAY4RmjDHG+AZ9tcnY1pM+9 rt0jHbmaj/ZZ5jKbGxPKBlQ3/U7EP8d4oiruOaDWCGT4sN+e2dbZVLpaWoNaBFUx eyUft+DEvY82ryz2Ktn/Exsx1aWk2Cy3cMl0ELq2L/2WAcjhA6gsikNmKG7tfafn Bd6WIGzAygATaFs4hOtFANyXqWubWi8TgLUHXNYUkJ2JooaMuSIfWrpeMNJyHHU7 5bWTaSZqZ86Ygb6upvJqaA3olePMo1RPBsOev46TnuFT9vtfpzPRy6FlvbXfx8sK M5YCqNpTDTs= =Ar2y -----END PGP SIGNATURE----- Merge tag 'md/3.19' of git://neil.brown.name/md Pull md updates from Neil Brown: "Three fixes for md. I did have a largish set of locking changes queued, but late testing showed they weren't quite as stable as I thought and while I fixed what I found, I decided it safer to delay them a release ... particularly as I'll be AFK for a few weeks. So expect a larger batch next time :-)" * tag 'md/3.19' of git://neil.brown.name/md: md: Check MD_RECOVERY_RUNNING as well as ->sync_thread. md: fix semicolon.cocci warnings md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.	2014-12-14 12:13:05 -08:00
NeilBrown	f851b60db0	md: Check MD_RECOVERY_RUNNING as well as ->sync_thread. A recent change to md started the ->sync_thread from a asynchronously from a work_queue rather than synchronously. This means that there can be a small window between the time when MD_RECOVERY_RUNNING is set and when ->sync_thread is set. So code that checks ->sync_thread might now conclude that the thread has not been started and (because a lock is held) will not be started. That is no longer the case. Most of those places are best fixed by testing MD_RECOVERY_RUNNING as well. To make this completely reliable, we wake_up(&resync_wait) after clearing that flag as well as after clearing ->sync_thread. Other places are better served by flushing the relevant workqueue to ensure that that if the sync thread was starting, it has now started. This is particularly best if we are about to stop the sync thread. Fixes: `ac05f25669` Signed-off-by: NeilBrown <neilb@suse.de>	2014-12-11 10:02:10 +11:00
kbuild test robot	7d7e64f2ec	md: fix semicolon.cocci warnings drivers/md/md.c:7175:43-44: Unneeded semicolon Removes unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-12-03 16:07:59 +11:00
Gu Zheng	18c0b223cf	md: use generic io stats accounting functions to simplify io stat accounting Use generic io stats accounting help functions (generic_{start,end}_io_acct) to simplify io stat accounting. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-24 08:05:16 -07:00
NeilBrown	45eaf45dfa	md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN md_check_recovery will skip any recovery and also clear MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set. So when we clear _FROZEN, we must set _NEEDED and ensure that md_check_recovery gets run. Otherwise we could miss out on something that is needed. In particular, this can make it impossible to remove a failed device from an array is the 'recovery-needed' processing didn't happen. Suitable for stable kernels since 3.13. Cc: stable@vger.kernel.org (3.13+) Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com> Fixes: `30b8feb730` Signed-off-by: NeilBrown <neilb@suse.de>	2014-11-17 09:17:46 +11:00
NeilBrown	6c144d3164	md: move EXPORT_SYMBOL to after function in md.c Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	2cbbca5e7c	md: discard PRINT_RAID_DEBUG ioctl All the interesting information printed by this ioctl is provided in /proc/mdstat and/or sysfs. So it isn't needed and isn't used and would be best if it didn't exist. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	403df47888	md: remove MD_BUG() Most of the places that call this are doing so pointlessly. A couple of the others a best replaced with WARN_ON(). Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	3adc28d85f	md: clean up 'exit' labels in md_ioctl(). There are 4 labels and we only really need two. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	326eb17d73	md: remove unnecessary test for MD_MAJOR in md_ioctl() unknown ioctls no longer get this deep into md_ioctl since md_ioctl_valid() was introduced in 3.14. So remove the test and the misleading comment. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	e1960f8c5c	md: don't allow "-sync" to be set for device in an active array. If an array is active, devices can be marked 'faulty', but simply removing the 'sync' flag is wrong. That only makes sense for an array which is not active (and is probably only useful for testing anyway). Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	f72ffdd686	md: remove unwanted white space from md.c My editor shows much of this is RED. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:29 +11:00
NeilBrown	ac05f25669	md: don't start resync thread directly from md thread. The main 'md' thread is needed for processing writes, so if it blocks write requests could be delayed. Starting a new thread requires some GFP_KERNEL allocations and so can wait for writes to complete. This can deadlock. So instead, ask a workqueue to start the sync thread. There is no particular rush for this to happen, so any work queue will do. MD_RECOVERY_RUNNING is used to ensure only one thread is started. Reported-by: BillStuff <billstuff2001@sbcglobal.net> Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
NeilBrown	8b1afc3d67	md: Just use RCU when checking for overlap between arrays. We don't really need the full mddev_lock here, and having to drop it is messy. RCU is enough to protect these lists. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
Chao Yu	50bd377405	md: avoid potential long delay under pers_lock printk may cause long time lapse if value of printk_delay in sysctl is configured large by user. If register_md_personality takes long time to print in spinlock pers_lock, we may encounter high CPU usage rate when there are other pers_lock competitors who may be blocked to spin. We can avoid this condition by moving printk out of coverage of pers_lock spinlock. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
NeilBrown	0638bb0e73	md: simplify export_array() We don't really need that for_each loop, or those MD_BUGs. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
NeilBrown	4878e9eb88	md: discard find_rdev_nr in favour of find_rdev_nr_rcu Having both is a waste - just use the one. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
NeilBrown	1967cd5616	md: use wait_event() to simplify md_super_wait() md_super_wait is really just wait_event() open-coded. So use the macro instead. Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
NeilBrown	9ba3b7f5d0	md: be more relaxed about stopping an array which isn't started. In general we don't allow an array to be stopped if it is in use. However if the array hasn't really been started yet, then any apparent use is an anomily, probably due to 'udev' or similar having a look to see what is there. This means that if something goes wrong while assembling an array it cannot reliably be un-assembled - STOP_ARRAY could fail. There is no value here, so change do_md_stop() to succeed despite concurrent opens if the array has not yet been activated. i.e. if ->pers is NULL. Reported-by: "Baldysiak, Pawel" <pawel.baldysiak@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-10-14 13:08:28 +11:00
NeilBrown	d66b1b395a	md: don't allow bitmap file to be added to raid0/linear. An array can only accept a bitmap if it will call bitmap_daemon_work periodically, which means it needs a thread running. If there is no thread, don't allow a bitmap to be added. Signed-off-by: NeilBrown <neilb@suse.de>	2014-08-08 15:43:20 +10:00
Xiao Ni	ac7e50a383	md: Recovery speed is wrong When we calculate the speed of recovery, the numerator that contains the recovery done sectors. It's need to subtract the sectors which don't finish recovery. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-08-08 12:11:25 +10:00
NeilBrown	af5628f05d	md: disable probing for md devices 512 and over. The way md devices are traditionally created in the kernel is simply to open the device with the desired major/minor number. This can be problematic as some support tools, notably udev and programs run by udev, can open a device just to see what is there, and find that it has created something. It is easy for a race to cause udev to open an md device just after it was destroy, causing it to suddenly re-appear. For some time we have had an alternate way to create md devices echo md_somename > /sys/modules/md_mod/paramaters/new_array This will always use a minor number of 512 or higher, which mdadm normally avoids. Using this makes the creation-by-opening unnecessary, but does not disable it, so it is still there to cause problems. This patch disable probing for devices with a major of 9 (MD_MAJOR) and a minor of 512 and up. This devices created by writing to new_array cannot be re-created by opening the node in /dev. Signed-off-by: NeilBrown <neilb@suse.de>	2014-07-31 13:54:54 +10:00
NeilBrown	133d4527ea	md: flush writes before starting a recovery. When we write to a degraded array which has a bitmap, we make sure the relevant bit in the bitmap remains set when the write completes (so a 're-add' can quickly rebuilt a temporarily-missing device). If, immediately after such a write starts, we incorporate a spare, commence recovery, and skip over the region where the write is happening (because the 'needs recovery' flag isn't set yet), then that write will not get to the new device. Once the recovery finishes the new device will be trusted, but will have incorrect data, leading to possible corruption. We cannot set the 'needs recovery' flag when we start the write as we do not know easily if the write will be "degraded" or not. That depends on details of the particular raid level and particular write request. This patch fixes a corruption issue of long standing and so it suitable for any -stable kernel. It applied correctly to 3.0 at least and will minor editing to earlier kernels. Reported-by: Bill <billstuff2001@sbcglobal.net> Tested-by: Bill <billstuff2001@sbcglobal.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net Signed-off-by: NeilBrown <neilb@suse.de>	2014-07-03 10:44:45 +10:00
NeilBrown	9bd3592032	md: make sure GET_ARRAY_INFO ioctl reports correct "clean" status If an array has a bitmap, the when we set the "has bitmap" flag we incorrectly clear the "is clean" flag. "is clean" isn't really important when a bitmap is present, but it is best to get it right anyway. Reported-by: George Duffield <forumscollective@gmail.com> Link: http://lkml.kernel.org/CAG__1a4MRV6gJL38XLAurtoSiD3rLBTmWpcS5HYvPpSfPR88UQ@mail.gmail.com Fixes: `36fa30636f` (v2.6.14) Signed-off-by: NeilBrown <neilb@suse.de>	2014-07-03 10:44:31 +10:00

1 2 3 4 5 ...

782 Commits