OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Shaohua Li	8031c3ddc7	md/bitmap: copy correct data for bitmap super raid5 cache could write bitmap superblock before bitmap superblock is initialized. The bitmap superblock is less than 512B. The current code will only copy the superblock to a new page and write the whole 512B, which will zero the the data after the superblock. Unfortunately the data could include bitmap, which we should preserve. The patch will make superblock read do 4k chunk and we always copy the 4k data to new page, so the superblock write will old data to disk and we don't change the bitmap. Reported-by: Song Liu <songliubraving@fb.com> Reviewed-by: Song Liu <songliubraving@fb.com> Cc: stable@vger.kernel.org (4.10+) Signed-off-by: Shaohua Li <shli@fb.com>	2017-08-24 10:04:54 -07:00
Guoqing Jiang	4aaf7694f8	md/bitmap: don't read page from device with Bitmap_sync The device owns Bitmap_sync flag needs recovery to become in sync, and read page from this type device could get stale status. Also add comments for Bitmap_sync bit per the suggestion from Shaohua and Neil. Previous disscussion can be found here: https://marc.info/?t=149760428900004&r=1&w=2 Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-07-10 10:30:41 -07:00
Kyungchan Koh	4179bc30b2	md: uuid debug statement now in processor byte order. Previously, the uuid debug statements were printed in little-endian format, which wasn't consistent in machines that might not be in little-endian byte order. With this change, the output will be consistent for all machines with different byte-ordering. Signed-off-by: Kyungchan Koh <kkc6196@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-24 15:58:43 -07:00
Zhilong Liu	3560741e31	md: fix several trivial typos in comments Signed-off-by: Zhilong Liu <zlliu@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-03-23 22:54:57 -07:00
Guoqing Jiang	48df498daf	md: move bitmap_destroy to the beginning of __md_stop Since we have switched to sync way to handle METADATA_UPDATED msg for md-cluster, then process_metadata_update is depended on mddev->thread->wqueue. With the new change, clustered raid could possible hang if array received a METADATA_UPDATED msg after array unregistered mddev->thread, so we need to stop clustered raid (bitmap_destroy -> bitmap_free -> md_cluster_stop) earlier than unregister thread (mddev_detach -> md_unregister_thread). And this change should be safe for non-clustered raid since all writes are stopped before the destroy. Also in md_run, we activate the personality (pers->run()) before activating the bitmap (bitmap_create()). So it is pleasingly symmetric to stop the bitmap (bitmap_destroy()) before stopping the personality (__md_stop() calls pers->free()), we achieve this by move bitmap_destroy to the beginning of __md_stop. But we don't want to break the codes for waiting behind IO as Shaohua mentioned, so introduce bitmap_wait_behind_writes to call the codes, and call the new fun in both mddev_detach and bitmap_destroy, then we will not break original behind IO code and also fit the new condition well. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-03-16 16:55:58 -07:00
Guoqing Jiang	b98938d16a	md-cluster: introduce cluster_check_sync_size Support resize is a little complex for clustered raid, since we need to ensure all the nodes share the same knowledge about the size of raid. We achieve the goal by check the sync_size which is in each node's bitmap, we can only change the capacity after cluster_check_sync_size returns 0. Also, get_bitmap_from_slot is added to get a slot's bitmap. And we exported some funcs since they are used in cluster_check_sync_size(). We can also reuse get_bitmap_from_slot to remove redundant code existed in bitmap_copy_from_slot. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-03-16 16:55:50 -07:00
Shaohua Li	2953079c69	md: separate flags for superblock changes The mddev->flags are used for different purposes. There are a lot of places we check/change the flags without masking unrelated flags, we could check/change unrelated flags. These usage are most for superblock write, so spearate superblock related flags. This should make the code clearer and also fix real bugs. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-12-08 22:01:47 -08:00
NeilBrown	46533ff7fe	md: Use REQ_FAILFAST_* on metadata writes where appropriate This can only be supported on personalities which ensure that md_error() never causes an array to enter the 'failed' state. i.e. if marking a device Faulty would cause some data to be inaccessible, the device is status is left as non-Faulty. This is true for RAID1 and RAID10. If we get a failure writing metadata but the device doesn't fail, it must be the last device so we re-write without FAILFAST to improve chance of success. We also flag the device as LastDev so that future metadata updates don't waste time on failfast writes. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-11-22 09:11:33 -08:00
NeilBrown	581dbd94da	md/bitmap: add blktrace event for writes to the bitmap We trace wheneven bitmap_unplug() finds that it needs to write to the bitmap, or when bitmap_daemon_work() find there is work to do. This makes it easier to correlate bitmap updates with data writes. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-11-18 09:34:45 -08:00
NeilBrown	85c9ccd4f0	md/bitmap: Don't write bitmap while earlier writes might be in-flight As we don't wait for writes to complete in bitmap_daemon_work, they could still be in-flight when bitmap_unplug writes again. Or when bitmap_daemon_work tries to write again. This can be confusing and could risk the wrong data being written last. So make sure we wait for old writes to complete before new writes start. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-11-07 15:08:23 -08:00
NeilBrown	ec0cc22685	md/bitmap: change all printk() to pr_*() Follow err/warn distinction introduced in md.c Join multi-part strings into single string. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-11-07 15:08:21 -08:00
Guoqing Jiang	cbb3873236	md/bitmap: call bitmap_file_unmap once bitmap_storage_alloc returns -ENOMEM It is possible that bitmap_storage_alloc could return -ENOMEM, and some member inside store could be allocated such as filemap. To avoid memory leak, we need to call bitmap_file_unmap to free those members in the bitmap_resize. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-11-07 15:08:21 -08:00
Shaohua Li	f71f1cf97c	md/bitmap: fix wrong cleanup if bitmap_create fails, the bitmap is already cleaned up and the returned value is an error number. We can't do the cleanup again. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Shaohua Li <shli@fb.com>	2016-09-21 09:09:44 -07:00
Shaohua Li	d9dd26b20c	MD: hold mddev lock to change bitmap location Changing the location changes a lot of things. Holding the lock to avoid race. This makes the .quiesce called with mddev lock hold too. Acked-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-08-05 22:02:40 -07:00
Mike Christie	796a5cf083	md: use bio op accessors Separate the op from the rq_flag_bits and have md set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	2a222ca992	fs: have submit_bh users pass in op and flags separately This has submit_bh users pass in the operation and flags separately, so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Guoqing Jiang	51e453aecb	md-cluster: gather resync infos and enable recv_thread after bitmap is ready The in-memory bitmap is not ready when node joins cluster, so it doesn't make sense to make gather_all_resync_info() called so earlier, we need to call it after the node's bitmap is setup. Also, recv_thread could be wake up after node joins cluster, but it could cause problem if node receives RESYNCING message without persionality since mddev->pers->quiesce is called in process_suspend_info. This commit introduces a new cluster interface load_bitmaps to fix above problems, load_bitmaps is called in bitmap_load where bitmap and persionality are ready, and load_bitmaps does the following tasks: 1. call gather_all_resync_info to load all the node's bitmap info. 2. set MD_CLUSTER_ALREADY_IN_CLUSTER bit to recv_thread could be wake up, and wake up recv_thread if there is pending recv event. Then ack_bast only wakes up recv_thread after IN_CLUSTER bit is ready otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is set. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-09 09:24:03 -07:00
kbuild test robot	bc47e84258	md-cluster: fix ifnullfree.cocci warnings drivers/md/bitmap.c:2049:6-11: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	c84400c89f	md-cluster/bitmap: unplug bitmap to sync dirty pages to disk This patch is doing two distinct but related things. 1. It adds bitmap_unplug() for the main bitmap (mddev->bitmap). As bit have been set, BITMAP_PAGE_DIRTY is set so bitmap_deamon_work() will not write those pages out in its regular scans, only bitmap_unplug() will. If there are no writes to the array, bitmap_unplug() won't be called, so we need to call it explicitly here. 2. bitmap_write_all() is a bit of a confusing interface as it doesn't actually write anything. The current code for writing "bitmap" works but this change makes it a bit clearer. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	23cea66a37	md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit The pnum passed to set_page_attr and test_page_attr should from 0 to storage.file_pages - 1, but bitmap_file_set_bit and bitmap_file_clear_bit call set_page_attr and test_page_attr with page->index parameter while page->index has already added node_offset before. So we need to minus node_offset in both bitmap_file_clear_bit and bitmap_file_set_bit. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	7f86ffed9b	md-cluster/bitmap: fix wrong calcuation of offset The offset is wrong in bitmap_storage_alloc, we should set it like below in bitmap_init_from_disk(). node_offset = bitmap->cluster_slot * (DIV_ROUND_UP(store->bytes, PAGE_SIZE)); Because 'offset' is only assigned to 'page->index' and that is usually over-written by read_sb_page. So it does not cause problem in general, but it still need to be fixed. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	18c9ff7f48	md-cluster: sync bitmap when node received RESYNCING msg If the node received RESYNCING message which means another node will perform resync with the area, then we don't want to do it again in another node. Let's set RESYNC_MASK and clear NEEDED_MASK for the region from old-low to new-low which has finished syncing, and the region from old-hi to new-hi is about to syncing, bitmap_sync_with_cluste is introduced for the purpose. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	c9d6503228	md-cluster: always setup in-memory bitmap The in-memory bitmap for raid is allocated on demand, then for cluster scenario, it is possible that slave node which received RESYNCING message doesn't have the in-memory bitmap when master node is perform resyncing, so we can't make bitmap is match up well among each nodes. So for cluster scenario, we need always preserve the bitmap, and ensure the page will not be freed. And a no_hijack flag is introduced to both bitmap_checkpage and bitmap_get_counter, which makes cluster raid returns fail once allocate failed. And the next patch is relied on this change since it keeps sync bitmap among each nodes during resyncing stage. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Linus Torvalds	63b106a87d	Merge tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: "This update mainly fixes bugs: - fix error handling (Guoqing) - fix a crash when a disk is hotremoved (me) - fix a dead loop (Wei Fang)" * tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/bitmap: clear bitmap if bitmap_create failed MD: add rdev reference for super write md: fix a trivial typo in comments md:raid1: fix a dead loop when read from a WriteMostly disk	2016-04-09 11:23:27 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Guoqing Jiang	f9a67b1182	md/bitmap: clear bitmap if bitmap_create failed If bitmap_create returns an error, we need to call either bitmap_destroy or bitmap_free to do clean up, and the selection is based on mddev->bitmap is set or not. And the sysfs_put(bitmap->sysfs_can_clear) is moved from bitmap_destroy to bitmap_free, and the comment of bitmap_create is changed as well. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-04-01 13:05:50 -07:00
Guoqing Jiang	c6f0b9f195	md/bitmap: remove redundant return in bitmap_checkpage The "return 0" is not needed since bitmap_checkpage will finally return 0 for the case. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-14 11:36:07 -07:00
Eric Engestrom	c97e0602bc	md/bitmap: remove redundant check daemon_sleep is an unsigned, so testing if it's 0 or less than 1 does the same thing. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-07 09:30:16 -08:00
Shaohua Li	fc2561ec0a	md-cluster: delete useless code page->index already considers node offset. The node_offset calculation in write_sb_page is useless and confusion. Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: NeilBrown <neilb@suse.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-01-24 18:13:37 -08:00
Goldwyn Rodrigues	c40f341f1e	md-cluster: Use a small window for resync Suspending the entire device for resync could take too long. Resync in small chunks. cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window: 1. Set the cluster_sync_low to curr_resync_completed. 2. Check if the sync will fit in the new window, if not issue a wait_barrier() and set cluster_sync_low to sector_nr. 3. Set cluster_sync_high to cluster_sync_low + resync_window. 4. Send a message to all nodes so they may add it in their suspension list. bitmap_cond_end_sync is modified to allow to force a sync inorder to get the curr_resync_completed uptodate with the sector passed. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-10-12 01:32:05 -05:00
Goldwyn Rodrigues	3c462c880b	md: Increment version for clustered bitmaps Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels to assemble a clustered device. In order to maximize compatibility, the major version is set to BITMAP_MAJOR_CLUSTERED only if the bitmap is clustered. Added MD_FEATURE_CLUSTERED in order to return error for older kernels which would assemble MD even if the bitmap is corrupted. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-12 01:31:33 -05:00
NeilBrown	da6fb7a9e5	md/bitmap: don't pass -1 to bitmap_storage_alloc. Passing -1 to bitmap_storage_alloc() causes page->index to be set to -1, which is quite problematic. So only pass ->cluster_slot if mddev_is_clustered(). Fixes: `b97e92574c` ("Use separate bitmaps for each nodes in the cluster") Cc: stable@vger.kernel.org (v4.1+) Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-02 17:24:13 +10:00
Linus Torvalds	aca105a697	Some md fixes for 4.2 Several are tagged for -stable. A few aren't because they are not very, serious or because they are in the 'experimental' cluster code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVsbRgAAoJEDnsnt1WYoG54BcQAJlVxcGdMXNAbFAfhToH+cwY DcCKmnXiu3TCcK/gtr0SlUKIhv7kPE2HaGbTko3g5/uTk7SCuMSWRbjpQJr1u98U 9VUPZV0RGLEcQXgsjG3sobEtdSYMSn1/BpdJpmsn/Q6cyTheiEJA9fEghn9F0Iw9 3Ctc56aL0nKsnpeRTvWmKR3F995L2Ene+pIHPbSqTlQcGU+DdxQL/iY9YspMzeih 6SWQICo59+pGSRQdKaU968nZ1a8hJCvhV2NbW0JsJMo9iGo5htCJLdxZatscZV6E xAppCRymW/Nl0n59wCOBhUp+0skVhtYZ1UmNdx/vUA7vcZJX85vIG7WGaaSQAmlF Y4nNMSaX6SWbt/oVgq+JiD39T1oFZN5N5Fh9juqcp3fshiWLO74cnTwea1LtkQZX LfW2HhajDUgoJqoXL6ppKDK8l7alQdcaoYn0C6SfqRZ+PLB5ERrSBHNZpKDbI9aw CFW93lHTtlwtwZn0S7k06mCG8ilO4iPthUVaJEeI1kUWVKY0Ju4xnVTt/7jCroUa Km14KdWeZsS8j9/xr6FYBjarjuh7M9SoflgUK84txE+PIZsSczyrErpBkMf2YBWQ 0KduqQabXa1XYWGtZ7Uhj6iOOGNBcTcZsww8cBEDOJEDyCtSs5w/iPmsFCEGUuo2 YTz6qKpYPgB1VzK2PoBG =6ZCK -----END PGP SIGNATURE----- Merge tag 'md/4.2-fixes' of git://neil.brown.name/md Pull md fixes from Neil Brown: "Some md fixes for 4.2 Several are tagged for -stable. A few aren't because they are not very, serious or because they are in the 'experimental' cluster code" * tag 'md/4.2-fixes' of git://neil.brown.name/md: md/raid5: clear R5_NeedReplace when no longer needed. Fix read-balancing during node failure md-cluster: fix bitmap sub-offset in bitmap_read_sb md: Return error if request_module fails and returns positive value md: Skip cluster setup in case of error while reading bitmap md/raid1: fix test for 'was read error from last working device'. md: Skip cluster setup for dm-raid md: flush ->event_work before stopping array. md/raid10: always set reshape_safe when initializing reshape_position. md/raid5: avoid races when changing cache size.	2015-07-25 11:24:58 -07:00
Goldwyn Rodrigues	33e38ac688	md-cluster: fix bitmap sub-offset in bitmap_read_sb bitmap_read_sb is modifying mddev->bitmap_info.offset. This works for the first bitmap read. However, when multiple bitmaps need to be opened by the same node, it ends up corrupting the offset. Fix it by using a local variable. Also, bitmap_read_sb is not required in bitmap_copy_from_slot since it is called in bitmap_create. Remove bitmap_read_sb(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-07-24 13:37:55 +10:00
Goldwyn Rodrigues	f735727319	md: Skip cluster setup in case of error while reading bitmap If the bitmap read fails, the error code set is -EINVAL. However, we don't check for errors and go ahead with cluster_setup. Skip the cluster setup in case of error. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-07-24 13:37:48 +10:00
Goldwyn Rodrigues	d3b178adb3	md: Skip cluster setup for dm-raid There is a bug that the bitmap superblock isn't initialised properly for dm-raid, so a new field can have garbage in new fields. (dm-raid does initialisation in the kernel - md initialised the superblock in mdadm). This means that for dm-raid we cannot currently trust the new ->nodes field. So: - use __GFP_ZERO to initialise the superblock properly for all new arrays - initialise all fields in bitmap_info in bitmap_new_disk_sb - ignore ->nodes for dm arrays (yes, this is a hack) This bug exposes dm-raid to bug in the (still experimental) md-cluster code, so it is suitable for -stable. It does cause crashes. References: https://bugzilla.kernel.org/show_bug.cgi?id=100491 Cc: stable@vger.kernel.org (v4.1) Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-07-23 09:22:00 +10:00
Linus Torvalds	1dc51b8288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...	2015-07-04 19:36:06 -07:00
Miklos Szeredi	2726d56620	vfs: add seq_file_path() helper Turn seq_path(..., &file->f_path, ...); into seq_file_path(..., file, ...); Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-23 18:01:07 -04:00
Miklos Szeredi	9bf39ab2ad	vfs: add file_path() helper Turn d_path(&file->f_path, ...); into file_path(file, ...); Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-23 18:00:05 -04:00
NeilBrown	8532e34390	md/bitmap: remove rcu annotation from pointer arithmetic. Evaluating "&mddev->disks" is simple pointer arithmetic, so it does not need 'rcu' annotations - no dereferencing is happening. Also enhance the comment to explain that 'rdev' in that case is not actually a pointer to an rdev. Reported-by: Patrick Marlier <patrick.marlier@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-05-21 09:14:41 +10:00
Goldwyn Rodrigues	97f6cd39da	md-cluster: re-add capabilities When "re-add" is writted to /sys/block/mdXX/md/dev-YYY/state, the clustered md: 1. Sends RE_ADD message with the desc_nr. Nodes receiving the message clear the Faulty bit in their respective rdev->flags. 2. The node initiating re-add, gathers the bitmaps of all nodes and copies them into the local bitmap. It does not clear the bitmap from which it is copying. 3. Initiating node schedules a md recovery to sync the devices. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-04-22 07:59:39 +10:00
Goldwyn Rodrigues	124eb761ed	md: Fix bitmap offset calculations The calculations of bitmap offset is incorrect with respect to bits to bytes conversion. Also, remove an irrelevant duplicate message. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-03-25 13:07:55 +11:00
Stephen Rothwell	3b0e6aacbf	md/bitmap: use sector_div for sector_t divisions neilb: modified to not corrupt ->resync_max_sectors. sector_div usage fixed by Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NeilBrown <neilb@suse.de>	2015-03-04 13:08:16 +11:00
NeilBrown	935f3d4fc6	md/bitmap: fix incorrect DIV_ROUND_UP usage. DIV_ROUTND_UP doesn't work on "long long", - and it should be sector_t anyway. Signed-off-by: NeilBrown <neilb@suse.de>	2015-03-04 12:54:29 +11:00
Goldwyn Rodrigues	11dd35daaa	Copy set bits from another slot bitmap_copy_from_slot reads the bitmap from the slot mentioned. It then copies the set bits to the node local bitmap. This is helper function for the resync operation on node failure. bitmap_set_memory_bits() currently assumes it is only run at startup and that they bitmap is currently empty. So if it finds that a region is already marked as dirty, it won't mark it dirty again. Change bitmap_set_memory_bits() to always set the NEEDED_MASK bit if 'needed' is set. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 09:59:05 -06:00
Goldwyn Rodrigues	f9209a3235	bitmap_create returns bitmap pointer This is done to have multiple bitmaps open at the same time. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 09:57:57 -06:00
Goldwyn Rodrigues	b97e92574c	Use separate bitmaps for each nodes in the cluster On-disk format: 0 4k 8k 12k ------------------------------------------------------------------- \| idle \| md super \| bm super [0] + bits \| \| bm bits[0, contd] \| bm super[1] + bits \| bm bits[1, contd] \| \| bm super[2] + bits \| bm bits [2, contd] \| bm super[3] + bits \| \| bm bits [3, contd] \| \| \| Bitmap super has a field nodes, which defines the maximum number of nodes the device can use. While reading the bitmap super, if the cluster finds out that the number of nodes is > 0: 1. Requests the md-cluster module. 2. Calls md_cluster_ops->join(), which sets up clustering such as joining DLM lockspace. Since the first time, the first bitmap is read. After the call to the cluster_setup, the bitmap offset is adjusted and the superblock is re-read. This also ensures the bitmap is read the bitmap lock (when bitmap lock is introduced in later patches) Questions: 1. cluster name is repeated in all bitmap supers. Is that okay? Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues	cf921cc19c	Add node recovery callbacks DLM offers callbacks when a node fails and the lock remastery is performed: 1. recover_prep: called when DLM discovers a node is down 2. recover_slot: called when DLM identifies the node and recovery can start 3. recover_done: called when all nodes have completed recover_slot recover_slot() and recover_done() are also called when the node joins initially in order to inform the node with its slot number. These slot numbers start from one, so we deduct one to make it start with zero which the cluster-md code uses. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues	c4ce867fda	Introduce md_cluster_info md_cluster_info stores the cluster information in the MD device. The join() is called when mddev detects it is a clustered device. The main responsibilities are: 1. Setup a DLM lockspace 2. Setup all initial locks such as super block locks and bitmap lock (will come later) The leave() clears up the lockspace and all the locks held. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>	2015-02-23 07:28:42 -06:00
NeilBrown	b7b17c9b67	md: remove mddev_lock() from md_attr_show() Most attributes can be read safely without any locking. A race might lead to a slightly out-dated value, but nothing wrong. We already have locking in some places where needed. All that remains is can_clear_show(), behind_writes_used_show() and action_show() which are easily fixed. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-06 09:32:55 +11:00

1 2 3 4 5

246 Commits