OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Lars Ellenberg	ccae7868b0	drbd: log request sector offset and size for IO errors Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	a2a3c74f24	drbd: always write bitmap on detach If we detach due to local read-error (which sets a bit in the bitmap), stay Primary, and then re-attach (which re-reads the bitmap from disk), we potentially lost the "out-of-sync" (or, "bad block") information in the bitmap. Always (try to) write out the changed bitmap pages before going diskless. That way, we don't lose the bit for the bad block, the next resync will fetch it from the peer, and rewrite it locally, which may result in block reallocation in some lower layer (or the hardware), and thereby "heal" the bad blocks. If the bitmap writeout errors out as well, we will (again: try to) mark the "we need a full sync" bit in our super block, if it was a READ error; writes are covered by the activity log already. If that superblock does not make it to disk either, we are sorry. Maybe we just lost an entire disk or controller (or iSCSI connection), and there actually are no bad blocks at all, so we don't need to re-fetch from the peer, there is no "auto-healing" necessary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	06f10adbdb	drbd: prepare for more than 32 bit flags - struct drbd_conf { ... unsigned long flags; ... } + struct drbd_conf { ... unsigned long drbd_flags[N]; ... } And introduce wrapper functions for test/set/clear bit operations on this member. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	44edfb0d78	drbd: wait for meta data IO completion even with failed disk, unless force-detached The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	8b45a5c8a1	drbd: a few more GFP_KERNEL -> GFP_NOIO This has not yet been observed, but conceivably, when using GFP_KERNEL allocations from drbd_md_sync(), drbd_flush_after_epoch() or receive_SyncParam(), we could trigger additional IO to our own device, or an other device in a criss-cross setup, and end up in a local deadlock, or potentially a distributed deadlock in a criss-cross setup involving the peer blocked in a similar way waiting for us to make progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	0b143d4382	drbd: fix potential deadlock during bitmap (re-)allocation The former comment arguing that GFP_KERNEL was good enough was wrong: it did not take resize into account at all, and assumed the only path leading here was the normal attach on a still secondary device, so no deadlock would be possible. Both resize on a Primary, or attach on a diskless Primary, could potentially deadlock. drbd_bm_resize() is called while IO to the respective device is suspended, so we must use GFP_NOIO to avoid potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:18 +01:00
Lars Ellenberg	7fb907c15f	drbd: panic on delayed completion of aborted requests "aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices which do no longer complete requests at all, not even do error completions. In this situation, usually a hard-reset and failover is the only way out. By "aborting", basically faking a local error-completion, we allow for a more graceful swichover by cleanly migrating services. Still the affected node has to be rebooted "soon". By completing these requests, we allow the upper layers to re-use the associated data pages. If later the local backing device "recovers", and now DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages; but typically it will corrupt meanwhile completely unrelated data, causing all sorts of damage. Which means delayed successful completion, especially for READ requests, is a reason to panic(). We assume that a delayed error completion is OK, though we still will complain noisily about it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	dbd0820c6f	drbd: Remove dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	599377acb7	drbd: Avoid NetworkFailure state during disconnect Disconnecting is a cluster wide state change. In case the peer node agrees to the state transition, it sends back the fact on the meta-data connection and closes both sockets. In case the node node that initiated the state transfer sees the closing action on the data-socket, before the P_STATE_CHG_REPLY packet, it was going into one of the network failure states. At least with the fencing option set to something else thatn "dont-care", the unclean shutdown of the connection causes a short IO freeze or a fence operation. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:17 +01:00
Philipp Reisner	c12a3d8c84	drbd: Fix a potential issue with the DISCARD_CONCURRENT flag The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Lars Ellenberg	02b91b5526	drbd: introduce stop-sector to online verify We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Philipp Reisner	9f2247bb9b	drbd: Protect accesses to the uuid set with a spinlock There is at least the worker context, the receiver context, the context of receiving netlink packts and processes reading a sysfs attribute that access the uuids. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-10-30 08:39:01 +01:00
Kent Overstreet	395c72a707	block: Generalized bio pool freeing With the old code, when you allocate a bio from a bio pool you have to implement your own destructor that knows how to find the bio pool the bio was originally allocated from. This adds a new field to struct bio (bi_pool) and changes bio_alloc_bioset() to use it. This makes various bio destructors unnecessary, so they're then deleted. v6: Explain the temporary if statement in bio_put Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: Nicholas Bellinger <nab@linux-iscsi.org> CC: Lars Ellenberg <lars.ellenberg@linbit.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-09-09 10:35:38 +02:00
Linus Torvalds	a7e546f175	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block-related fixes from Jens Axboe: - Improvements to the buffered and direct write IO plugging from Fengguang. - Abstract out the mapping of a bio in a request, and use that to provide a blk_bio_map_sg() helper. Useful for mapping just a bio instead of a full request. - Regression fix from Hugh, fixing up a patch that went into the previous release cycle (and marked stable, too) attempting to prevent a loop in __getblk_slow(). - Updates to discard requests, fixing up the sizing and how we align them. Also a change to disallow merging of discard requests, since that doesn't really work properly yet. - A few drbd fixes. - Documentation updates. * 'for-linus' of git://git.kernel.dk/linux-block: block: replace __getblk_slow misfix by grow_dev_page fix drbd: Write all pages of the bitmap after an online resize drbd: Finish requests that completed while IO was frozen drbd: fix drbd wire compatibility for empty flushes Documentation: update tunable options in block/cfq-iosched.txt Documentation: update tunable options in block/cfq-iosched.txt Documentation: update missing index files in block/00-INDEX block: move down direct IO plugging block: remove plugging at buffered write time block: disable discard request merge temporarily bio: Fix potential memory leak in bio_find_or_create_slab() block: Don't use static to define "void *p" in show_partition_start() block: Add blk_bio_map_sg() helper block: Introduce __blk_segment_map_sg() helper fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices block: split discard into aligned requests block: reorganize rounding of max_discard_sectors	2012-08-25 11:36:43 -07:00
Philipp Reisner	d1aa4d04da	drbd: Write all pages of the bitmap after an online resize We need to write the whole bitmap after we moved the meta data due to an online resize operation. With the support for one peta byte devices bitmap IO was optimized to only write out touched pages. This optimization must be turned off when writing the bitmap after an online resize. This issue was introduced with drbd-8.3.10. The impact of this bug is that after an online resize, the next resync could become larger than expected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-08-16 17:17:35 +02:00
Philipp Reisner	509fc019e5	drbd: Finish requests that completed while IO was frozen Requests of an acked epoch are stored on the barrier_acked_requests list. In case the private bio of such a request completes while IO on the drbd device is suspended [req_mod(completed_ok)] then the request stays there. When thawing IO because the fence_peer handler returned, then we use tl_clear() to apply the connection_lost_while_pending event to all requests on the transfer-log and the barrier_acked_requests list. Up to now the connection_lost_while_pending event was not applied on requests on the barrier_acked_requests list. Fixed that. I.e. now the connection_lost_while_pending and resend events are applied to requests on the barrier_acked_requests list. For that it is necessary that the resend event finishes (local only) READS correctly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-08-16 17:14:45 +02:00
Lars Ellenberg	227f052f47	drbd: fix drbd wire compatibility for empty flushes DRBD has a concept of request epochs or reorder-domains, which are separated on the wire by P_BARRIER packets. Older DRBD is not able to handle zero-sized requests at all, so we need to map empty flushes to these drbd barriers. These are the equivalent of empty flushes, and by default trigger flushes on the receiving side anyways (unless not supported or explicitly disabled), so there is no need to handle this differently in newer drbd either. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-08-16 17:12:56 +02:00
Artem Bityutskiy	d97482ede2	drbd: nuke pdflush from comments The pdflush thread is long gone, so this patch removes references to pdflush from drbd comments. Cc: drbd-dev@lists.linbit.com Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-08-04 12:15:39 +04:00
Lars Ellenberg	a73ff3231d	drbd: announce FLUSH/FUA capability to upper layers Unconditionally announce FLUSH/FUA to upper layers. If the lower layers on either node do not actually support this, generic_make_request() will deal with it. If this causes performance regressions on your setup, make sure there are no volatile caches involved, and mount -o nobarrier or equivalent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 15:14:28 +02:00
Lars Ellenberg	db141b2f42	drbd: fix max_bio_size to be unsigned We capped our max_bio_size respectively max_hw_sectors with min_t(int, lower level limit, our limit); unfortunately, some drivers, e.g. the kvm virtio block driver, initialize their limits to "-1U", and that is of course a smaller "int" value than our limit. Impact: we started to request 16 MB resync requests, which lead to protocol error and a reconnect loop. Fix all relevant constants and parameters to be unsigned int. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 15:14:00 +02:00
Lars Ellenberg	7ee1fb93f3	drbd: flush drbd work queue before invalidate/invalidate remote If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:15:58 +02:00
Lars Ellenberg	c12e9c8964	drbd: fix potential access after free Occasionally, if we disconnect, we triggered this assert: block drbd7: ASSERT FAILED tl_hash[27] == c30b0f04, expected NULL hlist_del() happens only on master bio completion. We used to wait for pending IO to complete before freeing tl_hash on disconnect. We no longer do so, since we learned to "freeze" IO on disconnect. If the local disk is too slow, we may reach C_STANDALONE early, and there are still some requests pending locally when we call drbd_free_tl_hash(). If we now free the tl_hash, and later the local IO completion completes the master bio, which then does hlist_del() and clobbers freed memory. Do hlist_del_init() and hlist_add_fake() before kfree(tl_hash), so the hlist_del() on master bio completion is harmless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:15:16 +02:00
Lars Ellenberg	63a6d0bb3d	drbd: call local-io-error handler early In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:10:41 +02:00
Lars Ellenberg	0029d62434	drbd: do not reset rs_pending_cnt too early Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:09:53 +02:00
Lars Ellenberg	88437879fb	drbd: reset congestion information before reporting it in /proc/drbd We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:07:48 +02:00
Lars Ellenberg	c2ba686f35	drbd: report congestion if we are waiting for some userland callback If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:07:18 +02:00
Lars Ellenberg	383606e0de	drbd: differentiate between normal and forced detach Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:06:18 +02:00
Lars Ellenberg	d264580145	drbd: cleanup, remove two unused global flags Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-07-24 14:02:41 +02:00
Jens Axboe	6d407cfaf5	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus	2012-06-13 21:19:42 +02:00
Lars Ellenberg	0d5934e3c2	drbd: fix null pointer dereference with on-congestion policy when diskless We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:35:19 +02:00
Lars Ellenberg	1ed25b269e	drbd: fix list corruption by failing but already aborted reads If a read is aborted due to force-detach of a supposedly unresponsive local backing device, and retried on the peer, it can happen that the local request later still completes (hopefully with an error). As it may already have been completed to upper layers meanwhile, it must not be retried again now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:34:51 +02:00
Lars Ellenberg	4eccc57979	drbd: fix access of unallocated pages and kernel panic BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-06-12 14:32:48 +02:00
Linus Torvalds	a70f35af4e	Merge branch 'for-3.5/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver related changes for 3.5. It contains: - The floppy changes from Jiri. Jiri is now also marked as the maintainer of floppy.c, I shall be publically branding his forehead with red hot iron at the next opportune moment. - A batch of drbd updates and fixes from the linbit crew, as well as fixes from others. - Two small fixes for xen-blkfront courtesy of Jan." * 'for-3.5/drivers' of git://git.kernel.dk/linux-block: (70 commits) floppy: take over maintainership floppy: remove floppy-specific O_EXCL handling floppy: convert to delayed work and single-thread wq xen-blkfront: module exit handling adjustments xen-blkfront: properly name all devices drbd: grammar fix in log message drbd: check MODULE for THIS_MODULE drbd: Restore the request restart logic drbd: introduce a bio_set to allocate housekeeping bios from drbd: remove unused define drbd: bm_page_async_io: properly initialize page->private drbd: use the newly introduced page pool for bitmap IO drbd: add page pool to be used for meta data IO drbd: allow bitmap to change during writeout from resync_finished drbd: fix race between drbdadm invalidate/verify and finishing resync drbd: fix resend/resubmit of frozen IO drbd: Ensure that data_size is not 0 before using data_size-1 as index drbd: Delay/reject other state changes while establishing a connection drbd: move put_ldev from __req_mod() to the endio callback drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE ...	2012-05-30 09:05:47 -07:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Eric W. Biederman	38bf195398	connector/userns: replace netlink uses of cap_raised() with capable() In 2009 Philip Reiser notied that a few users of netlink connector interface needed a capability check and added the idiom cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise that netlink was asynchronous. In 2011 Patrick McHardy noticed we were being silly because netlink is synchronous and removed eff_cap from the netlink_skb_params and changed the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN). Looking at those spots with a fresh eye we should be calling capable(CAP_SYS_ADMIN). The only reason I can see for not calling capable is that it once appeared we were not in the same task as the caller which would have made calling capable() impossible. In the initial user_namespace the only difference between between cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN) are a few sanity checks and the fact that capable(CAP_SYS_ADMIN) sets PF_SUPERPRIV if we use the capability. Since we are going to be using root privilege setting PF_SUPERPRIV seems the right thing to do. The motivation for this that patch is that in a child user namespace cap_raised(current_cap(),...) tests your capabilities with respect to that child user namespace not capabilities in the initial user namespace and thus will allow processes that should be unprivielged to use the kernel services that are only protected with cap_raised(current_cap(),..). To fix possible user_namespace issues and to just clean up the code replace cap_raised(current_cap(), CAP_SYS_ADMIN) with capable(CAP_SYS_ADMIN). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Philipp Reisner <philipp.reisner@linbit.com> Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:21:39 -04:00
Lars Ellenberg	92b4ca291f	drbd: grammar fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:56 +02:00
Cong Wang	bc4854bc91	drbd: check MODULE for THIS_MODULE THIS_MODULE is NULL only when drbd is compiled as built-in, so the #ifdef CONFIG_MODULES should be #ifdef MODULE instead. This fixes the warning: drivers/block/drbd/drbd_main.c: In function ‘drbd_buildtag’: drivers/block/drbd/drbd_main.c:4187:24: warning: the comparison will always evaluate as ‘true’ for the address of ‘__this_module’ will never be NULL [-Waddress] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:54 +02:00
Philipp Reisner	f6d0a8dbfd	drbd: Restore the request restart logic It got lost with the commit `5a7bbad27a` "block: remove support for bio remapping from ->make_request" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 17:20:59 +02:00
Lars Ellenberg	9476f39d66	drbd: introduce a bio_set to allocate housekeeping bios from Don't rely on availability of bios from the global fs_bio_set, we should use our own bio_set for meta data IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:07 +02:00
Lars Ellenberg	3c2f7a856f	drbd: remove unused define Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:06 +02:00
Arne Redlich	0c7db27920	drbd: bm_page_async_io: properly initialize page->private If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 15:17:04 +02:00
Lars Ellenberg	4d95a10f97	drbd: use the newly introduced page pool for bitmap IO Conflicts: drbd/drbd_bitmap.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:03 +02:00
Lars Ellenberg	4281808fb3	drbd: add page pool to be used for meta data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:02 +02:00
Lars Ellenberg	0e8488ade2	drbd: allow bitmap to change during writeout from resync_finished Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout, pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:00 +02:00
Lars Ellenberg	a574daf5d7	drbd: fix race between drbdadm invalidate/verify and finishing resync When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If in that very moment a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescen the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:59 +02:00
Lars Ellenberg	ba280c092e	drbd: fix resend/resubmit of frozen IO DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith), or because we lost access to data (on-no-data-accessible suspend-io). Resuming from there (re-connect, or re-attach, or explicit admin intervention) should "just work". Unfortunately, if the re-attach/re-connect did not happen within the timeout, since the commit drbd: Implemented real timeout checking for request processing time if so configured, the request_timer_fn() would timeout and detach/disconnect virtually immediately. This change tracks the most recent attach and connect, and does not timeout within <configured timeout interval> after attach/connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:58 +02:00
Philipp Reisner	5de738272e	drbd: Ensure that data_size is not 0 before using data_size-1 as index This could be exploited by a peer which runs modified code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:56 +02:00
Philipp Reisner	197296ffed	drbd: Delay/reject other state changes while establishing a connection Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:55 +02:00
Lars Ellenberg	46385c84ac	drbd: move put_ldev from __req_mod() to the endio callback One invocation in the endio handler is good enough, we don't need mention it for each of the different ways it calls __req_mod(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:51 +02:00
Lars Ellenberg	d64957c9a9	drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE Just because this request happened during a resync does not mean it may pretend to have been barrier-acked. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:50 +02:00

1 2 3 4 5 ...

470 Commits