linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Philipp Reisner	2b8a90b555	drbd: Corrected off-by-one error in DRBD_MINOR_COUNT_MAX Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:31 +01:00
Andreas Gruenbacher	110a204a35	drbd: Remove useless / wrong comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:29 +01:00
Philipp Reisner	794abb753e	drbd: Cleaned up the resync timer logic Besides removing a few lines of code, this moves the inspection of the state from before the queuing process to after the queuing, i.e. closer to the actual invocation of the work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:28 +01:00
Philipp Reisner	da0a78161d	drbd: Be more careful with SyncSource -> Ahead transitions We may not get from SyncSource to Ahead if we have sent some P_RS_DATA_REPLY packets to the peer and are waiting for P_WRITE_ACK. Again, this is not relevant for properly tuned systems, but makes sure that a poorly tuned system does not get diverging bitmaps. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:26 +01:00
Philipp Reisner	d612d309e4	drbd: No longer answer P_RS_DATA_REQUEST packets when in C_AHEAD mode If the sync source node replies to a P_RS_DATA_REQUEST packet when it is already in ahead mode (i.e. the two packets crossed each other on the wire), that may lead to diverging bitmaps. This never happens in a well-tuned system, where the resync controller has reduced the resync speed to zero long before we got into ahead mode. But we have to be prepared for the not-well-tuned system as well, because diverging bitmaps mean a non-terminating resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:25 +01:00
Philipp Reisner	617049aa7d	drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions Create a new barrier when leaving the AHEAD mode. Otherwise we trigger the assertion in req_mod(, barrier_acked) D_ASSERT(req->rq_state & RQ_NET_SENT); The new barrier is created by recycling the newest existing one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:23 +01:00
Lars Ellenberg	0719427278	drbd: ratelimit io error messages Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:21 +01:00
Philipp Reisner	3f98688afc	drbd: There might be a resync after unfreezing IO due to no disk [Bugz 332] When on-no-data-accessible is set to suspend-io, also consider that a Primary, SyncTarget node loses its connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:20 +01:00
Lars Ellenberg	725a97e43e	drbd: fix potential access of on-stack wait_queue_head_t after return I run into something declaring itself as "spinlock deadlock", BUG: spinlock lockup on CPU#1, kjournald/27816, ffff88000ad6bca0 Pid: 27816, comm: kjournald Tainted: G W 2.6.34.6 #2 Call Trace: <IRQ> [<ffffffff811ba0aa>] do_raw_spin_lock+0x11e/0x14d [<ffffffff81340fde>] _raw_spin_lock_irqsave+0x6a/0x81 [<ffffffff8103b694>] ? __wake_up+0x22/0x50 [<ffffffff8103b694>] __wake_up+0x22/0x50 [<ffffffffa07ff661>] bm_async_io_complete+0x258/0x299 [drbd] but the call traces do not fit at all, all other cpus are cpu_idle. I think it may be this race: drbd_bm_write_page wait_queue_head_t io_wait; atomic_t in_flight; bm_async_io submit_bio bm_async_io_complete if (atomic_dec_and_test(in_flight)) wait_event(io_wait, atomic_read(in_flight) == 0) return wake_up(io_wait) The wake_up now accesses the wait_queue_head_t spinlock, which is no longer valid, since the stack frame of drbd_bm_write_page has been clobbered now. Fix this by using struct completion, which does both the condition test as well as the wake_up inside its spinlock, so this race cannot happen. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:08 +01:00
Lars Ellenberg	06d33e968d	drbd: improve on bitmap write out timing Even though we now track the need for bitmap writeout per bitmap page, there is no need to trigger the writeout while a resync is going on. Once the resync is finished (or aborted), we trigger bitmap writeout anyways. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:40 +01:00
Lars Ellenberg	418e0a927d	drbd: spelling fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:38 +01:00
Lars Ellenberg	7648cdfe52	drbd: be less noisy with some log messages We expect changes to a bitmap page in drbd_bm_write_page, that's why we submit a copy page. If a page changes during global writeout, that would be unexpected, and reason to warn, though. Also, page writeout can often be skipped (on activity log transactions during normal operation, for example), so there is no need to log that every time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:37 +01:00
Lars Ellenberg	5a22db8968	drbd: serialize sending of resync uuid with pending w_send_oos To improve the latency of IO requests during bitmap exchange, we recently allowed writes while waiting for the bitmap, sending "set out-of-sync" information packets for any newly dirtied bits. We have to make sure that the new resync-uuid does not overtake these "set oos" packets. Once the resync-uuid is received, the sync target starts the resync process, and expects the bitmap to only be cleared, not re-set. If we use this protocol extension, we queue the generation and sending of the resync-uuid on the worker, which naturally serializes with all previously queued packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:35 +01:00
Lars Ellenberg	f735e36354	drbd: add debugging assert to make sure the protocol is clean We expect to only receive the recently introduced "set out of sync" packets in specific states. If we receive them in different states, that may confuse the resync process to the point where it won't terminate, or think it made negative progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:34 +01:00
Philipp Reisner	c88d65e223	drbd: Documenting drbd_should_do_remote() and drbd_should_send_oos() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:32 +01:00
Lars Ellenberg	2265b473ae	drbd: fix potential dereference of NULL pointer If drbd used to have crypto digest algorithms configured, then is being unconfigured (but not unloaded), it frees the algorithms, but does not reset the config. If it then is reconfigured to use the very same algorithm, it "forgot" to re-allocate the algorithms, thinking that the config has not changed in that aspect. It will then Oops on the first attempt to actually use those algorithms. Fix this by resetting the config to defaults after cleanup. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:30 +01:00
Lars Ellenberg	02851e9f00	drbd: move bitmap write from resync_finished to after_state_change We must not call it directly from resync_finished, as we may be in either receiver or worker context there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:29 +01:00
Lars Ellenberg	84e7c0f7d1	drbd: Removed a reference to debug macros removed long time ago Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:27 +01:00
Lars Ellenberg	6850c44214	drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:26 +01:00
Lars Ellenberg	4b0715f096	drbd: allow petabyte storage on 64bit arch Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:24 +01:00
Lars Ellenberg	19f843aa08	drbd: bitmap keep track of changes vs on-disk bitmap When we set or clear bits in a bitmap page, also set a flag in the page->private pointer. This allows us to skip writes of unchanged pages. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:19 +01:00
Lars Ellenberg	95a0f10cdd	drbd: store in-core bitmap little endian, regardless of architecture Our on-disk bitmap is a little endian bitstream. Up to now, we have stored the in-core copy of that in native endian, applying byte order conversion when necessary. Instead, keep the bitmap pages little endian, as they are read from disk, and use the generic_*_le_bit family of functions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:40 +01:00
Lars Ellenberg	7777a8ba1f	drbd: bitmap: don't count unused bits (fix non-terminating resync) We trusted the on-disk bitmap to have unused bits cleared. In case that is not true for whatever reason, and we take a code path where the unused bits don't get cleared elsewhere (bm_clear_surplus is not called), we may miscount the bits, and get confused during resync, waiting for bits to get cleared that we don't even use: the resync process would not terminate. Fix this by masking out unused bits in __bm_count_bits. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:38 +01:00
Andreas Gruenbacher	1b881ef775	drbd: Rename __inc_ap_bio_cond to may_inc_ap_bio The old name is confusing: the function does not increment anything. Also rename _inc_ap_bio_cond to inc_ap_bio_cond: there is no need for an underscore. Finally, make it clear that these functions return boolean values. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:37 +01:00
Andreas Gruenbacher	24dccabb39	drbd: Fix: drbd_bitmap_io does not return an enum determine_dev_size I guess bitmap I/O errors are supposed to cause drbd_determin_dev_size to return dev_size_error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:35 +01:00
Andreas Gruenbacher	2c46407d24	drbd: receive_bitmap_plain: Get rid of ugly and useless enum Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:34 +01:00
Andreas Gruenbacher	f70af118e3	drbd: send_bitmap_rle_or_plain: Get rid of ugly and useless enum Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:32 +01:00
Andreas Gruenbacher	78fcbdae22	drbd: receive_bitmap: Missing free_page() on error path Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:30 +01:00
Andreas Gruenbacher	de1f8e4a0a	drbd: receive_bitmap: Avoid casting enum drbd_state_rv to int Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:29 +01:00
Andreas Gruenbacher	4114be815f	drbd: receive_bitmap: Fix the wrong return value Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:27 +01:00
Andreas Gruenbacher	f2024e7ce2	drbd: drbd_nl_disk_conf: Avoid a compiler warning Warning: comparison between ‘enum drbd_ret_code’ and ‘enum drbd_state_rv’ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:26 +01:00
Andreas Gruenbacher	81e84650c2	drbd: Use the standard bool, true, and false keywords Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:24 +01:00
Andreas Gruenbacher	6184ea2145	drbd: This code is dead now Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:22 +01:00
Andreas Gruenbacher	bb4379464e	drbd: Another small enum drbd_state_rv cleanup Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:21 +01:00
Andreas Gruenbacher	bf885f8a67	drbd: Be more explicit about functions that return an enum drbd_state_rv Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:19 +01:00
Andreas Gruenbacher	c8b325632f	drbd: Rename enum drbd_state_ret_codes to enum drbd_state_rv Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:18 +01:00
Andreas Gruenbacher	116676ca62	drbd: Rename enum drbd_ret_codes to enum drbd_ret_code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:16 +01:00
Andreas Gruenbacher	0cf9d27e38	drbd: Get rid of unnecessary macros (2) The FAULT_ACTIVE macro just wraps the drbd_insert_fault macro for no apparent reason. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:15 +01:00
Andreas Gruenbacher	662d91a23a	drbd: Get rid of unnecessary macros (1) This macro doesn't save much code, but makes things a lot harder to read. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:13 +01:00
Andreas Gruenbacher	2f58dcfc85	drbd: Rename drbd_make_request_26 to drbd_make_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:11 +01:00
Andreas Gruenbacher	96756784a6	drbd: Remove left-over prototype Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:10 +01:00
Andreas Gruenbacher	cab2f74b45	drbd: Make sure that drbd_send() has sent the right number of bytes Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2011-03-10 11:36:08 +01:00
Lars Ellenberg	220df4d006	drbd: fix incomplete error message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:02 +01:00
Andreas Gruenbacher	7e458c32da	drbd: Removed an unnecessary #undef Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:22 +01:00
Lars Ellenberg	8a3c104438	drbd: fix regression, we need to close drbd epochs during normal operation Commit e2041475e6ddb081734d161f6421977323f5a9b9 ("drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap") contained a bad chunk that tried to optimize away drbd barriers during bitmap exchange, but accidentally dropped them for normal mode as well. Impact: depending on activity log size and access pattern, activity log extents may not be recycled in time, causing IO to block indefinitely. Fix: skip drbd barriers only if there is no connection to send them on, or the request being completed has not been on the network at all. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:20 +01:00
Philipp Reisner	09b9e79793	drbd: Implemented the before-resync-source handler Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:18 +01:00
Philipp Reisner	2561b9c1f1	drbd: --force option for disconnect As the network connection can be lost at any time, a --force option for disconnect is just a matter of completeness. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:17 +01:00
Lars Ellenberg	42ff269d10	drbd: add packet_type 27 (return_code_only) to netlink api In case we ever should add another packet type, we must not reuse 27, as that is currently used for "empty" return code only replies. Document it as such. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:15 +01:00
Lars Ellenberg	3e3a7766c2	drbd: use kzalloc and memset(,0,) to start with clean buffers in drbd_nl Make sure we start with clean buffers to not accidentally send garbage back to userspace. Note: has not been observed; but just in case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:14 +01:00
Lars Ellenberg	17a93f3007	drbd: remove /proc/drbd before unregistering from netlink There still exists a (theoretical) race on module unload, where /proc/drbd may still exist, but the netlink callback has been unregistered already, allowing drbdsetup to shout without listeners, and get no reply. Reorder remove_proc_entry and unregister of netlink callback. drbdsetup first checks for existence of the proc entry, and if that is missing, won't even try to contact the module. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:12 +01:00
Lars Ellenberg	3da127fa88	drbd: increase module count on /proc/drbd access If someone holds /proc/drbd open, previously rmmod would "succeed" in starting the unload, but then block on remove_proc_entry, leading to a situation where lsmod no longer shows drbd, but /proc/drbd is still there (though no longer accessible). I'd rather have rmmod fail up front in this case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:11 +01:00
Philipp Reisner	c507f46f26	drbd: Removed 20 seconds upper bound for side-stepping Given low-enough network bandwidth combined with an IO pattern that hammers onto a single RS-extent, side-stepping might be necessary for much longer times. Changed the code to print a single informational message after 20 seconds, but it keeps on stepping aside forever. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:09 +01:00
Philipp Reisner	1fc80cf378	drbd: Becoming sync target may not happen out of < C_WF_REPORT_PARAMS This patch is actually a necessary addendum to the patch "fix for spurious full sync (becoming sync target looked like invalidate)" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:07 +01:00
Philipp Reisner	3719094ec2	drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap * C_STARTING_SYNC_S, C_STARTING_SYNC_T In these states the bitmap gets written to disk. Locking out of app-IO is done by using the drbd_queue_bitmap_io() and drbd_bitmap_io() functions these days. It is no longer necessary to lock out app-IO based on the connection state. App-IO that may come in after the BITMAP_IO flag got cleared before the state transition to C_SYNC_(SOURCE\|TARGET) does not get mirrored, sets a bit in the local bitmap, that is already set, therefore changes nothing. * C_WF_BITMAP_S In this state we send updates (P_OUT_OF_SYNC packets). With that we make sure they have the same number of bits when going into the C_SYNC_(SOURCE\|TARGET) connection state. * C_UNCONNECTED: The receiver starts, no need to lock out IO. * C_DISCONNECTING: in drbd_disconnect() we had a wait_event() to wait until ap_bio_cnt reaches 0. Removed that. * C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE C_PROTOCOL_ERROR, C_TEAR_DOWN: Same as C_DISCONNECTING * C_WF_REPORT_PARAMS: IO still possible since that is still like C_WF_CONNECTION. And we do not need to send barriers in C_WF_BITMAP_S connection state. Allow concurrent accesses to the bitmap when receiving the bitmap. Everything gets ORed anyways. A drbd_free_tl_hash() is in after_state_chg_work(). At that point all the work items of the last connections must have been processed. Introduced a call to drbd_free_tl_hash() into drbd_free_mdev() for paranoia reasons. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:06 +01:00
Philipp Reisner	ab17b68f45	drbd: Improvements in sanitize_state() The relevant change is that the state change to C_WF_BITMAP_S should implicitly change pdsk to C_CONSISTENT. (Think of it as C_OUTDATED, only without the guarantee that the peer has the outdated state written to its meta data.) At that opportunity I restructured the switch statement so that it gets evaluated every time. (Has declarative character) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:04 +01:00
Philipp Reisner	22afd7ee94	drbd: Fixed race condition in drbd_queue_bitmap_io May only test for ap_bio_cnt == 0 under req_lock. It can increase only under req_lock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:03 +01:00
Philipp Reisner	8869d683b7	drbd: Fixed inc_ap_bio() The condition must be checked after prepare_to_wait(). The old implementation could lose wakeup events. Never observed in real life. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:35:01 +01:00
Philipp Reisner	127b317844	drbd: use test_and_set_bit() to decide if bm_io_work should be queued Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:59 +01:00
Philipp Reisner	aeda1cd6a5	drbd: Begin to account BIO processing time before inc_ap_bio() Since inc_ap_bio() might sleep already Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:57 +01:00
Philipp Reisner	f91ab6282d	drbd: Implemented side-stepping in drbd_res_begin_io() Before: drbd_rs_begin_io() locked app-IO out of an RS extent, and then waited until all previous app-IO in that area finished. (But not only until the disk-IO was finished but until the barrier/epoch ack came in for that == round trip time latency ++) After: As soon as a new app-IO wants to start new IO on that RS extent, drbd_rs_begin_io() steps aside (clearing the BME_NO_WRITES flag again). It retries after 100ms. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:56 +01:00
Philipp Reisner	9d77a5fee9	drbd: Make some functions static Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:54 +01:00
Philipp Reisner	e3555d8545	drbd: Implemented priority inheritance for resync requests We only issue resync requests if there is no significant application IO going on, i.e. application IO has higher priority than resync IO. If application IO cannot be started because the resync process locked a resync_lru entry, start the IO operations necessary to release the lock ASAP. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:53 +01:00
Philipp Reisner	59817f4fab	drbd: Do not cleanup resync LRU for the Ahead/Behind SyncSource/SyncTarget transitions This one should be replaced with moving this cleanup to the 'right' position. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:51 +01:00
Philipp Reisner	c4752ef128	drbd: When proxy's buffer drained off go into regular resync mode Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:49 +01:00
Philipp Reisner	73a01a18b9	drbd: New packet for Ahead/Behind mode: P_OUT_OF_SYNC Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:48 +01:00
Philipp Reisner	67531718d8	drbd: Implemented two new connection states Ahead/Behind In this connection mode, the ahead node no longer replicates application IO. The behind node's disk becomes outdated. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:46 +01:00
Philipp Reisner	422028b1ca	drbd: New configuration parameters for dealing with network congestion net { on_congestion {block\|pull-ahead\|disconnect}; congestion-fill {sectors}; congestion-extents {al-extents}; } Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:45 +01:00
Philipp Reisner	759fbdfba6	drbd: Track the numbers of sectors in flight Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:43 +01:00
Lars Ellenberg	688593c5a8	drbd: Renamed write_flags_to_bio() to wire_flags_to_bio() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:34:32 +01:00
Lars Ellenberg	4896e8c1b8	drbd: restore compatibility with 32bit kernels With commit "drbd: further converge progress display of resync and online-verify", a u64/u64 division was accidentally introduced, causing an unresolvable symbol __udivdi3 to be referenced. For that division, 32 bits are still sufficient for now, so we can revert to unsigned long instead. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:13 +01:00
Lars Ellenberg	1816a2b47a	drbd: properly use max_hw_sectors to limit our bio size To ease tracking of bios in some hash tables, we want them to not cross certain boundaries (128k, used to be 32k). We limit the maximum bio size using queue parameters. Historically some defines and variables we use there have been named max_segment_size, which was misguided. Rename them to max_bio_size, and use [blk_]queue_max_hw_sectors where appropriate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:11 +01:00
Lars Ellenberg	3129b1b9ae	drbd: debug: limit netlink-broadcast of request on digest mismatch to 32k We used to be limited to 32k requests, but have increased that limit to 128k now. This part of the code can only deal with 32k; it would scramble arbitrary pages for larger requests. As it is used for debugging only anyways, it is ok to simply truncate the dumped data here. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:09 +01:00
Lars Ellenberg	470be44ab1	drbd: detect modification of in-flight buffers With data-integrity digest enabled, double-check on the sending side for modifications by upper layers of buffers under write back, so we can tell it apart from corruption on the "wire". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:08 +01:00
Lars Ellenberg	5f9915bbb8	drbd: further converge progress display of resync and online-verify Show progressbar and ETA always, with proc_details >= 1 also show the current sector position for both resync and online-verify on both nodes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:06 +01:00
Lars Ellenberg	18edc0b9d7	drbd: fix potential wrap of 32bit oos:%lu display in /proc/drbd When converting bits (4k resolution, still) to kB, we shift left. If it was a large number of bits on a 32bit box (>= 4 TiB storage), we may wrap the 32bit unsigned long base type, resulting in incorrect display. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:04 +01:00
Lars Ellenberg	2649f0809f	drbd: use the resync controller for online-verify requests as well Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:03 +01:00
Lars Ellenberg	e65f440d47	drbd: factor out drbd_rs_number_requests Preparation patch to be able to use the auto-throttling resync controller for online-verify requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:19:01 +01:00
Lars Ellenberg	9bd28d3c90	drbd: factor out drbd_rs_controller_reset Preparation patch to be able to use the auto-throttling resync controller for online-verify requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:59 +01:00
Lars Ellenberg	439d595379	drbd: show progress bar and ETA for online-verify Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:58 +01:00
Lars Ellenberg	ea5442aff6	drbd: advance progress step marks for online-verify Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:56 +01:00
Lars Ellenberg	c6ea14dfa3	drbd: factor out advancement of resync marks for progress reporting This is in preparation to unify progress reporting of online-verify and resync requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:54 +01:00
Lars Ellenberg	de228bba67	drbd: initialize online-verify progress tracking on verify target For partial (resumed) online verify, initialize the resync step marks once we know what the online verify start sector is. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:53 +01:00
Lars Ellenberg	30b743a2d5	drbd: improve online-verify progress tracking For a partial (resumed) online-verify, initialize rs_total not to total bits, but to number of bits to check in this run, to match the meaning rs_total has for actual resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:51 +01:00
Lars Ellenberg	2652561886	drbd: only reset online-verify start sector if verify completed For network hiccups during online-verify, on the next verify triggered, we by default want to resume where it left off. After any replication link interruption, there will be a (possibly empty) resync. Do not reset online-verify start sector if some resync completed, that would defeat the purpose. Only reset the start sector once a verify run is completed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:18:49 +01:00
Jens Axboe	4c63f5646e	Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:58:35 +01:00
Jens Axboe	721a9602e6	block: kill off REQ_UNPLUG With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:27 +01:00
Jens Axboe	7eaceaccab	block: remove per-queue plugging Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So let's kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:07 +01:00
Tejun Heo	3c0d206092	pktcdvd: Convert to bdops->check_events() Convert from ->media_changed() to ->check_events(). pktcdvd needs to forward all event related operations to the underlying device. Forward ->check_events() instead of ->media_changed() and inherit disk->[async_]events. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Peter Osterlund <petero2@telia.com>	2011-03-09 19:54:28 +01:00
Tejun Heo	6fac80e3aa	umem: Drop dummy ->media_changed() umem doesn't implement media changed detection and there's no need to implement dummy callback anymore. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:28 +01:00
Tejun Heo	3a200911ad	xsysace: Convert to bdops->check_events() Convert from ->media_changed() to ->check_events(). xsysace buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:28 +01:00
Tejun Heo	aaa7c01546	ub: Convert to bdops->check_events() Convert from ->media_changed() to ->check_events(). ub buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Pete Zaitcev <zaitcev@redhat.com>	2011-03-09 19:54:28 +01:00
Tejun Heo	4bbde77787	swim[3]: Convert to bdops->check_events() Convert from ->media_changed() to ->check_events(). Both swim and swim3 buffer media changed state and clear it on revalidation. They will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Laurent Vivier <laurent@lvivier.info> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-09 19:54:28 +01:00
Tejun Heo	507daea227	dac960: Convert to bdops->check_events() Convert from ->media_changed() to ->check_events(). DAC960 media change notification seems to be one way (once set, never cleared) and will generate spurious events when polled once the condition triggers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:28 +01:00
Tejun Heo	b1b56b93f3	paride: Convert to bdops->check_events() Convert paride drivers from ->media_changed() to ->check_events(). pcd and pd buffer and clear events after reporting; however, pf unconditionally reports MEDIA_CHANGE and will generate spurious events when polled. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Tim Waugh <tim@cyberelk.net>	2011-03-09 19:54:28 +01:00
Tejun Heo	1a8a74f03f	floppy,{ami|ata}flop: Convert to bdops->check_events() Convert the floppy drivers from ->media_changed() to ->check_events(). Both floppy and ataflop buffer media changed state bit and clear them on revalidation and will behave correctly with kernel event polling. I can't tell how amiflop clears its event and it's possible that it may generate spurious events when polled. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>	2011-03-09 19:54:27 +01:00
Owen Smith	51de69523f	xen: Union the blkif_request request specific fields Prepare for extending the block device ring to allow request specific fields, by moving the request specific fields for reads, writes and barrier requests to a union member. Acked-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Owen Smith <owen.smith@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-08 15:07:00 -05:00
Tejun Heo	e83a46bbb1	Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/core This merge creates two sets of conflicts. One is simple context conflicts caused by removal of throtl_scheduled_delayed_work() in for-linus and removal of throtl_shutdown_timer_wq() in for-2.6.39/core. The other is caused by commit `255bb490c8` (block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()) in for-linus clashing with FLUSH reimplementation in for-2.6.39/core. The conflict isn't trivial but the resolution is straight-forward. * __blk_run_queue() calls in flush_end_io() and flush_data_end_io() should be called with @force_kblockd set to %true. * elv_insert() in blk_kick_flush() should use %ELEVATOR_INSERT_REQUEUE. Both changes are to avoid invoking ->request_fn() directly from request completion path and closely match the changes in the commit `255bb490c8`. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-04 19:09:02 +01:00
David S. Miller	0a0e9ae1bd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h	2011-03-03 21:27:42 -08:00
Patrick McHardy	01a16b21d6	netlink: kill eff_cap from struct netlink_skb_parms Netlink message processing in the kernel is synchronous these days, capabilities can be checked directly in security_netlink_recv() from the current process. Signed-off-by: Patrick McHardy <kaber@trash.net> Reviewed-by: James Morris <jmorris@namei.org> [chrisw: update to include pohmelfs and uvesafb] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-03 13:32:07 -08:00
Petr Uzel	fd51469fb6	block: kill loop_mutex Following steps lead to deadlock in kernel: dd if=/dev/zero of=img bs=512 count=1000 losetup -f img mkfs.ext2 /dev/loop0 mount -t ext2 -o loop /dev/loop0 mnt umount mnt/ Stacktrace: [<c102ec04>] irq_exit+0x36/0x59 [<c101502c>] smp_apic_timer_interrupt+0x6b/0x75 [<c127f639>] apic_timer_interrupt+0x31/0x38 [<c101df88>] mutex_spin_on_owner+0x54/0x5b [<fe2250e9>] lo_release+0x12/0x67 [loop] [<c10c4eae>] __blkdev_put+0x7c/0x10c [<c10a4da5>] fput+0xd5/0x1aa [<fe2250cf>] loop_clr_fd+0x1a9/0x1b1 [loop] [<fe225110>] lo_release+0x39/0x67 [loop] [<c10c4eae>] __blkdev_put+0x7c/0x10c [<c10a59d9>] deactivate_locked_super+0x17/0x36 [<c10b6f37>] sys_umount+0x27e/0x2a5 [<c10b6f69>] sys_oldumount+0xb/0xe [<c1002897>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff Regression since `2a48fc0ab2`, which introduced the private loop_mutex as part of the BKL removal process. As per [1], the mutex can be safely removed. [1] http://www.gossamer-threads.com/lists/linux/kernel/1341930 Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172 Signed-off-by: Petr Uzel <petr.uzel@suse.cz> Cc: stable@kernel.org Reviewed-by: Nikanth Karthikesan <knikanth@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-03 11:53:25 -05:00
Vivek Goyal	cd25f54961	loop: No need to initialize ->queue_lock explicitly before calling blk_cleanup_queue() Now we initialize ->queue_lock at queue allocation time so driver does not have to worry about initializing it before calling blk_cleanup_queue(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-02 19:06:49 -05:00
Grant Likely	1c48a5c93d	dt: Eliminate of_platform_{,un}register_driver Final step to eliminate of_platform_bus_type. They're all just platform drivers now. v2: fix type in pasemi_nand.c (thanks to Stephen Rothwell) Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-02-28 13:22:46 -07:00
Linus Torvalds	638691a7a4	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: md: Fix - again - partition detection when array becomes active Fix over-zealous flush_disk when changing device size. md: avoid spinlock problem in blk_throtl_exit md: correctly handle probe of an 'mdp' device. md: don't set_capacity before array is active. md: Fix raid1->raid0 takeover	2011-02-25 11:13:26 -08:00
Stefano Stabellini	c80a420995	xen-blkfront: handle Xen major numbers other than XENVBD This patch makes sure blkfront handles correctly virtual device numbers corresponding to Xen emulated IDE and SCSI disks: in those cases blkfront translates the major number to XENVBD and the minor number to a low xvd minor. Note: this behaviour is different from what old xenlinux PV guests used to do: they used to steal an IDE or SCSI major number and use it instead. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>	2011-02-25 16:43:05 +00:00
NeilBrown	93b270f76e	Fix over-zealous flush_disk when changing device size. There are two cases when we call flush_disk. In one, the device has disappeared (check_disk_change) so any data we hold becomes irrelevant. In the other, the device has changed size (check_disk_size_change) so data we hold may be irrelevant. In both cases it makes sense to discard any 'clean' buffers, so they will be read back from the device if needed. In the former case it makes sense to discard 'dirty' buffers as there will never be anywhere safe to write the data. In the second case it does not make sense to discard dirty buffers as that will lead to file system corruption when you simply enlarge the containing devices. flush_disk calls __invalidate_device. __invalidate_device calls both invalidate_inodes and invalidate_bdev. invalidate_inodes does discard I_DIRTY inodes and this does lead to fs corruption. invalidate_bdev does not discard dirty pages, but I don't really care about that at present. So this patch adds a flag to __invalidate_device (calling it __invalidate_device2) to indicate whether dirty buffers should be killed, and this is passed to invalidate_inodes which can choose to skip dirty inodes. flush_disk then passes true from check_disk_change and false from check_disk_size_change. dm avoids tripping over this problem by calling i_size_write directly rather than using check_disk_size_change. md does use check_disk_size_change and so is affected. This regression was introduced by commit `608aeef17a` which causes check_disk_size_change to call flush_disk, so it is suitable for any kernel since 2.6.27. Cc: stable@kernel.org Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Andrew Patterson <andrew.patterson@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-24 17:25:47 +11:00
Jiri Kosina	0a9d59a246	Merge branch 'master' into for-next	2011-02-15 10:24:31 +01:00
Soren Hansen	de1f016f88	nbd: remove module-level ioctl mutex Commit `2a48fc0ab2` ("block: autoconvert trivial BKL users to private mutex") replaced uses of the BKL in the nbd driver with mutex operations. Since then, I've been seeing these lock ups: INFO: task qemu-nbd:16115 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-nbd D 0000000000000001 0 16115 16114 0x00000004 ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000 0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80 ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020 Call Trace: [<ffffffff815b9997>] __mutex_lock_slowpath+0xf7/0x180 [<ffffffff815b93eb>] mutex_lock+0x2b/0x50 [<ffffffffa071a21c>] nbd_ioctl+0x6c/0x1c0 [nbd] [<ffffffff812cb970>] blkdev_ioctl+0x230/0x730 [<ffffffff811967a1>] block_ioctl+0x41/0x50 [<ffffffff81175c03>] do_vfs_ioctl+0x93/0x370 [<ffffffff81175f61>] sys_ioctl+0x81/0xa0 [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b Instrumenting the nbd module's ioctl handler with some extra logging clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived ioctl in the sense that it doesn't return until another ioctl asks the driver to disconnect. However, that other ioctl blocks, waiting for the module-level mutex that replaced the BKL, and then we're stuck. This patch removes the module-level mutex altogether. It's clearly wrong, and as far as I can see, it's entirely unnecessary, since the nbd driver maintains per-device mutexes, and I don't see anything that would require a module-level (or kernel-level, for that matter) mutex. Signed-off-by: Soren Hansen <soren@linux2go.dk> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Paul Clements <paul.clements@steeleye.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@kernel.org> [2.6.37.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-11 16:12:20 -08:00
Justin P. Mattock	8e572bab39	fix typos 'comamnd' -> 'command' in comments Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>	2011-02-02 11:31:21 +01:00
Stephen M. Cameron	68264e9d67	cciss: make cciss_revalidate not loop through CISS_MAX_LUNS volumes unnecessarily. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-19 08:25:02 -07:00
Tracey Dent	a0700bdd0b	drivers/block/aoe/Makefile: replace the use of <module>-objs with <module>-y Change Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent <tdent48227@gmail.com> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-19 08:25:02 -07:00
Sergey Senozhatsky	ee71a96867	loop: queue_lock NULL pointer derefence in blk_throtl_exit Performing $ sudo mount -o loop -o umask=0 /dev/sdb1 /mnt/ mount: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so $ sudo modprobe -r loop results in oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [<ffffffff812479d4>] do_raw_spin_lock+0x14/0x122 Process modprobe (pid: 6189, threadinfo ffff88009a898000, task ffff880154a88000) Call Trace: [<ffffffff81486788>] _raw_spin_lock_irq+0x4a/0x51 [<ffffffff8123404b>] ? blk_throtl_exit+0x3b/0xa0 [<ffffffff8105b120>] ? cancel_delayed_work_sync+0xd/0xf [<ffffffff8123404b>] blk_throtl_exit+0x3b/0xa0 [<ffffffff81229bc8>] blk_release_queue+0x21/0x65 [<ffffffff8123bb06>] kobject_release+0x51/0x66 [<ffffffff8123bab5>] ? kobject_release+0x0/0x66 [<ffffffff8123ce1e>] kref_put+0x43/0x4d [<ffffffff8123ba27>] kobject_put+0x47/0x4b [<ffffffff8122717c>] blk_cleanup_queue+0x56/0x5b [<ffffffffa01c3824>] loop_exit+0x68/0x844 [loop] [<ffffffff8107cccc>] sys_delete_module+0x1e8/0x25b [<ffffffff814864c9>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81002112>] system_call_fastpath+0x16/0x1b because of an attempt to acquire NULL queue_lock. I added the same lines as in blk_queue_make_request - `fall back to embedded per-queue lock'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-19 08:25:02 -07:00
Tracey Dent	04de96c9c6	drivers/block/Makefile: replace the use of <module>-objs with <module>-y Change Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-19 08:25:02 -07:00
Linus Torvalds	7b0cb1bdac	Merge branch 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block: cciss: reinstate proper FIFO order of command queue list floppy: replace NO_GEOM macro with a function	2011-01-13 10:50:24 -08:00
Linus Torvalds	275220f0fc	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...	2011-01-13 10:45:01 -08:00
Linus Torvalds	a170315420	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode	2011-01-13 10:25:24 -08:00
Yehuda Sadeh	766fc43973	rbd: fix cleanup when trying to mount inexistent image Previously we didn't clean up the sysfs entry that was just created. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:18 -08:00
Linus Torvalds	94d4c4cd56	Merge branch 'stable/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/xenbus: making backend support modular is too complex xen/pci: Make xen-pcifront be dependent on XEN_XENBUS_FRONTEND xen/xenbus: fixup checkpatch issues in xenbus_probe* xen/netfront: select XEN_XENBUS_FRONTEND xen/xenbus: clean up noise in xenbus_probe_frontend.c xen/xenbus: clean up noise in xenbus_probe_backend.c xen/xenbus: clean up noise in xenbus_probe.c xen/xenbus: cleanup debug noise in xenbus_comms.c xen/xenbus: clean up error handling xen/xenbus: make frontend bus GPL xen/xenbus: make sure backend bus is registered earlier xenbus/frontend: register bus earlier xen: remove xen/evtchn.h xen: add backend driver support xen: separate out frontend xenbus	2011-01-12 08:37:35 -08:00
Jens Axboe	e6e1ee936d	cciss: reinstate proper FIFO order of command queue list Commit `8a3173de` inadvertently changed the ordering when switching to hlists. Change to regular list heads so we can use tail list adds, this improves performance. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-10 21:50:33 +01:00
Linus Torvalds	23d69b09b7	Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits) usb: don't use flush_scheduled_work() speedtch: don't abuse struct delayed_work media/video: don't use flush_scheduled_work() media/video: explicitly flush request_module work ioc4: use static work_struct for ioc4_load_modules() init: don't call flush_scheduled_work() from do_initcalls() s390: don't use flush_scheduled_work() rtc: don't use flush_scheduled_work() mmc: update workqueue usages mfd: update workqueue usages dvb: don't use flush_scheduled_work() leds-wm8350: don't use flush_scheduled_work() mISDN: don't use flush_scheduled_work() macintosh/ams: don't use flush_scheduled_work() vmwgfx: don't use flush_scheduled_work() tpm: don't use flush_scheduled_work() sonypi: don't use flush_scheduled_work() hvsi: don't use flush_scheduled_work() xen: don't use flush_scheduled_work() gdrom: don't use flush_scheduled_work() ... Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c as per Tejun.	2011-01-07 16:58:04 -08:00
Ian Campbell	2de06cc1f1	xen: separate out frontend xenbus Impact: refactor Make a distinct frontend xenbus, in preparation for adding a backend xenbus. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [corresponds to 2fd433a4188f in git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git with adjustments to reflect changes in the code which is moved] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-01-05 16:29:17 -05:00
David S. Miller	17f7f4d9fc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c	2010-12-26 22:37:05 -08:00
Tejun Heo	30d65030fd	xen: don't use flush_scheduled_work() flush_scheduled_work() is deprecated and scheduled to be removed. Directly flush info->work instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-12-24 15:59:06 +01:00
Tejun Heo	8aa0f41384	floppy: don't use flush_scheduled_work() flush_scheduled_work() is deprecated and scheduled to be removed. Directly flush floppy_work instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk>	2010-12-24 15:59:06 +01:00
Linus Torvalds	453434cf3f	Fix build error in drivers/block/cciss.c .. caused by a missing semi-colon, introduced in commit `0fc13c8995` ("cciss: fix cciss_revalidate panic"). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Thiago Farina <tfransosi@gmail.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-20 21:21:49 -08:00
Linus Torvalds	7f8635cc9e	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix cciss_revalidate panic block: max hardware sectors limit wrapper block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead blk-throttle: Correct the placement of smp_rmb() blk-throttle: Trim/adjust slice_end once a bio has been dispatched block: check for proper length of iov entries earlier in blk_rq_map_user_iov() drbd: fix for spin_lock_irqsave in endio callback drbd: don't recvmsg with zero length	2010-12-20 09:19:46 -08:00
Jens Axboe	3603b8eacc	Fix compile warnings due to missing removal of a 'ret' variable Commit `a8adbe3` forgot to remove the return variable, kill it. drivers/block/loop.c: In function 'lo_splice_actor': drivers/block/loop.c:398: warning: unused variable 'ret' [...] fs/nfsd/vfs.c: In function 'nfsd_splice_actor': fs/nfsd/vfs.c:848: warning: unused variable 'ret' Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-12-20 09:15:19 +01:00
Stephen M. Cameron	0fc13c8995	cciss: fix cciss_revalidate panic If you delete a logical drive, and then run BLKRRPART (e.g. via fdisk) on a logical drive which is "after" the deleted logical drive in the h->drv[] array, then cciss_revalidate panics because it will access the null pointer h->drv[x] when x hits the deleted drive. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-12-17 09:01:37 +01:00
Michał Mirosław	a8adbe378b	fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors This patch pulls calls to buf->ops->confirm() from all actors passed (also indirectly) to splice_from_pipe_feed(). Is avoiding the call to buf->ops->confirm() while splice()ing to /dev/null an intentional optimization? No other user does that and this will remove this special case. Against current linux.git `6313e3c217`. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-12-17 08:56:44 +01:00
Jeremy Fitzhardinge	667c78afae	xen: Provide a variant of __RING_SIZE() that is an integer constant expression Without this, gcc 4.5 won't compile xen-netfront and xen-blkfront, where this is being used to specify array sizes. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: David Miller <davem@davemloft.net> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-15 12:34:28 -08:00
Linus Torvalds	04ed0978d5	Merge branch 'rbd-sysfs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'rbd-sysfs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: replace the rbd sysfs interface	2010-12-02 08:05:22 -08:00
Yehuda Sadeh	dfc5606dc5	rbd: replace the rbd sysfs interface The new interface creates directories per mapped image and under each it creates a subdir per available snapshot. This allows keeping a cleaner interface within the sysfs guidelines. The ABI documentation was updated too. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 15:53:22 -08:00
Lars Ellenberg	a115413de1	drbd: fix for spin_lock_irqsave in endio callback In commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb, Author: Lars Ellenberg <lars.ellenberg@linbit.com> Date: Wed Aug 11 23:40:24 2010 +0200 drbd: new configuration parameter c-min-rate a bad chunk slipped through, which is now reverted as well, restoring the correct irqsave for the endio callback. This patch also adds comments at both req_mod() and in the endio callback so it should not happen again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-11-27 19:50:43 +01:00
Lars Ellenberg	c13f7e1a94	drbd: don't recvmsg with zero length This should fix a performance degradation we observed recently. If we don't expect any subheader, we should not call into the tcp stack, as that may add considerable latency if there is no data available at this point. For a synthetic synchronous write load with single outstanding writes, this additional latency when processing the "unplug remote" packet added up to a performance degradation factor >= 10. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-11-27 19:50:43 +01:00
Jens Axboe	f30195c502	Merge branch 'cleanup-bd_claim' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.38/core	2010-11-27 19:49:18 +01:00
Linus Torvalds	78daa87b1d	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix build for PROC_FS disabled block: fix amiga and atari floppy driver compile warning blk-throttle: Fix calculation of max number of WRITES to be dispatched ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER xen/blkfront: change blk_shadow.request to proper pointer xen/blkfront: map REQ_FLUSH into a full barrier	2010-11-27 07:17:50 +09:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Jens Axboe	bbe425cd9a	cciss: fix build for PROC_FS disabled The recent patch to fix the removal of a non-existing proc directory introduced this build problem for !CONFIG_PROC_FS: drivers/block/cciss.c:4929: error: 'proc_cciss' undeclared (first use in this function) Fix it by moving proc_cciss outside of the CONFIG_PROC_FS scope. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-17 11:56:13 +01:00
Jeff Garzik	f281233d3e	SCSI host lock push-down Move the mid-layer's ->queuecommand() invocation from being locked with the host lock to being unlocked to facilitate speeding up the critical path for drivers who don't need this lock taken anyway. The patch below presents a simple SCSI host lock push-down as an equivalent transformation. No locking or other behavior should change with this patch. All existing bugs and locking orders are preserved. Additionally, add one parameter to queuecommand, struct Scsi_Host * and remove one parameter from queuecommand, void (*done)(struct scsi_cmnd *). Scsi_Host* is a convenient pointer that most host drivers need anyway, and 'done' is redundant to struct scsi_cmnd->scsi_done. Minimal code disturbance was attempted with this change. Most drivers needed only two one-line modifications for their host lock push-down. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Acked-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-16 13:33:23 -08:00
Vivek Goyal	3e9bb2a071	block: fix amiga and atari floppy driver compile warning Geert, my crosstool don't produce warning below. I guess this has to do something with compiler version. - Geert noticed following warning during compilation. drivers/block/amiflop.c:1344: warning: ‘rq’ may be used uninitialized in this function drivers/block/ataflop.c:1402: warning: ‘rq’ may be used uninitialized in this function - Initialize rq to NULL to fix the warning. If we can't find a suitable request to dispatch, this function should return NULL instead of a possibly garbage pointer. - Cross compile tested only. Don't have hardware to test it. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-15 19:32:43 +01:00
David S. Miller	c25ecd0a21	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-11-14 11:57:05 -08:00
Tejun Heo	d4d7762995	block: clean up blkdev_get() wrappers and their users After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:18 +01:00
Tejun Heo	e525fd89d3	block: make blkdev_get/put() handle exclusive access Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). 
This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-11-13 11:55:17 +01:00
Linus Torvalds	8a9f772c14	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: remove unused copy_io_context() Documentation: remove anticipatory scheduler info block: remove REQ_HARDBARRIER ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2) ioprio: fix RCU locking around task dereference block: ioctl: fix information leak to userland block: read i_size with i_size_read() cciss: fix proc warning on attempt to remove non-existant directory bio: take care not overflow page count when mapping/copying user data block: limit vec count in bio_kmalloc() and bio_alloc_map_data() block: take care not to overflow when calculating total iov length block: check for proper length of iov entries in blk_rq_map_user_iov() cciss: remove controllers supported by hpsa cciss: use usleep_range not msleep for small sleeps cciss: limit commands allocated on reset_devices cciss: Use kernel provided PCI state save and restore functions cciss: fix board status waiting code drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code ...	2010-11-12 08:52:47 -08:00
Jens Axboe	1ff5125fb8	Merge branch 'upstream/blkfront' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into for-linus Conflicts: drivers/block/xen-blkfront.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-12 08:47:04 +01:00
Christoph Hellwig	02e031cbc8	block: remove REQ_HARDBARRIER REQ_HARDBARRIER is dead now, so remove the leftovers. What's left at this point is: - various checks inside the block layer. - sanity checks in bio based drivers. - now unused bio_empty_barrier helper. - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while, but Xen really needs to sort out its barrier situation. - setting of ordered tags in uas - dead code copied from old scsi drivers. - scsi different retry for barriers - it's dead and should have been removed when flushes were converted to FS requests. - blktrace handling of barriers - removed. Someone who knows blktrace better should add support for REQ_FLUSH and REQ_FUA, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:54:09 +01:00
Jens Axboe	00e375e7e9	Merge branch 'for-2.6.37/drivers' into for-linus Conflicts: drivers/block/cciss.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:51:27 +01:00
Mike Snitzer	77304d2aba	block: read i_size with i_size_read() Convert direct reads of an inode's i_size to using i_size_read(). i_size_{read,write} use a seqcount to protect reads from accessing incomplete writes. Concurrent i_size_write()s require mutual exclusion to protect the seqcount that is used by i_size_{read,write}. But i_size_read() callers do not need to use additional locking. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: NeilBrown <neilb@suse.de> Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:40:53 +01:00
Jens Axboe	90fdb0b98a	cciss: fix proc warning on attempt to remove non-existant directory Randy reports that he gets the following stack trace when removing the cciss module: [ 109.164277] Pid: 3463, comm: rmmod Not tainted 2.6.37-rc1 #7 [ 109.164280] Call Trace: [ 109.164292] [<ffffffff8107eb8d>] warn_slowpath_common+0xc6/0xf3 [ 109.164299] [<ffffffff8107ecaa>] warn_slowpath_fmt+0x5b/0x6b [ 109.164307] [<ffffffff8155175b>] ? _raw_spin_unlock+0x40/0x4b [ 109.164313] [<ffffffff8123dd1e>] remove_proc_entry+0x156/0x35e [ 109.164320] [<ffffffff812cd91b>] ? do_raw_spin_unlock+0xff/0x10f [ 109.164327] [<ffffffff8113823d>] ? trace_hardirqs_on+0x10/0x4a [ 109.164333] [<ffffffff8155162d>] ? _raw_spin_unlock_irq+0x4c/0x7b [ 109.164339] [<ffffffff8154d4d1>] ? wait_for_common+0x145/0x15e [ 109.164345] [<ffffffff81075337>] ? default_wake_function+0x0/0x22 [ 109.164357] [<ffffffffa0615a8f>] cciss_cleanup+0xa9/0xc7 [cciss] [ 109.164365] [<ffffffff810d3cb0>] sys_delete_module+0x2d6/0x368 [ 109.164371] [<ffffffff8155036b>] ? lockdep_sys_exit_thunk+0x35/0x67 [ 109.164377] [<ffffffff810fdfaf>] ? audit_syscall_entry+0x172/0x1a5 [ 109.164383] [<ffffffff815502f5>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 109.164389] [<ffffffff8100ea72>] system_call_fastpath+0x16/0x1b [ 109.164394] ---[ end trace 88e8568246ed0b1d ]--- which will happen if you don't actually have an HP CISS adapter, since it'll do an uncondional removal of a proc directory it never attempted to create in that case. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:40:52 +01:00
Eric Dumazet	840a185ddd	aoe: remove dev_base_lock use from aoecmd_cfg_pkts() dev_base_lock is the legacy way to lock the device list, and is planned to disappear. (writers hold RTNL, readers hold RCU lock) Convert aoecmd_cfg_pkts() to RCU locking. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:07 -08:00
Pekka Enberg	2b51dca79a	floppy: replace NO_GEOM macro with a function This patch replaces the NO_GEOM macro with a proper static inline function and converts an open-coded caller in check_floppy_change() to use it. Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-08 14:44:34 +01:00
Vivek Goyal	d017bf6b4f	floppy: fix another use-after-free While scanning the floppy code due to `c093ee4f07` ("floppy: fix use-after-free in module load failure path"), I found one more instance of trying to access disk->queue pointer after doing put_disk() on gendisk. For some reason, floppy module still loads/unloads fine. The object is probably still around with right pointer values. o There seems to be one more instance of trying to cleanup the request queue after we have called put_disk() on associated gendisk. o This fix is more out of code inspection. Even without this fix for some reason I am able to load/unload floppy module without any issues. o Floppy module loads/unloads fine after the fix. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-06 07:49:56 -07:00
Linus Torvalds	c093ee4f07	floppy: fix use-after-free in module load failure path Commit `488211844e` ("floppy: switch to one queue per drive instead of sharing a queue") introduced a use-after-free. We do "put_disk()" on the disk device _before_ we then clean up the queue associated with that disk. Move the put_disk() down to avoid dereferencing a free'd data structure. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Vivek Goyal <vgoyal@redhat.com> Reported-and-tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-05 17:45:59 -07:00
Jeremy Fitzhardinge	dcb8baecea	xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests Some(?) Xen block backends fail BLKIF_OP_WRITE_BARRIER requests, which Linux uses as a cache flush operation. In that case, disable use of FLUSH. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Daniel Stodden <daniel.stodden@citrix.com>	2010-11-02 13:46:46 -04:00
Jeremy Fitzhardinge	be2f8373c1	xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER The BLKIF_OP_WRITE_BARRIER is a full ordered barrier, so we can use it to implement FUA as well as a plain FLUSH. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Christoph Hellwig <hch@lst.de>	2010-11-02 11:27:59 -04:00
Jeremy Fitzhardinge	a945b9801a	xen/blkfront: change blk_shadow.request to proper pointer Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-02 11:27:58 -04:00
Jeremy Fitzhardinge	c64e38ea17	xen/blkfront: map REQ_FLUSH into a full barrier Implement a flush as a full barrier, since we have nothing weaker. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Christoph Hellwig <hch@lst.de>	2010-11-02 10:43:51 -04:00
Linus Torvalds	18cb657ca1	Merge branch 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: register xen pci notifier xen: initialize cpu masks for pv guests in xen_smp_init xen: add a missing #include to arch/x86/pci/xen.c xen: mask the MTRR feature from the cpuid xen: make hvc_xen console work for dom0. xen: add the direct mapping area for ISA bus access xen: Initialize xenbus for dom0. xen: use vcpu_ops to setup cpu masks xen: map a dummy page for local apic and ioapic in xen_set_fixmap xen: remap MSIs into pirqs when running as initial domain xen: remap GSIs as pirqs when running as initial domain xen: introduce XEN_DOM0 as a silent option xen: map MSIs into pirqs xen: support GSI -> pirq remapping in PV on HVM guests xen: add xen hvm acpi_register_gsi variant acpi: use indirect call to register gsi in different modes xen: implement xen_hvm_register_pirq xen: get the maximum number of pirqs from xen xen: support pirq != irq * 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits) X86/PCI: Remove the dependency on isapnp_disable. xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright. x86: xen: Sanitse irq handling (part two) swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it. MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer. xen/pci: Request ACS when Xen-SWIOTLB is activated. xen-pcifront: Xen PCI frontend driver. xenbus: prevent warnings on unhandled enumeration values xenbus: Xen paravirtualised PCI hotplug support. xen/x86/PCI: Add support for the Xen PCI subsystem x86: Introduce x86_msi_ops msi: Introduce default_[teardown\|setup]_msi_irqs with fallback. x86/PCI: Export pci_walk_bus function. 
x86/PCI: make sure _PAGE_IOMAP it set on pci mappings x86/PCI: Clean up pci_cache_line_size xen: fix shared irq device passthrough xen: Provide a variant of xen_poll_irq with timeout. xen: Find an unbound irq number in reverse order (high to low). xen: statically initialize cpu_evtchn_mask_p ... Fix up trivial conflicts in drivers/pci/Makefile	2010-10-28 17:11:17 -07:00
Mike Miller	6fa9775208	cciss: remove overlapping PCI IDs This patch removes the controller overlap between cciss and hpsa. It was decided that no overlap should exist. All new controllers will use the hpsa SCSI based driver. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:33:27 -06:00
Vasiliy Kulikov	7ab5118d7c	block: cciss: fix information leak to userland Structure IOCTL_Command_struct is copied to userland with some padding fields at the end of the struct uninitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:31:55 -06:00
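The class of bug described in 7ab5118d7c can be illustrated in userspace. This is a sketch under assumptions, not the cciss patch itself: `struct reply` and `fill_reply()` are hypothetical names, and zeroing the whole structure before filling it is one common way to keep trailing padding from carrying stale stack bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reply layout. On common ABIs the compiler adds padding
 * after 'status' so that sizeof(struct reply) is a multiple of the
 * alignment of 'length'.
 */
struct reply {
    int  length;
    char status;
};

/* Fill a reply in caller-provided storage, clearing every byte first. */
void fill_reply(struct reply *out, int length, char status)
{
    memset(out, 0, sizeof(*out));  /* clears padding as well as named fields */
    out->length = length;
    out->status = status;
}
```

Only after this step would the structure be copied out to a less trusted consumer.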
Andrew Morton	027b180d74	drivers/block/aoe/aoeblk.c: ratelimit a warning printk As described in https://bugzilla.kernel.org/show_bug.cgi?id=19922 : I had an AoE device go down overnight, and while a server was trying to : write to it, it was also writing this message to its logs: : : 209 printk(KERN_INFO "aoe: device %ld.%d is not up\n", : 210 d->aoemajor, d->aoeminor); : : The message appeared many times per second, and over several hours : produced about 7.5 gigabytes of log files, filling up all free space on : the root filesystem. Cc: "Ed L. Cashin" <ecashin@coraid.com> Suggested-by: Roman Mamedov <roman@rm.pp.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:15:26 -06:00
Geert Uytterhoeven	e1fbd9210d	drivers/block/z2ram.c: correct printing of sector_t If CONFIG_LBDAF=y, `sector_t' becomes `u64' instead of `unsigned long': drivers/block/z2ram.c: In function ‘do_z2_request’: drivers/block/z2ram.c:83: warning: format %lu expects type `long unsigned int', but argument 2 has type `sector_t' Hence always cast it to `unsigned long long' for printing. Also do the pr_err() dance, while we're at it. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:15:26 -06:00
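The cast described in e1fbd9210d can be shown with a small userspace sketch. The `sector_t` typedef below is an assumption that models the CONFIG_LBDAF=y case with a 64-bit type, and `format_sector()` with its message text is a hypothetical helper, not code from z2ram.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: sector_t modelled as a 64-bit type, as with CONFIG_LBDAF=y. */
typedef uint64_t sector_t;

/* Format a sector number into buf; returns what snprintf returns. */
int format_sector(char *buf, size_t len, sector_t sector)
{
    /*
     * Casting to unsigned long long keeps %llu correct whether
     * sector_t is 32 or 64 bits wide on the target.
     */
    return snprintf(buf, len, "bad sector %llu", (unsigned long long)sector);
}
```

The same cast works unchanged if the typedef is switched to `unsigned long`, which is the point of always casting at the call site.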
Tejun Heo	5ad21a3374	aoe: don't use flush_scheduled_work() flush_scheduled_work() is deprecated and scheduled to be removed. Directly cancel aoedev->work on free instead of depending on flush_scheduled_works(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:15:26 -06:00
Nicolas Kaiser	2027ae1fa9	drivers/block/drbd/drbd_main.c: fix error path Failure to create drbd_ee_mempool appears not to get checked. Looks like a copy-and-paste problem to me. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:15:26 -06:00
Milan Broz	51a0bb0c2e	loop: Properly clear sysfs in autoclear mode In autoclear mode bdev is NULL but the sysfs entry should be destroyed otherwise this warning appears: WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x82/0x95() sysfs: cannot create duplicate filename '/devices/virtual/block/loop0/loop' Fixes commit `ee86273062` Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-27 19:51:30 -06:00
Peter Zijlstra	61ecdb801e	mm: strictly nested kmap_atomic() Ensure kmap_atomic() usage is strictly nested Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Linus Torvalds	51f00a471c	Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6 * 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: mtd/m25p80: add support to parse the partitions by OF node of/irq: of_irq.c needs to include linux/irq.h of/mips: Cleanup some include directives/files. of/mips: Add device tree support to MIPS of/flattree: Eliminate need to provide early_init_dt_scan_chosen_arch of/device: Rework to use common platform_device_alloc() for allocating devices of/xsysace: Fix OF probing on little-endian systems of: use __be32 types for big-endian device tree data of/irq: remove references to NO_IRQ in drivers/of/platform.c of/promtree: add package-to-path support to pdt of/promtree: add of_pdt namespace to pdt code of/promtree: no longer call prom_ functions directly; use an ops structure of/promtree: make drivers/of/pdt.c no longer sparc-only sparc: break out some PROM device-tree building code out into drivers/of of/sparc: convert various prom_* functions to use phandle sparc: stop exporting openprom.h header powerpc, of_serial: Endianness issues setting up the serial ports of: MTD: Fix OF probing on little-endian systems of: GPIO: Fix OF probing on little-endian systems	2010-10-25 08:19:14 -07:00
Stephen M. Cameron	4205df3400	cciss: remove controllers supported by hpsa We would prefer not to have any overlap between the two drivers. Remove the cciss_allow_hpsa option, as it is no longer needed. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-23 18:47:31 +02:00
Stephen M. Cameron	332c2f80a8	cciss: use usleep_range not msleep for small sleeps Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-23 18:45:09 +02:00
Stephen M. Cameron	186fb9cf6a	cciss: limit commands allocated on reset_devices This is to conserve memory in a memory-limited kdump scenario Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-23 18:45:08 +02:00
Stephen M. Cameron	f442e64b93	cciss: Use kernel provided PCI state save and restore functions and use the doorbell reset method if available (which doesn't lock up the controller if you properly save and restore all the PCI registers that you're supposed to.) Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-23 18:45:07 +02:00
Stephen M. Cameron	afa842fa64	cciss: fix board status waiting code After a reset, we should first wait for the board to become "not ready", and then wait for it to become "ready", instead of immediately waiting for it to become "ready", and do this waiting after restoring PCI config space registers. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-23 18:45:06 +02:00
Jens Axboe	53c2eb24ff	Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-2.6.37/drivers	2010-10-23 18:43:55 +02:00
Philipp Reisner	650789c87f	drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-23 13:02:34 +02:00
Philipp Reisner	a8a4e51e69	drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-23 13:01:45 +02:00
Philipp Reisner	2451fc3b2b	drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-23 13:00:48 +02:00
Linus Torvalds	5cc1035062	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (141 commits) USB: mct_u232: fix broken close USB: gadget: amd5536udc.c: fix error path USB: imx21-hcd - fix off by one resource size calculation usb: gadget: fix Kconfig warning usb: r8a66597-udc: Add processing when USB was removed. mxc_udc: add workaround for ENGcm09152 for i.MX35 USB: ftdi_sio: add device ids for ScienceScope USB: musb: AM35x: Workaround for fifo read issue USB: musb: add musb support for AM35x USB: AM35x: Add musb support usb: Fix linker errors with CONFIG_PM=n USB: ohci-sh - use resource_size instead of defining its own resource_len macro USB: isp1362-hcd - use resource_size instead of defining its own resource_len macro USB: isp116x-hcd - use resource_size instead of defining its own resource_len macro USB: xhci: Fix compile error when CONFIG_PM=n USB: accept some invalid ep0-maxpacket values USB: xHCI: PCI power management implementation USB: xHCI: bus power management implementation USB: xHCI: port remote wakeup implementation USB: xHCI: port power management implementation ... Manually fix up (non-data) conflict: the SCSI merge had renamed the 'hw_sector_size' member to 'physical_block_size', and the USB tree brought a new use of it.	2010-10-22 20:30:48 -07:00
Linus Torvalds	a2887097f2	Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits) xen-blkfront: disable barrier/flush write support Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c block: remove BLKDEV_IFL_WAIT aic7xxx_old: removed unused 'req' variable block: remove the BH_Eopnotsupp flag block: remove the BLKDEV_IFL_BARRIER flag block: remove the WRITE_BARRIER flag swap: do not send discards as barriers fat: do not send discards as barriers ext4: do not send discards as barriers jbd2: replace barriers with explicit flush / FUA usage jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier jbd: replace barriers with explicit flush / FUA usage nilfs2: replace barriers with explicit flush / FUA usage reiserfs: replace barriers with explicit flush / FUA usage gfs2: replace barriers with explicit flush / FUA usage btrfs: replace barriers with explicit flush / FUA usage xfs: replace barriers with explicit flush / FUA usage block: pass gfp_mask and flags to sb_issue_discard dm: convey that all flushes are processed as empty ...	2010-10-22 17:07:18 -07:00
Linus Torvalds	8abfc6e7a4	Merge branch 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits) cciss: fix PCI IDs for new Smart Array controllers drbd: add race-breaker to drbd_go_diskless drbd: use dynamic_dev_dbg to optionally log uuid changes dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set drbd: cleanup: change "<= 0" to "== 0" drbd: relax the grace period of the md_sync timer again drbd: add some more explicit drbd_md_sync drbd: drop wrong debug asserts, fix recently introduced race drbd: cleanup useless leftover warn/error printk's drbd: add explicit drbd_md_sync to drbd_resync_finished drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED drbd: fix for possible deadlock on IO error during resync drbd: fix unlikely access after free and list corruption drbd: fix for spurious fullsync (uuids rotated too fast) drbd: allow for explicit resync-finished notifications drbd: preparation commit, using full state in receive_state() drbd: drbd_send_ack_dp must not rely on header information drbd: Fix regression in recv_bm_rle_bits (compressed bitmap) drbd: Fixed a stupid copy and paste error drbd: Allow larger values for c-fill-target. ... Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal	2010-10-22 17:03:12 -07:00
Linus Torvalds	e9dd2b6837	Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits) cfq-iosched: Fix a gcc 4.5 warning and put some comments block: Turn bvec_k{un,}map_irq() into static inline functions block: fix accounting bug on cross partition merges block: Make the integrity mapped property a bio flag block: Fix double free in blk_integrity_unregister block: Ensure physical block size is unsigned int blkio-throttle: Fix possible multiplication overflow in iops calculations blkio-throttle: limit max iops value to UINT_MAX blkio-throttle: There is no need to convert jiffies to milli seconds blkio-throttle: Fix link failure failure on i386 blkio: Recalculate the throttled bio dispatch time upon throttle limit change blkio: Add root group to td->tg_list blkio: deletion of a cgroup was causes oops blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n block: set the bounce_pfn to the actual DMA limit rather than to max memory block: revert bad fix for memory hotplug causing bounces Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK block: set the bounce_pfn to the actual DMA limit rather than to max memory block: Prevent hang_check firing during long I/O cfq: improve fsync performance for small files ... Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h	2010-10-22 17:00:32 -07:00
Stefano Stabellini	67ba37293e	Merge commit 'konrad/stable/xen-pcifront-0.8.2' into 2.6.36-rc8-initial-domain-v6	2010-10-22 21:24:06 +01:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Linus Torvalds	c37927d435	Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: block: autoconvert trivial BKL users to private mutex drivers: autoconvert trivial BKL users to private mutex ipmi: autoconvert trivial BKL users to private mutex mac: autoconvert trivial BKL users to private mutex mtd: autoconvert trivial BKL users to private mutex scsi: autoconvert trivial BKL users to private mutex Fix up trivial conflicts (due to addition of private mutex right next to deletion of a version string) in drivers/char/pcmcia/cm40[04]0_cs.c	2010-10-22 10:49:54 -07:00
Michal Nazarewicz	8fa7fd74ef	USB: storage: Use USB_ prefix instead of US_ prefix This commit changes prefix for some of the USB mass storage class related macros (ie. USB_SC_ for subclass and USB_PR_ for class). Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-10-22 10:21:49 -07:00
Philipp Reisner	8825f7c3e5	drbd: Silenced an assert That assertion's condition needed adjustment for today's semantics Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:55:22 +02:00
Lars Ellenberg	fb2c7a10ee	drbd: rate limit an error message If we don't rate limit it, and you happen to log err level messages via serial console, an IO error on a disconnected Primary may cause serious unresponsiveness. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:53:10 +02:00
Lars Ellenberg	bc571b8cb9	drbd: fix a misleading printk This codepath used to be called only for failed kmalloc GFP_ATOMIC, but is now also triggered by other things. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:51:22 +02:00
Lars Ellenberg	6719fb036c	drbd: fix potential data divergence after multiple failures If we get an IO-error during an activity log transaction, if we failed to write the bitmap of the evicted extent, we must not write the transaction itself. If we failed to write the transaction, we must not even submit the corresponding bio, as its extent is not yet marked in the activity log. Otherwise, if this was a disconnected Primary (degraded cluster), which now lost its disk as well, and we later re-attach the same backend storage, we possibly "forget" to resync some parts of the disk that potentially have been changed. On the receiving side, when receiving from a peer with unhealthy disk, checking for pdsk == D_DISKLESS is not enough, we need to set out of sync and do AL transactions for everything pdsk < D_INCONSISTENT on the receiving side. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:50:27 +02:00
Lars Ellenberg	82f59cc635	drbd: fix potential deadlock on detach If we have contention in drbd_al_begin_iod (heavy random IO), an administrative request to detach the disk may deadlock for similar reasons as the recently fixed deadlock if detaching because of IO-error. The approach taken here is to either go through the intermediate cleanup state D_FAILED, or first lock out application io, don't just go directly to D_DISKLESS. We need an additional state bit (WAS_IO_ERROR) to distinguish the -> D_FAILED because of IO-error from other failures. Sanitize D_ATTACHING -> D_FAILED to D_ATTACHING -> D_DISKLESS. If only attaching, ldev may be missing still, but would be referenced from within the after_state_ch for -> D_FAILED, potentially dereferencing a NULL pointer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:46:11 +02:00
Lars Ellenberg	3beec1d446	drbd: tag a few error messages with "assert failed" If those messages ever get logged, clearly state that they are actually failed ASSERTS, so our regression tests can pick them up from the logs more easily. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:41:20 +02:00
Lars Ellenberg	aaa8e2b34c	drbd: consolidate explicit drbd_md_sync into drbd_create_new_uuid Every code path changing the current UUID needs to get it on stable storage anyways. Flush it to disk right there, remove the now obsolete explicit drbd_md_sync statements in the other code paths. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-22 15:36:56 +02:00
Jens Axboe	005a1d15f5	xen-blkfront: disable barrier/flush write support The driver doesn't handle empty flushes. Disable barrier/flush write support until this is fixed up. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-22 10:58:33 +02:00
Linus Torvalds	94ebd235c4	Merge branch 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio_blk: remove BKL leftovers virtio: console: Disable lseek(2) for port file operations virtio: console: Send SIGIO in case of port unplug virtio: console: Send SIGIO on new data arrival on ports virtio: console: Send SIGIO to processes that request it for host events virtio: console: Reference counting portdev structs is not needed virtio: console: Add reference counting for port struct virtio: console: Use cdev_alloc() instead of cdev_init() virtio: console: Add a find_port_by_devt() function virtio: console: Add a list of portdevs that are active virtio: console: open: Use a common path for error handling virtio: console: remove_port() should return void virtio: console: Make write() return -ENODEV on hot-unplug virtio: console: Make read() return -ENODEV on hot-unplug virtio: console: Unblock poll on port hot-unplug virtio: console: Un-block reads on chardev close virtio: console: Check if portdev is valid in send_control_msg() virtio: console: Remove control vq data only if using multiport support virtio: console: Reset vdev before removing device	2010-10-21 12:40:33 -07:00
Linus Torvalds	2017bd1945	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (22 commits) ceph: do not carry i_lock for readdir from dcache fs/ceph/xattr.c: Use kmemdup rbd: passing wrong variable to bvec_kunmap_irq() rbd: null vs ERR_PTR ceph: fix num_pages_free accounting in pagelist ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl. ceph: don't crash when passed bad mount options ceph: fix debugfs warnings block: rbd: removing unnecessary test block: rbd: fixed may leaks ceph: switch from BKL to lock_flocks() ceph: preallocate flock state without locks held ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor ceph: use mapping->nrpages to determine if mapping is empty ceph: only invalidate on check_caps if we actually have pages ceph: do not hide .snap in root directory rbd: introduce rados block device (rbd), based on libceph ceph: factor out libceph from Ceph file system ceph-rbd: osdc support for osd call and rollback operations ceph: messenger and osdc changes for rbd ...	2010-10-21 12:38:28 -07:00
Christoph Hellwig	fe5a50a10c	virtio_blk: remove BKL leftovers Remove the BKL usage added in "block: push down BKL into .locked_ioctl". Virtio-blk doesn't use the BKL for anything, and doesn't implement any ioctl command by itself, but only uses the generic scsi_cmd_ioctl which is fine without the BKL. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2010-10-21 17:44:05 +10:30
Dan Carpenter	85b5aaa624	rbd: passing wrong variable to bvec_kunmap_irq() We should be passing "buf" here instead of "bv". This is tricky because it's not the same as kmap() and kunmap(). GCC does warn about it if you compile on i386 with CONFIG_HIGHMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:25 -07:00
Dan Carpenter	b8d0638a98	rbd: null vs ERR_PTR ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:24 -07:00
Yehuda Sadeh	f4cf3deef4	block: rbd: removing unnecessary test rbd_get_segment() can't return a negative value, we don't need to check the return output. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:20 -07:00
Vasiliy Kulikov	28f259b7cd	block: rbd: fixed may leaks rbd_client_create() doesn't free rbdc, this leads to many leaks. seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense. Also if fixed check fails then seg_name is leaked. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:19 -07:00
Yehuda Sadeh	602adf4002	rbd: introduce rados block device (rbd), based on libceph The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:13 -07:00
Mike Miller	6362beea89	cciss: fix PCI IDs for new Smart Array controllers cciss: fix PCI IDs for new controllers This patch fixes the botched up PCI IDs of new controllers. Please consider this patch for inclusion. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-19 09:40:34 +02:00
Jens Axboe	fa251f8990	Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-19 09:13:04 +02:00
Michal Simek	bda80da469	of/xsysace: Fix OF probing on little-endian systems Convert big-endian DTB to little-endian if necessary. Signed-off-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2010-10-18 09:50:09 -06:00
Noboru Iwamatsu	b78c951256	xenbus: prevent warnings on unhandled enumeration values XenbusStateReconfiguring/XenbusStateReconfigured were introduced by c/s 437, but aren't handled in many switch statements. .. also pulled from the linux-2.6-sparse-tree tree. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-18 10:49:36 -04:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Lars Ellenberg	5dbfe7aedf	drbd: add race-breaker to drbd_go_diskless This adds a necessary race breaker to these commits: drbd: fix for possible deadlock on IO error during resync drbd: drop wrong debug asserts, fix recently introduced race What we do is get a refcount, check the state, then depending on the state and the requested minimum disk state, either hold it (success), or give it back immediately (failed "try lock"). Some code paths (flushing of drbd metadata) may still grab and hold a refcount even if we are D_FAILED (application IO won't). So even if we hit local_cnt == 0 once after being D_FAILED, we still need to wait for that again after we changed to D_DISKLESS. Once local_cnt reaches 0 while we are D_DISKLESS, we can be sure that no one will look at the protected members anymore, so only then is it safe to free them. We cannot easily convert to standard locking primitives here, as we want to be able to use it in atomic context (we always do a "try lock"), as well as hold references for a "long time" (from IO submission to completion callback). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-15 14:06:53 +02:00
Lars Ellenberg	ac7241211d	drbd: use dynamic_dev_dbg to optionally log uuid changes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-15 10:52:42 +02:00
Dan Carpenter	2265769531	drbd: cleanup: change "<= 0" to "== 0" dt is unsigned so it's never less than zero. We are calculating the elapsed time, and that's never less than zero (unless there is a bug or we invent time travel). The comparison here is just to guard against divide by zero bugs. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2010-10-14 19:17:23 +02:00
Lars Ellenberg	ca0e6098aa	drbd: relax the grace period of the md_sync timer again Consolidate the ifdef's for the debug level; we accidentally used both DEBUG and DRBD_DEBUG_MD_SYNC. Default to off. For production, we can safely reduce the grace period for this timer again to the value we used to have. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 19:15:38 +02:00
Lars Ellenberg	856c50c7b6	drbd: add some more explicit drbd_md_sync It sometimes may take a while for the after state change work to be scheduled, which does drbd_md_sync. At convenient places, we should do explicit drbd_md_sync to have the new state information on disk as soon as possible. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 19:08:58 +02:00
Lars Ellenberg	9d282875d8	drbd: drop wrong debug asserts, fix recently introduced race commit 2372c38caadeaebc68a5ee190782c2a0df01edc3 drbd: fix for possible deadlock on IO error during resync introduced a new ASSERT, which turns out to be wrong. Drop it. Also serialize the state change to D_DISKLESS with the after state change work of the -> D_FAILED transition, don't open a new race. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 19:08:32 +02:00
Lars Ellenberg	0f8488e160	drbd: cleanup useless leftover warn/error printk's Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:53 +02:00
Lars Ellenberg	13d42685be	drbd: add explicit drbd_md_sync to drbd_resync_finished As we usually update the generation UUIDs here, we should explicitly sync them to disk. So far this has been done only implicitly by related code paths. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:52 +02:00
Philipp Reisner	b18b37befb	drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED This might happen if on the VERIFY_S node the disk gets dropped. Although this is a cluster-wide state transition, the VERIFY_T node updates its connection state first. Then the ack packet for the cluster-wide state transition travels back, and the VERIFY_S node stops producing the P_OV_REQUEST packets. There is absolutely nothing wrong with that. Further, do not log "Can not satisfy peer's..." on the VERIFY_S node in this case, but pretend that they had equal checksum. [Bugz 327] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:51 +02:00
Lars Ellenberg	e9e6f3ec53	drbd: fix for possible deadlock on IO error during resync Scenario: Something (say, flush-147:0) is in drbd_al_begin_io, holding a local_cnt, waiting for the resync to make progress. Disk fails, worker in after_state_ch does drbd_rs_cancel_all, then waits for local_cnt to drop to zero. flush-147:0 is woken by drbd_rs_cancel_all, needs to write an AL transaction, and queues that on the worker. Deadlock. Fix: do not wait in the worker, have put_ldev() trigger the state change D_FAILED -> D_DISKLESS when necessary. put_ldev() cannot do the state change directly, as it may or may not already hold various spinlocks. We queue a short work instead. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:50 +02:00
Lars Ellenberg	22cc37a943	drbd: fix unlikely access after free and list corruption Various cleanup paths have been incomplete, for the very unlikely case that we cannot allocate enough bios from process context when submitting on behalf of the peer or resync process. Never observed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:49 +02:00
Lars Ellenberg	af85e8e83d	drbd: fix for spurious fullsync (uuids rotated too fast) If it was an "empty" resync, the SyncSource may have already "finished" the resync and rotated the UUIDs, before noticing the connection loss (and generating a new uuid, if Primary, rotating again), while the SyncTarget did not change its uuids at all, or only got to the previous sync-uuid. This would then again lead to a full sync on next handshake (see also Bug #251). Fix: Use explicit resync finished notification even for empty resyncs, do not finish an empty resync implicitly on the SyncSource. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:48 +02:00
Lars Ellenberg	e9ef7bb6f9	drbd: allow for explicit resync-finished notifications Preparation patch so more drbd_send_state() usage on the peer will not confuse drbd in receive_state(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:47 +02:00
Lars Ellenberg	4ac4aadacb	drbd: preparation commit, using full state in receive_state() no functional change, just using full state instead of just the .conn part of it for comparisons. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:46 +02:00
Lars Ellenberg	2b2bf2148f	drbd: drbd_send_ack_dp must not rely on header information drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec drbd: receiving of big packets, for payloads between 64kByte and 4GByte introduced a new on-the-wire packet header format. We must no longer assume either format, but use the result of whatever drbd_recv_header has decoded. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:45 +02:00
Lars Ellenberg	004352fa60	drbd: Fix regression in recv_bm_rle_bits (compressed bitmap) We used to be16_to_cpu the length field in our received packet header. drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec drbd: receiving of big packets, for payloads between 64kByte and 4GByte changed this, but forgot to adjust a few places where we relied on h->length being in native byte order. This broke the receiving side of the RLE compressed bitmap exchange. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:44 +02:00
Philipp Reisner	f10f262349	drbd: Fixed a stupid copy and paste error This caused rs_planed to be not in sync with the content of the fifo. That in turn could cause that the resync comes to a complete halt. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:43 +02:00
Philipp Reisner	00b425377d	drbd: Allow larger values for c-fill-target. Connections through a compressing proxy might have more bits on the fly. 500MByte instead of 50MByte Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:42 +02:00
Lars Ellenberg	f65363cfa0	drbd: fix possible access after free If we release the page pointed to by md_io_tmpp, we need to zero out the pointer, too, as that may be used later to decide whether we need to allocate a new page again. Impact: a previously freed page may be used and clobbered. Depending on what that particular page is being used for meanwhile, this may result in silent data corruption of completely unrelated things. Only of concern on devices with logical_block_size != 512 byte, if you re-attach after becoming diskless once. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:41 +02:00
Lars Ellenberg	8979d9c9e0	drbd: protocol compatibility for maximum packet sizes Two missing corner cases to the "maximum packet size" handshake. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:41 +02:00
Philipp Reisner	fb22c402ff	drbd: Track the reasons to suspend IO in dedicated state bits There are three ways to get IO suspended: * Loss of any access to data * Fence-peer-handler running * User requested to suspend IO Track those in different bits, so that one condition clearing its state bit does not interfere with the other two conditions. Only when the user resumes IO he overrules all three bits. The fact is hidden from the user, he sees only a single suspend bit. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:40 +02:00
Lars Ellenberg	78db89287c	drbd: DIV_ROUND_UP not needed here Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:39 +02:00
Philipp Reisner	5a75cc7cfb	drbd: Fixed compatibility with protocol versions smaller than 95 Forgot to consider the max size for the resync requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:38 +02:00
Lars Ellenberg	f2906e183f	drbd: fix for spurious full sync (becoming sync target looked like invalidate) If a synctarget lost connection while being WFSyncUUID, due to "state sanitizing", the attempted state change to SyncTarget looked like an "invalidate" to after_state_ch() later, thus caused a full sync on next handshake (Bug #318). drbd0: PingAck did not arrive in time. drbd0: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) from : { cs:NetworkFailure ro:Secondary/Unknown ds:UpToDate/DUnknown r--- } to : { cs:SyncTarget ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- } after sanitizing, resulted in state: { cs:NetworkFailure ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- } drbd0: disk( UpToDate -> Inconsistent ) Fix: don't mask state transition errors in "sanitizing", so the requested state change to SyncTarget fails, instead of being implicitly "remapped" to invalidate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:37 +02:00
Lars Ellenberg	02bc7174ae	drbd: cosmetic, don't report resync for online-verify Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:36 +02:00
Lars Ellenberg	a821cc4a9a	drbd: fix spurious protocol error If we cannot satisfy a request (because our disk just broke), we still need to drain the payload. Or we'll get a protocol error when interpreting the payload as DRBD packet header. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:35 +02:00
Lars Ellenberg	1d53f09e17	drbd: fix potential kernel BUG (NULL deref) BUG trace would look like: lc_find drbd_rs_complete_io got_OVResult drbd_asender Could be triggered by explicit, or IO-error policy based, detach during online-verify. We may only dereference mdev->resync, if we first get_ldev(), as the disk may break any time, causing mdev->resync to disappear once all ldev references have been returned. Already in flight online-verify requests or replies may still come in, which we then need to ignore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:34 +02:00
Lars Ellenberg	435f07402b	drbd: don't count sendpage()d pages only referenced by tcp as in use Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:33 +02:00
Philipp Reisner	76d2e7eca8	drbd: Adding support for BIO/Request flags: REQ_FUA, REQ_FLUSH and REQ_DISCARD Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:32 +02:00
Lars Ellenberg	1090c056c5	drbd: drbd_md_sync before calling user space helpers Just in case we have some pending meta data changes to sync, do it before we call our userland helper, as that may take some time, or even cause a hard reboot. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:31 +02:00
Lars Ellenberg	ee15b03816	drbd: fix race on meta-data update, addendum addendum to baa33ae4eaa4477b60af7c434c0ddd1d182c1ae7 The race: drbd_md_sync() if (!test_and_clear_bit(MD_DIRTY, &mdev->flags)) return; ==> RACE with drbd_md_mark_dirty() rearming the timer. del_timer(&mdev->md_sync_timer); Fixed by moving the del_timer before the test_and_clear_bit. Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was not already set, reduce the grace period from five to one second, and add an ifdef'ed debugging aid to find code paths missing an explicit drbd_md_sync, if any, as those are the only relevant ones for this race. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:30 +02:00
Philipp Reisner	63106d3c6c	drbd: Removed a race that could cause unexpected execution of w_make_resync_request() The actual race happened in the drbd_start_resync() function, where drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and armed the timer. If the timer fired before execution reaches the mod_timer statement at the end of drbd_start_resync() the latter would cause an unexpected call to w_make_resync_request(). Removed the STOP_SYNC_TIMER bit, and base it on the connection state. The STOP_SYNC_TIMER bit probably originates from the time before the state engine. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:29 +02:00
Lars Ellenberg	ef50a3e34f	drbd: implicitly create unconfigured devices on sync-after dependencies If pacemaker (for example) decided to initialize minor devices not in the exact sync-after dependency order, the configuration partially failed with an error "The sync-after minor number is invalid". (Bugz. #322) We can avoid that by implicitly creating unconfigured minor devices, if others depend on them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:28 +02:00
Lars Ellenberg	3f3a9b849d	drbd: fix race on meta-data update The race: drbd_md_mark_dirty() drbd_md_sync() if (!test_and_clear_bit(MD_DIRTY, &mdev->flags)) return; drbd_md_sync_page_io(mdev, mdev->ldev, sector, WRITE) ==> RACE clear_bit(MD_DIRTY, &mdev->flags); <== spurious Fixed by removing the spurious clear_bit. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:28 +02:00
Lars Ellenberg	c518d04fde	drbd: fix race between deconfiguring and reconfiguring network If a drbd_nl_net_conf hits the small window between the state change to C_STANDALONE and the corresponding cleanup in after_state_ch, that cleanup would throw away stuff we now need again, and later trigger BUG_ON()s. Fixed by properly serializing the new config request with any pending cleanup. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:27 +02:00
Philipp Reisner	0778286a13	drbd: Disable activity log updates when the whole device is out of sync When the complete device is marked as out of sync, we can disable updates of the on disk AL. Currently AL updates are only disabled if one uses the "invalidate-remote" command on an unconnected, primary device, or when at attach time all bits in the bitmap are set. As of now, AL updated do not get disabled when a all bits becomes set due to application writes to an unconnected DRBD device. While this is a missing feature, it is not considered important, and might get added later. BTW, after initializing a "one legged" DRBD device drbdadm create-md resX drbdadm -- --force primary resX AL updates also get disabled, until the first connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:26 +02:00
Philipp Reisner	d53733893d	drbd: Actually allow BIOs up to 128k (was 32k). Now that we have multiple BIOs per ee and packets with a 32-bit length field, it is time to use these goodies. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:25 +02:00
Philipp Reisner	02918be227	drbd: receiving of big packets, for payloads between 64kByte and 4GByte Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:24 +02:00
Philipp Reisner	0b70a13dac	drbd: Sending of big packets, for payloads from 64KByte to 4GByte Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:23 +02:00
Philipp Reisner	204bba9965	drbd: Bugfix for regression introduced with f9bc8913c06022e If we intend to use the block_id member of an epoch entry, we may not use the digest member. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:22 +02:00
Philipp Reisner	48acf86898	drbd: Microfix: Assigning sector once is sufficient Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:21 +02:00
Lars Ellenberg	0f0601f4ea	drbd: new configuration parameter c-min-rate We now track the data rate of locally submitted resync related requests, and can thus detect non-resync activity on the lower level device. If the current sync rate is above c-min-rate, and the lower level device appears to be busy, we throttle the resyncer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:20 +02:00
Lars Ellenberg	80a40e439e	drbd: reduce code duplication when receiving data requests also canonicalize the return values of read_for_csum and drbd_rs_begin_io to return -ESOMETHING, or 0 for success. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:19 +02:00
Lars Ellenberg	1d7734a0df	drbd: use rolling marks for resync speed calculation The current resync speed as displayed in /proc/drbd fluctuates a lot. Using an array of rolling marks makes this calculation much more stable. We used to have this (a long time ago with 0.7), but it got lost somehow. If "stalled", do not discard the rest of the information, just add a " (stalled)" tag to the progress line. This patch also shortens a spinlock critical section somewhat, and reduces the number of atomic operations in put_ldev. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:18 +02:00
Lars Ellenberg	0bb70bf601	drbd: remove outdated comment and dead code Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:17 +02:00
Lars Ellenberg	c36c3ced69	drbd: let drbd_free_ee implicitly free any digest Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:16 +02:00
Philipp Reisner	85719573dd	drbd: Replaced some casts by an union. Improved comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:15 +02:00
Philipp Reisner	d207450cf2	drbd: Bugfix: rs_in_flight could become wrong if read_for_csum() requested reschedule later Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:14 +02:00
Philipp Reisner	778f271dfe	drbd: The new, smarter resync speed controller Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:14 +02:00
Philipp Reisner	8e26f9ccb9	drbd: New sync_param packet, that includes the parameters of the new controller Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:13 +02:00
Philipp Reisner	9a31d7164d	drbd: New sync parameters for the smart resync rate controller Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:38:12 +02:00
Lars Ellenberg	d28fd092a5	drbd: fix list corruption (recent regression) The commit `288f422ec1` drbd: Track all IO requests on the TL, not writes only moved a list_add_tail(req, ) into a region where req may have just been freed due to conflict detection. Fix this by adding a proper cleanup section for that code path. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 18:31:43 +02:00
Philipp Reisner	e756414f7d	drbd: Initialize all members of sync_conf to their defaults [Bugz 315] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:12:07 +02:00
Philipp Reisner	6709893059	drbd: Make sure tl_restart(, resend) can not get called multiple times for a new connection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:09:09 +02:00
Philipp Reisner	f70b351159	drbd: Do not try to free tl_hash in drbd_disconnect() when IO is suspended We may not free tl_hash when IO is suspended, since we can not wait until ap_bio_cnt reaches zero. We can do this after susp reached 0, since then tl_clear was called. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:08:27 +02:00
Philipp Reisner	8f488156c0	drbd: Allow attach while IO is suspended Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:05:32 +02:00
Philipp Reisner	cfa03415a1	drbd: Allow tl_restart() to do IO completion while IO is suspended Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:05:08 +02:00
Philipp Reisner	84dfb9f564	drbd: Fixed a deadlock, probably only affected UP machines After disconnect (most likely mdev->net_cnt == 0) and we are still in an unstable state (!drbd_state_is_stable()). When we get an IO request in drbd_get_max_buffers() (called from __inc_ap_bio_cond(), called from inc_ap_bio()) we wake up misc_wait. Misc_wait is also used in inc_ap_bio() to sleep until the outcome of __inc_ap_bio_cond() changes. => Busy loop! Solution: Have a dedicated wait queue for get_net_conf() and put_net_conf(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:04:46 +02:00
Philipp Reisner	65d922c33e	drbd: Do not do a hard state change when establishing a connection [bugz 304] Make sure the state engine can deny two primaries to connect Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:04:10 +02:00
Philipp Reisner	481c6f5032	drbd: Ensure that the peer was not rebooted in the meantime before resending TL Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 15:01:37 +02:00
Philipp Reisner	43a5182ccc	drbd: Delayed creation of current-UUID When a fencing policy of "resource-and-stonith" is configured, and DRBD loses connection to its peer, we can delay the creation of a new current-UUID until IO gets thawed. That allows one to deploy fence-peer handlers that actually commit suicide on the machine they get started on. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:59:21 +02:00
Philipp Reisner	87f7be4cf8	drbd: Run the fence-peer helper asynchronously Since we can not thaw the transfer log, the next logical step is to allow reconnects while the fence-peer handler runs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:58:36 +02:00
Philipp Reisner	1616a25493	drbd: Reduce the verbosity of some state transitions State transitions in the space of non-allowed states used to be very noisy. Reduce that, since that has little value for the majority of the user base. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:57:22 +02:00
Philipp Reisner	999122bc18	drbd: Removing a by now obsolete clause in the state sanitizing Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:56:50 +02:00
Philipp Reisner	18a50fa213	drbd: Now we need to handle the ed_uuid of an diskless, unconnected primary correctly Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:56:00 +02:00
Philipp Reisner	894c6a9461	drbd: Disabled the crashed_primary detection for re-attach of last data while IO is frozen Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:55:11 +02:00
Philipp Reisner	47ff2d0a8e	drbd: Do not allow a fencing-policy of resource-and-stonith with protocol A Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:53:42 +02:00
Philipp Reisner	265be2d098	drbd: Finished the "on-no-data-accessible suspend-io;" functionality When no data is accessible (no connection to the peer, nor a local disk) allow the user to select to freeze all IO operations instead of getting IO errors. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:52:53 +02:00
Philipp Reisner	905cd7d8ac	drbd: Removed redundant error checks in the request code path Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:39:38 +02:00
Philipp Reisner	5ba82308ea	drbd: factored drbd_req_make_private_bio() out of drbd_req_new() Preparing tl_thaw_dio() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:37:33 +02:00
Philipp Reisner	b9b98716f8	drbd: Do not send two barriers without any writes between them Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:36:51 +02:00
Philipp Reisner	11b58e73a3	drbd: factored tl_restart() out of tl_clear(). If IO was frozen for a temporary network outage, resend the content of the transfer-log into the newly established connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:35:58 +02:00
Philipp Reisner	2a80699f80	drbd: mod_req has now a return value Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:26:45 +02:00
Philipp Reisner	288f422ec1	drbd: Track all IO requests on the TL, not writes only With that the drbd_fail_pending_reads() function becomes obsolete. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:25:20 +02:00
Philipp Reisner	7e602c0aaf	drbd: renamed drbd_tl_epoch.n_req to drbd_tl_epoch.n_writes Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2010-10-14 14:23:45 +02:00
Dan Carpenter	93055c3104	ps3disk: passing wrong variable to bvec_kunmap_irq() This should pass "buf" to bvec_kunmap_irq() instead of "bv". The api is like kmap_atomic() instead of kmap(). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-12 18:56:33 +02:00
Mike Snitzer	e4c4776dea	virtio-blk: fix request leak. Must drop reference taken by blk_make_request(). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org # .35.x Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-09 11:42:37 -07:00
Arnd Bergmann	2a48fc0ab2	block: autoconvert trivial BKL users to private mutex The block device drivers have all gained new lock_kernel calls from a recent pushdown, and some of the drivers were already using the BKL before. This turns the BKL into a set of per-driver mutexes. Still need to check whether this is safe to do. file=$1 name=$2 if grep -q lock_kernel ${file} ; then if grep -q 'include.linux.mutex.h' ${file} ; then sed -i '/include.<linux\/smp_lock.h>/d' ${file} else sed -i 's/include.<linux\/smp_lock.h>.$/include <linux\/mutex.h>/g' ${file} fi sed -i ${file} \ -e "/^#include.linux.mutex.h/,$ { 1,/^$static\\|int\\|long$/ { /^$static\\|int\\|long$/istatic DEFINE_MUTEX(${name}_mutex); } }" \ -e "s/$un$lock_kernel\>[ ]()/mutex_\1lock(\&${name}_mutex)/g" \ -e '/[ ]cycle_kernel_lock();/d' else sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \ -e '/cycle_kernel_lock()/d' fi Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-05 15:01:10 +02:00
Arnd Bergmann	613655fa39	drivers: autoconvert trivial BKL users to private mutex All these files use the big kernel lock in a trivial way to serialize their private file operations, typically resulting from an earlier semi-automatic pushdown from VFS. None of these drivers appears to want to lock against other code, and they all use the BKL as the top-level lock in their file operations, meaning that there is no lock-order inversion problem. Consequently, we can remove the BKL completely, replacing it with a per-file mutex in every case. Using a scripted approach means we can avoid typos. These drivers do not seem to be under active maintenance from my brief investigation. Apologies to those maintainers that I have missed. file=$1 name=$2 if grep -q lock_kernel ${file} ; then if grep -q 'include.linux.mutex.h' ${file} ; then sed -i '/include.<linux\/smp_lock.h>/d' ${file} else sed -i 's/include.<linux\/smp_lock.h>.$/include <linux\/mutex.h>/g' ${file} fi sed -i ${file} \ -e "/^#include.linux.mutex.h/,$ { 1,/^$static\\|int\\|long$/ { /^$static\\|int\\|long$/istatic DEFINE_MUTEX(${name}_mutex); } }" \ -e "s/$un$lock_kernel\>[ ]()/mutex_\1lock(\&${name}_mutex)/g" \ -e '/[ ]cycle_kernel_lock();/d' else sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \ -e '/cycle_kernel_lock()/d' fi Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-05 15:01:04 +02:00
Dan Rosenberg	252a52aa4f	Fix pktcdvd ioctl dev_minor range check The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a pktcdvd_device from the global pkt_devs array. The index into this array is provided directly by the user and is a signed integer, so the comparison to ensure that it falls within the bounds of this array will fail when provided with a negative index. This can be used to read arbitrary kernel memory or cause a crash due to an invalid pointer dereference. This can be exploited by users with permission to open /dev/pktcdvd/control (on many distributions, this is readable by group "cdrom"). Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> [ Rather than add a cast, just make the function take the right type -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-27 16:29:06 -07:00
Vivek Goyal	504c6d1b44	amiga floppy: Compile failure fixes o Compile fixes for amiga floppy driver. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-26 12:23:25 +09:00
Vivek Goyal	639e2f2aa7	atari floppy: Stop sharing request queue across multiple gendisks o Use one request queue per gendisk instead of sharing the queue. o Don't have hardware. No compile testing or run time testing done. Completely untested. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-24 20:35:45 +02:00
Vivek Goyal	786029ff81	amiga floppy: Stop sharing request queue across multiple gendisks o Use one request queue per gendisk instead of sharing request queue o Don't have hardware. No compile testing or run time testing done. Completely untested. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-24 20:35:44 +02:00
Jens Axboe	488211844e	floppy: switch to one queue per drive instead of sharing a queue Pretty straightforward conversion. Note that we do round-robin between the drives that have available requests, whereas before we simply used the drive that the IO scheduler told us to. Since the IO scheduler doesn't care about multiple devices per queue, the resulting sort would not have made sense. Fixed by Vivek to get rid of a double lock problem in set_next_request() Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>	2010-09-22 09:32:36 +02:00
Dan Carpenter	b0722cb1ac	cciss: freeing uninitialized data on error path The "h->scatter_list" is allocated inside a for loop. If any of those allocations fail, then the rest of the list is uninitialized data. When we free it we should start from the top and free backwards so that we don't call kfree() on uninitialized pointers. Also if the allocation for "h->scatter_list" fails then we would get an Oops here. I should have noticed this when I sent: `4ee69851c` "cciss: handle allocation failure." but I didn't. Sorry about that. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-21 11:49:17 +02:00
Christoph Hellwig	dd3932eddf	block: remove BLKDEV_IFL_WAIT All the blkdev_issue_* helpers can only sanely be used for synchronous callers. To issue cache flushes or barriers asynchronously the caller needs to set up a bio by itself with a completion callback to move the asynchronous state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always specified when calling blkdev_issue_* and also remove the now unused flags argument to blkdev_issue_flush and blkdev_issue_zeroout. For blkdev_issue_discard we need to keep it for the secure discard flag, which gains a more descriptive name and loses the bitops vs flag confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-16 20:52:58 +02:00
Martin K. Petersen	c8bf133682	Consolidate min_not_zero We have several users of min_not_zero, each of them using their own definition. Move the define to kernel.h. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2010-09-10 20:07:38 +02:00
Linus Torvalds	ff3cb3fec3	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Range check cpu in blk_cpu_to_group scatterlist: prevent invalid free when alloc fails writeback: Fix lost wake-up shutting down writeback thread writeback: do not lose wakeup events when forking bdi threads cciss: fix reporting of max queue depth since init block: switch s390 tape_block and mg_disk to elevator_change() block: add function call to switch the IO scheduler from a driver fs/bio-integrity.c: return -ENOMEM on kmalloc failure bio-integrity.c: remove dependency on __GFP_NOFAIL BLOCK: fix bio.bi_rw handling block: put dev->kobj in blk_register_queue fail path cciss: handle allocation failure cfq-iosched: Documentation help for new tunables cfq-iosched: blktrace print per slice sector stats cfq-iosched: Implement tunable group_idle cfq-iosched: Do group share accounting in IOPS when slice_idle=0 cfq-iosched: Do not idle if slice_idle=0 cciss: disable doorbell reset on reset_devices blkio: Fix return code for mkdir calls	2010-09-10 07:26:27 -07:00
Tejun Heo	02c42b7a68	virtio_blk: drop REQ_HARDBARRIER support Remove now unused REQ_HARDBARRIER support. virtio_blk already supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is questionable at this point, so there's nothing else to do to support new REQ_FLUSH/FUA interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:37 +02:00
Tejun Heo	6259f28459	block/loop: implement REQ_FLUSH/FUA support Deprecate REQ_HARDBARRIER and implement REQ_FLUSH/FUA instead. Also, instead of checking file->f_op->fsync() directly, look at the value of vfs_fsync() and ignore -EINVAL return. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:37 +02:00
Tejun Heo	9cbbdca44a	block: remove spurious uses of REQ_HARDBARRIER REQ_HARDBARRIER is deprecated. Remove spurious uses in the following users. Please note that other than osdblk, all other uses were already spurious before deprecation. * osdblk: osdblk_rq_fn() won't receive any request with REQ_HARDBARRIER set. Remove the test for it. * pktcdvd: use of REQ_HARDBARRIER in pkt_generic_packet() doesn't mean anything. Removed. * aic7xxx_old: Setting MSG_ORDERED_Q_TAG on REQ_HARDBARRIER is spurious. Removed. * sas_scsi_host: Setting TASK_ATTR_ORDERED on REQ_HARDBARRIER is spurious. Removed. * scsi_tcq: The ordered tag path wasn't being used anyway. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Cc: James Bottomley <James.Bottomley@suse.de> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:36 +02:00
Tejun Heo	4913efe456	block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler blk_queue_flush(). blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a device has write cache and can flush it, it should set REQ_FLUSH. If the device can handle FUA writes, it should also set REQ_FUA. All blk_queue_ordered() users are converted. * ORDERED_DRAIN is mapped to 0 which is the default value. * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH. * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH \| REQ_FUA. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:36 +02:00
Tejun Heo	6958f14545	block: kill QUEUE_ORDERED_BY_TAG Nobody is making meaningful use of ORDERED_BY_TAG now and queue draining for barrier requests will be removed soon which will render the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The following users are affected. * brd: converted to ORDERED_DRAIN. * virtio_blk: ORDERED_TAG path was already marked deprecated. Removed. * xen-blkfront: ORDERED_TAG case dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:36 +02:00
Tejun Heo	589d7ed02a	block/loop: queue ordered mode should be DRAIN_FLUSH loop implements FLUSH using fsync but was incorrectly setting its ordered mode to DRAIN. Change it to DRAIN_FLUSH. In practice, this doesn't change anything as loop doesn't make use of the block layer ordered implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:36 +02:00
Stephen M. Cameron	fcfb5c0ce1	cciss: remove some superfluous tests from cciss_bigpassthru() Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:12:40 +02:00
Stephen M. Cameron	0c9f5ba7cb	cciss: factor out cciss_big_passthru Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:12:39 +02:00

2102 Commits