OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Philipp Reisner	e6b3ea83bc	drbd: moved receiver, worker and asender from mdev to tconn Patch mostly: sed -i -e 's/mdev->receiver/mdev->tconn->receiver/g' \ -e 's/mdev->worker/mdev->tconn->worker/g' \ -e 's/mdev->asender/mdev->tconn->asender/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:06 +02:00
Philipp Reisner	e42325a576	drbd: moved data and meta from mdev to tconn Patch mostly: sed -i -e 's/mdev->data/mdev->tconn->data/g' \ -e 's/mdev->meta/mdev->tconn->meta/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:05 +02:00
Philipp Reisner	b2fb6dbe52	drbd: moved net_cont and net_cnt_wait from mdev to tconn Patch partly generated by: sed -i -e 's/get_net_conf(mdev)/get_net_conf(mdev->tconn)/g' \ -e 's/put_net_conf(mdev)/put_net_conf(mdev->tconn)/g' \ -e 's/get_net_conf(odev)/get_net_conf(odev->tconn)/g' \ -e 's/put_net_conf(odev)/put_net_conf(odev->tconn)/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:04 +02:00
Philipp Reisner	89e58e755e	drbd: moved net_conf from mdev to tconn Besides moving the struct member, everything else is generated by: sed -i -e 's/mdev->net_conf/mdev->tconn->net_conf/g' \ -e 's/odev->net_conf/odev->tconn->net_conf/g' \ *.[ch] Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:27:03 +02:00
Andreas Gruenbacher	841ce241fa	drbd: Replace the ERR_IF macro with an assert-like macro Remove the file name and line number from the syslog messages generated: we have no duplicate function names, and no function contains the same assertion more than once. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:57 +02:00
Andreas Gruenbacher	e77a0a5cc1	drbd: Convert all constants in enum drbd_thread_state to upper case Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:56 +02:00
Andreas Gruenbacher	8554df1c6d	drbd: Convert all constants in enum drbd_req_event to upper case Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:55 +02:00
Andreas Gruenbacher	bb3bfe9614	drbd: Remove the unused hash tables Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:54 +02:00
Andreas Gruenbacher	8b946255f8	drbd: Use interval tree for overlapping epoch entry detection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:53 +02:00
Andreas Gruenbacher	010f6e678f	drbd: Put sector and size in struct drbd_epoch_entry into struct drbd_interval Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:52 +02:00
Andreas Gruenbacher	bc9c5c4118	drbd: Use the read and write request trees for request lookups Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-29 11:26:50 +02:00
Andreas Gruenbacher	de696716e8	drbd: Use interval tree for overlapping write request detection Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:06 +02:00
Andreas Gruenbacher	ace652acf2	drbd: Put sector and size in struct drbd_request into struct drbd_interval Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:05 +02:00
Andreas Gruenbacher	c3afd8f568	drbd: Request lookup code cleanup (4) Factor out duplicate code in got_NegAck(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:03 +02:00
Andreas Gruenbacher	ae3388daae	drbd: Request lookup code cleanup (3) Get rid of the ar_id_to_req() and ack_id_to_req() wrappers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:02 +02:00
Andreas Gruenbacher	668eebc6a1	drbd: Request lookup code cleanup (2) Unify the ar_id_to_req() and ack_id_to_req() functions: make both fail if the consistency check fails. Move the request lookup code now duplicated in both functions into its own function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:01 +02:00
Andreas Gruenbacher	5162458564	drbd: Request lookup code cleanup (1) Move _ar_id_to_req() to drbd_receiver.c and mark it non-inline. Remove the leading underscores from _ar_id_to_req() and _ack_id_to_req(). Mark ar_hash_slot() inline. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:58:00 +02:00
Andreas Gruenbacher	9c50842a35	drbd: Update outdated comment Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:59 +02:00
Andreas Gruenbacher	d628769b3c	drbd: Move drbd_free_tl_hash() to drbd_main() This is the only place where this function is used. Make it static. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:58 +02:00
Andreas Gruenbacher	579b57ed73	drbd: Magic reserved block_id value cleanup The ID_VACANT definition has become entirely irrelevant by now. The is_syncer_block_id() macro does not improve the code, so eliminated it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:57 +02:00
Andreas Gruenbacher	ca9bc12b90	drbd: Get rid of BE_DRBD_MAGIC and BE_DRBD_MAGIC_BIG Converting the constants happens at compile time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:56 +02:00
Andreas Gruenbacher	9a8e77530f	drbd: Consistently use block_id == ID_SYNCER for checksum based resync and online verify DRBD_MAGIC has nothing to do with block ids and the funny values computed were not actually used, anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:55 +02:00
Andreas Gruenbacher	28c455ceb2	drbd: Get rid of req_validator_fn typedef Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-08-25 14:57:31 +02:00
Lars Ellenberg	cb6518cbef	drbd: when receive times out on meta socket, also check last receive time on data socket If we have an asymetrically congested network, we may send P_PING, but due to congestion, the corresponding P_PING_ACK would time out, and we would drop a (congested, but otherwise) healthy connection ("PingAck did not arrive in time.") Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-06-30 09:23:44 +02:00
Philipp Reisner	9b2f61aec7	drbd: fix warning Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2011-05-24 10:38:32 +02:00
Bart Van Assche	24c4830c8e	drbd: Fix spelling Found these with the help of ispell -l. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2011-05-24 10:21:29 +02:00
Philipp Reisner	99432fcc52	drbd: Take a more conservative approach when deciding max_bio_size The old (optimistic) implementation could shrink the bio size on an primary device. Shrinking the bio size on a primary device is bad. Since there we might get BIOs with the old (bigger) size shortly after we published the new size. The new implementation is more conservative, and eventually increases the max_bio_size on a primary device (which is valid). It does so, when it knows the local limit AND the remote limit. We cache the last seen max_bio_size of the peer in the meta data, and rely on that, to make the operation of single nodes more efficient. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-05-24 10:08:58 +02:00
Philipp Reisner	a8e407925d	drbd: Fix for the connection problems on high latency links It seems that the real cause of all the issues where that we did not noticed in drbd_try_connect() when the other guy closes one socket if the round trip time gets higher than 100ms. There were that 100ms hard coded! Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-05-24 10:07:22 +02:00
Lars Ellenberg	f36af18c7b	drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int If there is no replication traffic within the idle timeout (ping-int seconds), DRBD will send a P_PING, and adjust the timeout to ping-timeout. If there is no P_PING_ACK received within this ping-timeout, DRBD finally drops the connection, and tries to re-establish it. To decide which timeout was active, we compared the current timeout with the ping-timeout, and dropped the connection, if that was the case. By default, ping-int is 10 seconds, ping-timeout is 500 ms. Unfortunately, if you configure ping-timeout to be the same as ping-int, expiry of the idle-timeout had been mistaken for a missing ping ack, and caused an immediate reconnection attempt. Fix: Allow both timeouts to be equal, use a local variable to store which timeout is active. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-05-24 10:03:30 +02:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Philipp Reisner	7fde2be930	drbd: Implemented real timeout checking for request processing time Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:48:16 +01:00
Lars Ellenberg	fdda6544ad	drbd: improve log message if received sector offset exceeds local capacity Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:48:13 +01:00
Lars Ellenberg	10f6d9926c	drbd: don't BUG_ON, if bio_add_page of a single page to an empty bio fails Just deal with it more gracefully, if we fail to add even a single page to an empty bio. We used to BUG_ON() there, but it has been observed in some Xen deployment, so we need to handle that case more robustly now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:48:10 +01:00
Lars Ellenberg	0ddc5549f8	drbd: silence some noisy log messages during disconnect If we fail to send the information that we lost our disk, we have no connection, and no disk: no access to data anymore. That is either expected (deconfiguration), or there will be so much noise in the logs that "Sending state failed" is not useful at all. Drop it. If the reason for a shorter than expected receive was a signal, which we sent because we already decided to disconnect, these additional log messages are confusing and useless. This patch follows this pattern: - dev_warn(DEV, "short read expecting header on sock: r=%d\n", r); + if (!signal_pending(current)) + dev_warn(DEV, "short read expecting header on sock: r=%d\n", r); Also make them all dev_warn for consistency. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:48:04 +01:00
Lars Ellenberg	20ceb2b22e	drbd: describe bitmap locking for bulk operation in finer detail Now that we do no longer in-place endian-swap the bitmap, we allow selected bitmap operations (testing bits, sometimes even settting bits) during some bulk operations. This caused us to hit a lot of FIXME asserts similar to FIXME asender in drbd_bm_count_bits, bitmap locked for 'write from resync_finished' by worker Which now is nonsense: looking at the bitmap is perfectly legal as long as it is not being resized. This cosmetic patch defines some flags to describe expectations in finer detail, so the asserts in e.g. bm_change_bits_to() can be skipped if appropriate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:48:02 +01:00
Lars Ellenberg	62b0da3a24	drbd: log UUIDs whenever they change All decisions about sync, sync direction, and wether or not to allow a connect or attach are based on our set of UUIDs to tag a data generation. Log changes to the UUIDs whenever they occur, logging "new current UUID P:Q:R:S" is more useful than "Creating new current UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:48:01 +01:00
Philipp Reisner	d07c9c10e5	drbd: We can not process BIOs with a size of 0 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:47:59 +01:00
Lars Ellenberg	79a30d2d71	drbd: queue bitmap writeout more intelligently The "lazy writeout" of cleared bitmap pages happens during resync, and should happen again once the resync finishes cleanly, or is aborted. If resync finished cleanly, or was aborted because of peer disk failure, we trigger the writeout from worker context in the after state change work. If resync was aborted because of connection failure, we should not immediately trigger bitmap writeout, but rather postpone the writeout to after the connection cleanup happened. We now do it in the receiver context from drbd_disconnect(). If resync was aborted because of local disk failure, well, there is nothing to write to anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:47:56 +01:00
Philipp Reisner	20ee639024	drbd: cleaned up __set_current_state() followed by schedule_timeout() calls Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:47:42 +01:00
Philipp Reisner	2deb8336d0	drbd: Fixed P_NEG_ACK processing for protocol A and B Protocol A has no P_WRITE_ACKs, but has P_NEG_ACKs. The master bio might already be completed, therefore the request is no longer in the collision hash. => Do not try to validate block_id as request In Protocol B we might already have got a P_RECV_ACK but then get a P_NEG_ACK after wards. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:40 +01:00
Philipp Reisner	94f2b05f03	drbd: Killed an assert that is no longer valid The point is that drbd_disconnect() can be called with a cstate of WFConnection. That happens if the user issues "drbdsetup disconnect" while the drbd_connect() function executes. Then drbdd_init() will call drbdd(), which in turn will return without receiving any packets. Then drbdd_init() will end up calling drbd_disconnect() with a cstate of WFConnection. Bottom line: This assertion is wrong as it is, and we do not see value in fixing it. => Removing it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:39 +01:00
Philipp Reisner	148efa165e	drbd: Do not drop net config if sending in drbd_send_protocol() fails Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:37 +01:00
Philipp Reisner	370a43e798	drbd: Work on the Ahead -> SyncSource transition The test if rs_pending_cnt == 0 was too weak. Using Test for unacked_cnt == 0 instead. Moved that into the worker. Since unacked_cnt gets already increased when an P_RS_DATA_REQ comes in. Also using a timer to make Ahead -> SyncSource -> Ahead cycles slower... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:36 +01:00
Philipp Reisner	4a23f26496	drbd: Do not full sync if a P_SYNC_UUID packet gets lost See also commit from 2009-08-15 "drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost." We saw cases where the History UUIDs where not as expected. So the detection of the special case did not trigger. With the sync UUID no longer being a random number, but deducible from the previous bitmap UUID, the detection of this special case becomes more reliable. The SyncUUID now is the previous bitmap UUID + 0x1000000000000. Rule 5a: Cs = H1p & H1p + Offset = Bp Connection was lost before SyncUUID Packet came through. Corrent (peer) UUIDs: Bp = H1p H1p = H2p H2p = 0 Become Sync target. Rule 7a: Cp = H1s & H1s + Offset = Bs Connection was lost before SyncUUID Packet came through. Correct (own) UUIDs: Bs = H1s H1s = H2s H2s = 0 Become Sync source. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:32 +01:00
Philipp Reisner	da0a78161d	drbd: Be more careful with SyncSource -> Ahead transitions We may not get from SyncSource to Ahead if we have sent some P_RS_DATA_REPLY packets to the peer and are waiting for P_WRITE_ACK. Again, this is not relevant for proper tuned systems, but makes sure that the not-tuned system does not get diverging bitmaps. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:26 +01:00
Philipp Reisner	d612d309e4	drbd: No longer answer P_RS_DATA_REQUEST packets when in C_AHEAD mode When the sync source node replies to a P_RS_DATA_REQUEST packet when it is already in ahead mode. I.e. those two packets crossed each other on the wire, that may lead to diverging bitmaps. This never happens in a well-tuned-system. In a well-tuned- system the resync controller has reduced the resync speed to zero long before we got into ahead-mode. But we have to be prepared for the not-well-tuned-system of course as well. Because -> diverging bitmaps = non terminating resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:45:25 +01:00
Lars Ellenberg	f735e36354	drbd: add debugging assert to make sure the protocol is clean We expect to only receive the recently introduced "set out of sync" packets in specific states. If we receive them in different states, that may confuse the resync process to the point where it won't terminate, or think it made negative progress. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:43:34 +01:00
Andreas Gruenbacher	2c46407d24	drbd: receive_bitmap_plain: Get rid of ugly and useless enum Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:34 +01:00
Andreas Gruenbacher	78fcbdae22	drbd: receive_bitmap: Missing free_page() on error path Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:30 +01:00
Andreas Gruenbacher	de1f8e4a0a	drbd: receive_bitmap: Avoid casting enum drbd_state_rv to int Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2011-03-10 11:36:29 +01:00

1 2 3 4

155 Commits