If a local or remote READ request fails, just push it back to the retry
workqueue. It will re-enter __drbd_make_request, and be re-assigned to
a suitable local or remote path, or failed, if we do not have access to
good data anymore.
This obsoletes w_read_retry_remote(),
and eliminates two goto...retry blocks in __req_mod()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* drbd-8.3:
drbd: fix spurious meta data IO "error"
drbd: Fixed a race condition between detach and start of resync
drbd: fix harmless race to not trigger an ASSERT
drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero
drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Since we now apply the AL in user space onto the bitmap, the AL
is not active for the requests we want to reply.
For that a al_write_transaction() that might be called from
worker context became necessary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* drbd-8.3:
documentation: Documented detach's --force and disk's --disk-timeout
drbd: Implemented the disk-timeout option
drbd: Force flag for the detach operation
drbd: Allow new IOs while the local disk in in FAILED state
drbd: Bitmap IO functions can not return prematurely if the disk breaks
drbd: Added a kref to bm_aio_ctx
drbd: Hold a reference to ldev while doing meta-data IO
drbd: Keep a reference to the bio until the completion handler finished
drbd: Implemented wait_until_done_or_disk_failure()
drbd: Replaced md_io_mutex by an atomic: md_io_in_use
drbd: moved md_io into mdev
drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk
drbd: Keep a reference to barrier acked requests
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The 32-bit resync_after netlink field takes a device minor number as
parameter, which is no longer limited to 255. We cannot statically
verify which device numbers are valid, so set the ummer limit to the
highest possible signed 32-bit integer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* Updates to all configuration items is done under genl_lock().
Including removal of mdevs or tconns.
* All read non sleeping read sides are protected by rcu
* All sleeping read sides keep reference counts to keep the
objects alive
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Change the --no-tcp-cork drbdsetup command line option as well as
the no_cork netlink packet.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This removes the issue with using peer_seq_lock out of different
contexts.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* Moved rs_planed into it, named total
* When having a pointer to the object the values can
be embedded into the fifo object.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
There are races where the receiver may be exiting,
but still need the worker to process some stuff.
Do not wait for the receiver to die from an exiting worker.
The receiver must already be dead in case the worker decides to exit.
If the receiver was still alive, it may still want to queue work, and do
drbd_flush_workqueue() from it's disconnect cleanup code,
which would no longer be processed by an exiting worker.
This also would deadlock,
if the worker was to synchornously wait for the receiver to die.
Do not implicitly stop the worker.
The worker will only be stopped from configuration context, from
conn_reconfig_done(), drbd_adm_down() or drbd_adm_delete_connection(),
after making sure the receiver is already stopped.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the last volume of a replication group is unconfigured,
the worker thread exits. To not interfere with cleanup
of other threads, before the the last cleanups run,
we need to make sure the receiver has already exited.
The commend explaining that clearly belongs above
drbd_thread_stop(&tconn->receiver), not in the cleanup loop below.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Removing the get_net_conf()/put_net_conf() calls
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With this commit the locking for all accesses to IDRs is complete:
* Non sleeping read accesses are protected by RCU
* sleeping read accesses are protocted by a read lock on drbd_cfg_rwsem
* accesses that add anything are protected by a write lock
* accesses that remove an object are protoected by a write lock
and a call to synchronize_rcu() after it is removed from the IDR
and before the object is actually free()ed.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Prepare the introduction of the protocol 100 headers. The actual protocol
header is removed for the packet declarations. I.e. allow us to use the
packets with different headers.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Centralize sock->mutex locking and unlocking in [drbd|conn]_prepare_command()
and [drbd|conn]_send_comman().
Therefore all *_send_* functions are touched to use these primitives instead
of drbd_get_data_sock()/drbd_put_data_sock() and former helper functions.
That change makes the *_send_* functions more standardized.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We want to avoid bio_split for bios crossing activity log boundaries.
So we may need to activate two activity log extents "atomically".
drbd_al_begin_io() needs to know more than just the start sector.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This helps to ensure that we don't miss one of them when changing their
return value semantics.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This commit breaks the API again.
Move per-volume former syncer options into disk_conf.
Move per-connection former syncer options into net_conf.
Renamed the remainign sync_conf to res_opts
Syncer settings have been changeable at runtime, so we need to prepare
for these settings to be runtime-changeable in their new home as well.
Introduce new configuration operations, and share the netlink attribute
between "attach" (create new disk) and "disk-opts" (change options).
Same for "connect" and "net-opts".
Some fields cannot be changed at runtime, however.
Introduce a new flag GENLA_F_INVARIANT to be able to trigger on that in
the generated validation and assignment functions.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>