OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Alex Elder	29334ba49c	rbd: kill rbd_update_mapping_size() Since rbd_update_mapping_size() is now a trivial wrapper, just open code it in its two callers. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-08 07:45:39 -05:00
Alex Elder	00a653e216	rbd: update capacity in rbd_dev_refresh() When a mapped image changes size, we change the capacity recorded for the Linux disk associated with it, in rbd_update_mapping_size(). That function is called in two places--the format 1 and format 2 refresh routines. There is no need to set the capacity while holding the header semaphore. Instead, do it in the common rbd_dev_refresh(), using the logic that's already there to initiate disk revalidation. Add handling in the request function, just in case a request that exceeds the capacity of the device comes in (perhaps one that was started before a refresh shrunk the device). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-08 07:45:30 -05:00
Alex Elder	e627db085e	rbd: revalidate only for mapping size changes This commit: `d98df63e` rbd: revalidate_disk upon rbd resize instituted a call to revalidate_disk() to notify interested parties that a mapped image has changed size. This works well, as long as the the rbd device doesn't map a snapshot. A snapshot will never change size. However, the base image the snapshot is associated with can, and it can do so while the snapshot is mapped. The problem is that the test for the size is looking at the size of the base image, not the size of the mapped snapshot. This patch corrects that. Update the warning message shown in the event of error, and move it into the callers. This resolves: http://tracker.ceph.com/issues/4911 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-08 07:40:48 -05:00
Alex Elder	49ece55428	rbd: fix leak of format 2 snapshot context When rbd_dev_v2_refresh() is called, the rbd device already has a snapshot context associated with it. But that never gets freed, the pointer just gets overwritten. Fix this by dropping the rbd device's reference to the snapshot context before overwriting the pointer. Because ceph_put_snap_context() already handles for a null pointer we don't need to check for that (for the probe case, where no context has yet been assigned). This resolves: http://tracker.ceph.com/issues/4912 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-08 07:38:30 -05:00
Linus Torvalds	292088ee03	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "A couple of fixes + getting rid of __blkdev_put() return value" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: proc: Use PDE attribute setting accessor functions make blkdev_put() return void block_device_operations->release() should return void mtd_blktrans_ops->release() should return void hfs: SMP race on directory close()	2013-05-07 15:14:53 -07:00
Al Viro	db2a144bed	block_device_operations->release() should return void The value passed is 0 in all but "it can never happen" cases (and those only in a couple of drivers) and it would've been lost on the way out anyway, even if something tried to pass something meaningful. Just don't bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-07 02:16:21 -04:00
Alex Elder	b5b09be30c	rbd: fix image request leak on parent read When a read for a layered image object finds the target object doesn't exist, a read image request for the parent image is created and submitted. When that completes, the callback routine was not releasing that parent image request. Fix that. The slab allocation stuff just added has greatly simplified the search for the source of this memory leak. This resolves: http://tracker.ceph.com/issues/4803 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 12:15:28 -05:00
Alex Elder	78c2a44aae	rbd: allocate image object names with a slab allocator The names of objects used for image object requests are always fixed size. So create a slab cache to manage them. Define a new function rbd_segment_name_free() to match rbd_segment_name() (which is what supplies the dynamically-allocated name buffer). This is part of: http://tracker.ceph.com/issues/3926 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 11:58:30 -05:00
Alex Elder	868311b1eb	rbd: allocate object requests with a slab allocator Create a slab cache to manage rbd_obj_request allocation. We aren't using a constructor, and we'll zero-fill object request structures when they're allocated. This is part of: http://tracker.ceph.com/issues/3926 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 11:58:30 -05:00
Alex Elder	f907ad5596	rbd: allocate name separate from obj_request The next patch will define a slab allocator for a object requests. To use that we'll need to allocate the name of an object separate from the request structure itself. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 11:58:29 -05:00
Alex Elder	1c2a9dfe21	rbd: allocate image requests with a slab allocator Create a slab cache to manage rbd_img_request allocation. Nothing too fancy at this point--we'll still initialize everything at allocation time (no constructor) This is part of: http://tracker.ceph.com/issues/3926 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 11:58:29 -05:00
Alex Elder	30d1cff817	rbd: use binary search for snapshot lookup Use bsearch(3) to make snapshot lookup by id more efficient. (There could be thousands of snapshots, and conceivably many more.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 11:58:17 -05:00
Alex Elder	15228ede7d	rbd: clear EXISTS flag if mapped snapshot disappears This functionality inadvertently disappeared in the last patch. Image snapshots can get removed at just about any time. In particular it can disappear even if it is in use by an rbd client as a mapped image. The rbd client deals with such a disappearance by responding to new requests with ENXIO. This is implemented by each rbd device maintaining an EXISTS flag, which is normally set but cleared if a snapshot disappears. This patch (re-)implements the clearing of that flag. Whenever mapped image header information is refreshed, if the mapping is for a snapshot, verify the mapped snapshot is still present in the updated snapshot context. If it is not, clear the flag. It is not necessary to check this in the initial probe, because the probe will not succeed if the snapshot doesn't exist. This resolves: http://tracker.ceph.com/issues/4880 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-02 11:57:03 -05:00
Alex Elder	33dca39f5c	rbd: kill off the snapshot list We no longer use the snapshot list for anything. When we need to look up a snapshot name, id, size, or feature mask, we just do it directly rather than relying on this list being updated with every refresh. The main reason it existed was for the benefit of the device/sysfs entries that previously were associated with snapshots. So get rid of the snapshot list, and struct rbd_snap, and the hundreds of lines of code that supported them. This resolves: http://tracker.ceph.com/issues/4868 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:22 -07:00
Alex Elder	2ad3d7167e	rbd: define rbd_snap_size() and rbd_snap_features() This patch defines a handful of new functions that will allow us to get rid of the rbd device structure's list of snapshots. Define rbd_snap_id_by_name() to look up a snapshot id given its name. This is efficient for format 1 images but not for format 2. Fortunately it only gets called at mapping time so it's not that critical. Use rbd_snap_id_by_name() to find out the id for a snapshot getting mapped, and pass that id to new functions rbd_snap_size() and rbd_snap_features() to look up information about a given snapshot's size and feature mask given its snapshot id. All this gets done in rbd_dev_mapping_set(). As a result, snap_by_name() is no longer needed, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:20 -07:00
Alex Elder	54cac61fb6	rbd: use snap_id not index to look up snap info In order to align with what was needed for format 1 rbd images, rbd_dev_v2_snap_info() was set up to take as argument an index into the array of snapshot ids in a rbd device's snapshot context. This switches that around, so we pass the snapshot id instead. In doing this, rbd_snap_name() now returns a dynamically-allocated string rather than a fixed one, so there's no need to make a duplicate in its caller, rbd_dev_spec_update(). This means the following functions take a snapshot id where they previously used an index value: rbd_dev_snap_info() rbd_dev_v1_snap_info() rbd_dev_v2_snap_info() A new function, rbd_dev_snap_index(), determines the snap index for format 1 images and uses it to look up the name. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:19 -07:00
Alex Elder	9682fc6d3a	rbd: look up snapshot name in names buffer Rather than scanning the list of snapshot structures for it, scan the snapshot context buffer containing snapshot names in order to determine for a format 1 image the name associated with a given snapshot id. Pull out the part of rbd_dev_v1_snap_info() that does this scan into a new function, _rbd_dev_v1_snap_name(). Have that function return a dynamically-allocated copy of the name, and don't duplicate it in rbd_dev_v1_snap_info(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:18 -07:00
Alex Elder	dedc81ea84	rbd: drop obj_request->version Nothing ever uses the version field maintained in the object request structure any more, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:17 -07:00
Alex Elder	e2a58ee55b	rbd: drop rbd_obj_method_sync() version parameter Only NULL is passed as the version argument to rbd_obj_method_sync(), so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:16 -07:00
Alex Elder	cc4a38bdd5	rbd: more version parameter removal Continued from the last patch, more parameters that can go away because we no longer have a need to track object versions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:15 -07:00
Alex Elder	7097f8df6e	rbd: get rid of some version parameters Several functions in rbd have parameters meant to allow the version of an object to be passed in or out. The purpose of those was to allow the version of a header object to be maintained, but we no longer do that. As a result, these parameters are never actually needed or used, so get rid of them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:14 -07:00
Alex Elder	b21ebdddeb	rbd: stop tracking header object version The rbd code takes care to maintain the version of the header object. This was done in hopes of using it to detect a change in the object between reading it and setting up a watch request to be notified of changes. The mechanism was never fully implemented, however. And we now avoid the original problem by setting up the watch request before ever reading the content of the header. The osd doesn't interpret the object version supplied with a WATCH osd op, nor does it use the version supplied with a NOTIFY_ACK op (we can just supply 0 for both). There is therefore no need to maintain the header's object version any more, so stop doing so. We'll be able to simplify some more rbd code in the next few patches as a result of this. This resolves: http://tracker.ceph.com/issues/3952 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:13 -07:00
Alex Elder	cb75223d2b	rbd: snap names are pointer to constant data Make explicit that snapshot names don't change by making functions return and take parameters that that point to const qualified data. This resolves: http://tracker.ceph.com/issues/4867 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:12 -07:00
Alex Elder	a3fbe5d447	rbd: don't revalidate so much Whenever a header object event causes a mapped rbd image to refresh its header information, revalidate_disk() is being called. This was done in rbd_dev_refresh() outside the control mutex in order to avoid a lock inversion. Although a an event like this might indicate the image has changed size, most of the time it does not. Record the image size before and after the refresh, and only call revalidate_disk() if it changes. This resolves: http://tracker.ceph.com/issues/4867 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:11 -07:00
Alex Elder	96882f55c4	rbd: fix up the layering warning message A warning gets spewed for any image being probed, including parent images. Set up a condition such that the warning message only gets printed for the image being mapped, not any of its parents. Also, I didn't like the way the warning ended up being so long. Make it a terse warning instead. People experimenting with layering will know what the message means. This is part of: http://tracker.ceph.com/issues/4867 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:10 -07:00
Alex Elder	812164f8c3	ceph: use ceph_create_snap_context() Now that we have a library routine to create snap contexts, use it. This is part of: http://tracker.ceph.com/issues/4857 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:09 -07:00
Alex Elder	b536f69a3a	rbd: set up devices only for mapped images Stop setting up Linux devices during the image probe operation. Instead, set up the devices as a separate step after the image probe, in rbd_add(). A consequence of this is that only mapped images get devices assigned to them, which is pretty sweet. This resolves: http://tracker.ceph.com/issues/4774 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:07 -07:00
Alex Elder	8ad42cd0c0	rbd: don't have device release destroy rbd_dev Currently an rbd_device structure gets destroyed from the release routine for the device embedded within it. Stop doing that, instead calling rbd_dev_image_release() right after rbd_bus_del_dev() wherever the latter is called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:05 -07:00
Alex Elder	6fd48b3be9	rbd: define rbd_dev_unprobe() Define a new function rbd_dev_unprobe() which undoes state changes that occur from calling rbd_dev_v1_probe() or rbd_dev_v2_probe(). Note that this is a superset of rbd_header_free(), which is now getting removed (it seems to have been used improperly anyway). Flesh out rbd_dev_image_release() so it undoes exactly what rbd_dev_image_probe() does. This means that: - rbd_dev_device_release() gets called when the last device reference gets dropped; - that undoes everything done by the rbd_dev_device_setup() call at the end of rbd_dev_image_probe() (and nothing more), ending by calling rbd_dev_image_release(); and - rbd_dev_image_release() undoes everything else done by rbd_dev_image_probe() (and this includes a call to rbd_dev_unprobe(). This means the image and device portions of an rbd device are fairly cleanly separated now, so error paths should be a little easier to verify than they used to be. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:04 -07:00
Alex Elder	200a6a8be5	rbd: don't destroy rbd_dev in device release function Rename rbd_dev_probe_finish() to be rbd_dev_device_setup(). Its purpose is to set up the Linux side of an rbd device mapping. Rename rbd_dev_release() to be rbd_dev_device_release(), making it more obvious it serves as the inverse of the setup function (or it will). Encapsulate some of what was done in rbd_dev_release() into a new function rbd_dev_image_release(), which serves as the inverse of setting up the ceph side of the mapped rbd image. Define a new helper rbd_dev_clear_mapping() to simply zero out the fields of a mapping structure--the inverse of rbd_dev_set_mapping(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:03 -07:00
Alex Elder	79ab7558aa	rbd: drop module later Drop the module reference at the end of rbd_remove() for symmetry with adding a reference at the top of rbd_add(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:02 -07:00
Alex Elder	b644de2ba0	rbd: set up watch in rbd_dev_image_probe() Move setting up the watch request for an image so it's done in rbd_dev_image_probe() rather than rbd_dev_probe_finish(). Move it all the way up to before doing the initial probe. This avoids a potential race condition, in which we get (and use) the initial snapshot context for an image, and it gets changed between that time and the time we get the watch set up. This resolves: http://tracker.ceph.com/issues/3871 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:01 -07:00
Alex Elder	96f03e08f9	rbd: don't bother checking whether order changes When a format 2 image is refreshed, code is in place to verify that the object order never changes from what it was originally. This relies on the fact that the refresh will occur after an initial load of information about the image. An upcoming patch makes it possible for the refresh to occur first, so we can no longer make this order check. The order really can't ever change anyway--this was just a sanity check. So get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:00 -07:00
Alex Elder	0d8189e175	rbd: don't clean up watch in device release function Currently, a watch on an rbd device header object gets torn down when its final Linux device reference gets dropped. Instead, tear it down when removing the device. If an error occurs cleaning up the watch event when unmapping, abort the unmap request. All images (including parents) still get watch requests set up, so tear these down also, in rbd_dev_remove_parent(). For now, ignore any errors that occur in this case. Get rid of local variable "rc" in rbd_remove(); use "ret" instead (they both somehow ended up defined in the function and only one is needed). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:59 -07:00
Alex Elder	332bb12db9	rbd: define rbd_header_name() Define a new function rbd_header_name(), which allocates and formats the name of the header object for the rbd device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:58 -07:00
Alex Elder	9bb81c9be9	rbd: move more initialization into rbd_dev_image_probe() Move a block of initialization related to the "ceph-side" of an rbd image out of rbd_dev_probe_finish() and into rbd_dev_image_probe(). Add appropriate error handling to clean things up in the event any of these new functions return an error. We know that rbd_dev_snaps_update(), rbd_dev_spec_update(), and rbd_dev_probe_parent() all clean up after themselves before they return an error, so no special cleanup is required except when an earlier call succeeds. Since rbd_dev_spec_update() only updates the spec field (whose cleanup will be handled by dropping the last reference to the spec) there is no cleanup action associatied with that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:57 -07:00
Alex Elder	5de10f3b0c	rbd: probe for the parent earlier Probe for a parent device earlier in rbd_dev_probe_finish(), before starting to set up the Linux side of the rbd device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:56 -07:00
Alex Elder	2e93bf9e46	rbd: remove parent devices on probe error When an error occurs while finishing probing a device it is assumed that parent devices get cleaned up when deleting a device. They don't. Add a call to clean them up. Note that this means the parent spec will already be cleaned up so it doesn't have to be in one of the rbd_add() error paths. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:55 -07:00
Alex Elder	ad945fc1da	rbd: fix rbd_dev_remove_parent() In certain error paths, it is possible for an rbd device to have a parent spec but no parent rbd_dev. In rbd_dev_remove_parent() use the parent field rather than parent_spec in determining whether to try to remove any parent devices. Use assertions to indicate that any non-null parent pointer has parent_spec associated with it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:54 -07:00
Alex Elder	b480815a17	rbd: kill __rbd_remove() The function __rbd_remove() is used in two spots, and it's fairly simple. It combines cleanup of part of the ceph-side state as well as cleaning up the Linux-side state. Just open code it in the two callers and eliminate the function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:53 -07:00
Alex Elder	d1cf578845	rbd: set mapping info earlier Set the mapping size and features earlier in rbd_dev_probe_finish(). Define rbd_dev_mapping_clear() as an inverse for setting those fields, and use it both in error handling in rbd_dev_image_probe() and in the final cleanup in rbd_dev_release(). Change the name of rbd_dev_set_mapping() to of rbd_dev_mapping_set(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:51 -07:00
Alex Elder	05a46afdc7	rbd: encapsulate removing parent devices Encapsulate the code that removes an rbd device's parent images into a new function, rbd_dev_remove_parent(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:50 -07:00
Alex Elder	124afba25d	rbd: encapsulate probing for parent devices Encapsulate the code that probes for an rbd device's parent images into a new function, rbd_dev_probe_parent(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:49 -07:00
Alex Elder	b5156e76da	rbd: defer setting disk capacity Don't set the disk capacity until right before we announce the device as available for use. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:48 -07:00
Alex Elder	129b79d449	rbd: only set device exists flag when ready Hold off setting the EXISTS rbd device flag until just before we announce the disk as available for use. There's no point in doing so any earlier than that, and at that point the device truly is fully set up and ready to use. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:47 -07:00
Alex Elder	fc71d8330e	rbd: fix up some sysfs stuff This just tweaks a few things in the routines that implement rbd sysfs files. All of the entries for an rbd device in /sys/bus/rbd/devices/<id>/ will represent information whose valid values are known by the time they are accessible. Right now we get the size of the mapped image by a call to get_capacity(). There's no need to do this, because that will return what we last set the capacity to, which is just the size recorded for the mapping. So just show that value instead. We also get this under protection of the header semaphore, in order to provide a precisely correct value. This isn't really necessary; these files are really informational only and it's not necessary to be so careful. Finally, print a special value in case the major device number is not recorded. Right now that won't matter much but soon the parent images won't have devices associated with them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:46 -07:00
Alex Elder	e28626a08b	rbd: fix a bug in resizing a mapping When a snapshot context update occurs, rbd_update_mapping_size() is called to set the capacity of the disk to record the updated size of the image in case it has changed. There's a bug though. The mapping size is in units of bytes. The code that updates the mapping size field is assigning a value that has been scaled down to sectors. Fix that. Also, check to see if the size has actually changed, and don't bother updating things (specifically, calling set_capacity()) if it has not. This resolves: http://tracker.ceph.com/issues/4833 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:45 -07:00
Alex Elder	2e9f7f1c0d	rbd: refactor rbd_dev_probe_update_spec() Fairly straightforward refactoring of rbd_dev_probe_update_spec(). The name is changed to rbd_dev_spec_update(). Rearrange it so nothing gets assigned to the spec until all of the names have been successfully acquired. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:44 -07:00
Alex Elder	71f293e26e	rbd: rename rbd_dev_probe() Rename rbd_dev_probe() to be rbd_dev_image_probe(). Its purpose will eventually be to probe for the existence of a valid rbd image for the rbd device--focusing only on the ceph side and not the Linux device side of initialization. For now the two "sides" are not fully separated, and this function is still the entry point for initializing the full rbd device. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:43 -07:00
Alex Elder	9f5dffdc8f	rbd: make rbd_dev_destroy() match rbd_dev_create() Currently, rbd_dev_destroy() does more than just the inverse of what rbd_dev_create() does. Stop doing that, and move the two extra things it does into the three call sites. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:42 -07:00
Alex Elder	468521c1b1	rbd: define rbd snap context routines Encapsulate the creation of a snapshot context for rbd in a new function rbd_snap_context_create(). Define rbd wrappers for getting and dropping references to them once they're created. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:41 -07:00
Alex Elder	c0cd10db46	rbd: use rbd_warn(), not WARN_ON() Change some calls to WARN_ON() so they use rbd_warn() instead, so we get consistent messaging. A few remain but they can probably just go away eventually. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:40 -07:00
Alex Elder	500d0c0fbb	rbd: move stripe_unit and stripe_count into header This commit added fetching if fancy striping parameters: 09186ddb rbd: get and check striping parameters They are almost unused, but the two fields storing the information really belonged in the rbd_image_header structure. This patch moves them there. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:39 -07:00
Alex Elder	ecb4dc2256	rbd: make rbd spec names pointer to const Make the names and image id in an rbd_spec be pointers to constant data. This required the use of a local variable to hold the snapshot name in rbd_add_parse_args() to avoid a warning. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:37 -07:00
Alex Elder	e1d4213f09	rbd: set snapshot id in rbd_dev_probe_update_spec() Set the rbd spec's snapshot id for an image getting mapped in rbd_dev_probe_update_spec() rather than rbd_dev_set_mapping(). This is the more logical place for that to happen (even though it means we might look up the snapshot by name twice). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:36 -07:00
Alex Elder	8b0241f85a	rbd: have snap_by_name() return a snapshot A function called snap_by_name() ought to just look up a snapshot by name. It does that, but then it assigns some stuff to the rbd device structure as well. Change the function to do just the lookup, and have the caller do the assignments that follow. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:35 -07:00
Alex Elder	5655c4d940	rbd: fix image id leak in initial probe If a format 2 image id is found for an image being mapped, but the subsequent probe of the image fails, rbd_dev_probe() quits without freeing the image id. Fix that. Also drop a redundant hunk of code in rbd_dev_image_id(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:34 -07:00
Alex Elder	c0fba36880	rbd: have rbd_dev_image_id() set format 1 image id Currently, rbd_dev_probe() assumes that any error returned by rbd_dev_image_id() is most likely -ENOENT, and responds by calling the format 1 probe routine, rbd_dev_v1_probe(). Then, at the top of rbd_dev_v1_probe(), an empty string is allocated for the image id. This is sort of unbalanced. Fix this by having rbd_dev_image_id() look for -ENOENT from its "get_id" method call. If that is seen, have it allocate the empty string there rather than depending on rbd_dev_v1_probe() to do it. Given that this is effectively defining the format of the image, set rbd_dev->image_format inside rbd_dev_image_id() rather than in the format-specific probe routines. Also drop a redundant hunk of code in rbd_dev_image_id(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:33 -07:00
Alex Elder	a0cab92432	rbd: avoid dropping extra reference in rbd_free_disk() I found during some failure injection testing that the call to rbd_free_disk() in the error path of rbd_dev_probe_finish() was dropping an extra reference to the disk queue. The problem occurred when put_disk tried to drop a reference to the disk's queue. A call to blk_cleanup_queue() just prior to that will have also dropped a reference to the queue. The problem is that the reference dropped by put_disk() is assumed to have been taken by add_disk(). Our code has error paths that can occur after the disk and its queue are initialized, but before the call to add_disk(), and in those paths we won't have that extra reference. The fix is easy though. In rbd_free_disk() we're already checking the disk's GENHD_FL_UP flag. That flag is an indication that add_disk() has been called, so just call blk_cleanup_queue() conditional on that flag being set. This resolves: http://tracker.ceph.com/issues/4800 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:32 -07:00
Alex Elder	f40eb349e0	rbd: use rbd_obj_method_sync() return value Now that rbd_obj_method_sync() returns the number of bytes returned by the method call, that value should be used by callers to ensure we don't overrun the valid portion of the buffer. Fix the two spots that remained that weren't doing that, rbd_dev_image_name() and rbd_dev_v2_snap_name(). Rearrange the error path slightly in rbd_dev_v2_snap_name(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:31 -07:00
Alex Elder	6e584f5244	rbd: fix leak of format 2 snapshot names When the snapshot context for an rbd device gets updated (or the initial one is recorded) a a list of snapshot structures is created to represent them, one entry per snapshot. Each entry includes a dynamically-allocated copy of the snapshot name. Currently the name is allocated in rbd_snap_create(), as a duplicate of the passed-in name. For format 1 images, the snapshot name provided is just a pointer to an existing name. But for format 2 images, the passed-in name is already dynamically allocated, and in the the process of duplicating it here we are leaking the passed-in name. Fix this by dynamically allocating the name for format 1 snapshots also, and then stop allocating a duplicate in rbd_snap_create(). Change rbd_dev_v1_snap_info() so none of its parameters is side-effected unless it's going to return success. This is part of: http://tracker.ceph.com/issues/4803 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:30 -07:00
Alex Elder	6087b51b9e	rbd: rename __rbd_add_snap_dev() Rename __rbd_add_snap_dev() to be rbd_snap_create(). We no longer have devices for non-mapped snapshots, and we're not actually "adding" it to the list in this function, just creating it. Rename rbd_remove_snap_dev() to be rbd_snap_destroy() for reasons similar to the above. Stop having this function delete the snapshot from its list (to be symmetrical with its create counterpart) and do that in the caller instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:29 -07:00
Alex Elder	acb1b6caf1	rbd: only update values on snap_info success Change rbd_dev_v2_snap_info() so it only ever sets values of the size and features parameters if looking up the snapshot name was successful. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:28 -07:00
Alex Elder	c86f86e9e7	rbd: make snap_size order parameter optional Only one of the two callers of _rbd_dev_v2_snap_size() needs the order value returned. So make that an optional argument--a null pointer if the caller doesn't need it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:27 -07:00
Alex Elder	522a0cc0f0	rbd: fix leak of snapshots during initial probe When an rbd image is initially mapped, its snapshot context is collected, and then a list of snapshot entries representing the snapshots in that context is created. The list is created using rbd_dev_snaps_update(). (This function also supports updating an existing snapshot list based on a new snapshot context.) If an error occurs, updating the list is aborted, and the list is currently left as-is, in an inconsistent state. At that point, there may be a partially-constructed list, but the calling functions (rbd_dev_probe_finish() from rbd_dev_probe() from rbd_add()) never clean them up. So this constitutes a leak. A snapshot list that is inconsistent with the current snapshot context is of no use, and might even be actively bad. So rather than just having the caller clean it up, have rbd_dev_snaps_update() just clear out the entire snapshot list in the event an error occurs. The other place rbd_dev_snaps_update() is used is when a refresh is triggered, either because of a watch callback or via a write to the /sys/bus/rbd/devices/<id>/refresh interface. An error while updating the snapshots has no substantive effect in either of those cases, but one of them issues a warning. Move that warning to the common rbd_dev_refresh() function so it gets issued regardless of how it got initiated. This is part of: http://tracker.ceph.com/issues/4803 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:26 -07:00
Alex Elder	3e83b65bb9	rbd: don't create sysfs entries for non-mapped snapshots When an rbd image gets mapped a device entry gets created for it under /sys/bus/rbd/devices/<id>/. Inside that directory there are sysfs files that contain information about the image: its size, feature bits, major device number, and so on. Additionally, if that image has any snapshots, a device entry gets created for each of those as a "child" of the mapped device. Each of these is a subdirectory of the mapped device, and each directory contains a few files with information about the snapshot (its snapshot id, size, and feature mask). There is no clear benefit to having those device entries for the snapshots. The information provided via sysfs of of little real value--and all of it is available via rbd CLI commands. If we still wanted to see the kernel's view of this information it could be done much more simply by including it in a single sysfs file for the mapped image. But there is a clear cost to supporting them. Every time a snapshot context changes, these entries need to be updated (deleted snapshots removed, new snapshots created). The rbd driver is notified of changes to the snapshot context via callbacks from an osd, and care must be taken to coordinate removal of snapshot data structures with the possibility of one these notifications occurring. Things would be considerably simpler if we just didn't have to maintain device entries for the snapshots. So get rid of them. The ability to map a snapshot of an rbd image will remain; the only thing lost will be the ability to query these sysfs directories for information about snapshots of mapped images. This resolves: http://tracker.ceph.com/issues/4796 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:25 -07:00
Alex Elder	770eba6e29	rbd: activate support for layered images Now that we have most everything in place to support layered rbd images, enable support for them in the kernel client. Issue a warning to the log that the support is considered experimental whenever a format 2 layered image is mapped. Note that we also have to claim to support the STRIPINGV2 feature, due to a mistake in the way the rbd CLI set up those flags. This feature can work if it has the right parameters, and safeguards have been put in place to reject those images that do not have compatible parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:23 -07:00
Alex Elder	cc070d59bc	rbd: get and check striping parameters If an rbd format 2 image indicates it supports the STRIPINGV2 feature we need to find out its stripe unit and stripe count in order to know whether we can use it. We don't yet support fancy striping fully, but if the default parameters are used the behavior is indistinguishible from non-fancy striping. This is necessary because some images require the STRIPINGV2 feature even if they use the default parameters. (Which is to say the feature bit was erroneously set even if the feature was not used.) This resolves: http://tracker.ceph.com/issues/4709 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:21 -07:00
Alex Elder	57385b51c3	rbd: have rbd_obj_method_sync() return transfer count Callers of rbd_obj_method_sync() don't know how many bytes of data got returned by the class method call. As a result, they have been assuming enough got returned to decode whatever was expected. This isn't safe. We know how many bytes got transferred, so have rbd_obj_method_sync() return that amount (rather than just 0) if the call is successful. Change all callers to use this return value to ensure decoding of the results is done safely. On the other hand, most callers of rbd_obj_method_sync() only indicate success or failure, so all of their callers can simply test for non-zero result. This resolves: http://tracker.ceph.com/issues/4773 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:20 -07:00
Alex Elder	4157976b27	rbd: void data pointers for rbd_obj_method_sync() Make the inbound and outbound data parameters have void rather than character type for rbd_obj_method_sync(). This makes it more clear they don't expect typed data, and eliminates the need for some silly type casts. One more unrelated change: define the features buffer used in _rbd_dev_v2_snap_features() to be a packed data structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:19 -07:00
Alex Elder	80ef15bf71	rbd: give rbd_obj_read_sync() buffer void type Make the buf parameter into which the data is to be read have type void pointer. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:18 -07:00
Alex Elder	a9e8ba2cb3	rbd: enforce parent overlap A clone image has a defined overlap point with its parent image. That is the byte offset beyond which the parent image has no defined data to back the clone, and anything thereafter can be viewed as being zero-filled by the clone image. This is needed because a clone image can be resized. If it gets resized larger than the snapshot it is based on, the overlap defines the original size. If the clone gets resized downward below the original size the new clone size defines the overlap. If the clone is subsequently resized to be larger, the overlap won't be increased because the previous resize invalidated any parent data beyond that point. This resolves: http://tracker.ceph.com/issues/4724 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:15 -07:00
Alex Elder	0eefd470f0	rbd: issue a copyup for layered writes This implements the main copyup functionality for layered writes. Here we add a copyup_pages field to the object request, which is used only for copyup requests to keep track of the page array containing data read from the parent image. A copyup request is currently the only request rbd has that requires two osd operations. Because of this we handle copyup specially. All image object requests get an osd request allocated when they are created. For a write request, if a copyup is required, the osd request originally allocated is released, and a new one (with room for two osd ops) is allocated to replace it. A new function rbd_osd_req_create_copyup() allocates an osd request suitable for a copyup request. The first op is then filled with a copyup object class method call, supplying the array of pages containing data read from the parent. The second op is filled in with the original write request. The original request otherwise remains intact, and it describes the original write request (found in the second osd op). The presence of the copyup op is sort of implicit; a non-null copyup_pages field could be used to distinguish between a "normal" write request and a request containing both a copyup call and a write. This resolves: http://tracker.ceph.com/issues/3419 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:14 -07:00
Alex Elder	3d7efd18d9	rbd: implement full object parent reads As a step toward implementing layered writes, implement reading the data for a target object from the parent image for a write request whose target object is known to not exist. Add a copyup_pages field to an image request to track the page array used (only) for such a request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:13 -07:00
Laurent Barbe	d98df63ea7	rbd: revalidate_disk upon rbd resize If rbd disk is open and rbd resize is done, new size is not visible by filesystem. Like is done in virtio-blk and dm driver, revalidate_disk() permits to update the bd_inode size. Signed-off-by: Laurent Barbe <laurent@ksperis.com> Reviewed-by: Alex Elder <elder@inktank.com>	2013-05-01 21:19:12 -07:00
Alex Elder	f1a4739f33	rbd: support page array image requests This patch adds the ability to build an image request whose data will be written from or read into memory described by a page array. (Previously only bio lists were supported.) Originally this was going to define a new function for this purpose but it was largely identical to the rbd_img_request_fill_bio(). So instead, rbd_img_request_fill_bio() has been generalized to handle both types of image request. For the moment we still only fill image requests with bio data. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:11 -07:00
Alex Elder	b9434c5b43	rbd: define zero_pages() Define a new function zero_pages() that zeroes a range of memory defined by a page array, along the lines of zero_bio_chain(). It saves and the irq flags like bvec_kmap_irq() does, though I'm not sure at this point that it's necessary. Update rbd_img_obj_request_read_callback() to use the new function if the object request contains page rather than bio data. For the moment, only bio data is used for osd READ ops. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:10 -07:00
Alex Elder	b454e36d26	rbd: encapsulate submission of image object requests Object requests that are part of an image request are subject to some additional handling. Define rbd_img_obj_request_submit() to encapsulate that, and use it when initially submitting an image object request, and when re-submitting it during callback of an object existence check. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:08 -07:00
Alex Elder	9d4df01f08	rbd: define separate read and write format funcs Separate rbd_osd_req_format() into two functions, one for read requests and the other for write requests. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:07 -07:00
Alex Elder	c5b5ef6c51	rbd: issue stat request before layered write This is a step toward fully implementing layered writes. Add checks before request submission for the object(s) associated with an image request. For write requests, if we don't know that the target object exists, issue a STAT request to find out. When that request completes, mark the known and exists flags for the original object request accordingly and re-submit the object request. (Note that this still does the existence check only; the copyup operation is not yet done.) A new object request is created to perform the existence check. A pointer to the original request is added to that object request to allow the stat request to re-issue the original request after updating its flags. If there is a failure with the stat request the error code is stored with the original request, which is then completed. This resolves: http://tracker.ceph.com/issues/3418 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:04 -07:00
Alex Elder	5679c59f60	rbd: add target object existence flags This creates two new flags for object requests to indicate what is known about the existence of the object to which a request is to be sent. The KNOWN flag will be true if the the EXISTS flag is meaningful. That is: KNOWN EXISTS ----- ------ 0 0 don't know whether the object exists 0 1 (not used/invalid) 1 0 object is known to not exist 1 0 object is known to exist This will be used in determining how to handle write requests for data objects for layered rbd images. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:03 -07:00
Alex Elder	57acbaa7fb	rbd: always check IMG_DATA flag In a few spots, whether the an object request's img_request pointer is null is used to determine whether an object request is being done as part of an image data request. Stop doing that, and instead always use the object request IMG_DATA flag for that purpose. Swap the order of the definition of the IMG_DATA and DONE flag helpers, because obj_request_done_set() now refers to obj_request_img_data_set() to get its rbd_dev value. This will become important because the img_request pointer is about to become part of a union. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:02 -07:00
Alex Elder	b155e86cf6	rbd: adjust image object request ref counting An extra reference is taken when an object request is added as one of the requests making up an image object. A reference is dropped again when the image's object requests get submitted. The original reference for the object request will remain throughout this period, so we don't need to add and then take away an extra one. This can be interpreted as the image request inheriting the original object request's reference. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:01 -07:00
Alex Elder	406e2c9f92	libceph: kill off osd data write_request parameters In the incremental move toward supporting distinct data items in an osd request some of the functions had "write_request" parameters to indicate, basically, whether the data belonged to in_data or the out_data. Now that we maintain the data fields in the op structure there is no need to indicate the direction, so get rid of the "write_request" parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:58 -07:00
Alex Elder	8b3e1a5698	rbd: implement layered reads Implement layered read requests for format 2 rbd images. If an rbd image is a clone of a snapshot, the snapshot will be the clone's "parent" image. When an object read request on a clone comes back with ENOENT it indicates that the clone is not yet populated with that portion of the image's data, and the parent image should be consulted to satisfy the read. When this occurs, a new image request is created, directed to the parent image. The offset and length of the image are the same as the image-relative offset and length of the object request that produced ENOENT. Data from the parent image therefore satisfies the object read request for the original image request. While this code works, it will not be active until we enable the layering feature (by adding RBD_FEATURE_LAYERING to the value of RBD_FEATURES_SUPPORTED). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:48 -07:00
Alex Elder	2f82ee54d9	rbd: probe the parent of an image if present Call the probe function for the parent device if one is present. Since we don't formally support the layering feature we won't be using this functionality just yet. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:47 -07:00
Alex Elder	6365d33a27	rbd: add an object request flag for image data objects Add a flag to distinguish between object requests being done on standalone objects and requests being sent for objects representing rbd image data (i.e., object requests that are the result of image request). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:46 -07:00
Alex Elder	926f9b3f08	rbd: define an rbd object request flags field We're going to need some more Boolean values for object requests, so create a flags bit field and use it to record whether the request is done. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:44 -07:00
Alex Elder	1217857fbf	rbd: encapsulate image object end request handling Encapsulate the code that completes processing of an object request that's part of an image request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:43 -07:00
Alex Elder	d0b2e94455	rbd: define image request layered flag Define a flag indicating whether an image request is for a layered image (one with a parent image to which requests will be redirected if the target object of a request does not exist). The code that checks this flag will be added shortly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:42 -07:00
Alex Elder	9849e98636	rbd: define image request originator flag Define a flag indicating whether an image request originated from the Linux block layer (from blk_fetch_request()) or whether it was initiated in order to satisfy an object request for a child image of a layered rbd device. For image requests initiated by objects of child images we'll save a pointer to the object request rather than the Linux block request. For now, only block requests are used. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:41 -07:00
Alex Elder	0c425248e0	rbd: define image request flags There are several Boolean values we'll be maintaining for image requests. Switch from the single write_request field to a general-purpose flags field, and use one if its bits to represent the direction of I/O for the image request. Define helper functions for setting and testing that flag. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:40 -07:00
Alex Elder	7da22d296d	rbd: record image-relative offset in object requests For an image object request we will need to know what offset within the rbd image the request covers. Record that when the object request gets created. Update the I/O error warnings so they use this so what's reported is more informative. Rename a local variable to fit the convention used everywhere else. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:39 -07:00
Alex Elder	55f27e0931	rbd: record aggregate image transfer count Compute the total number of bytes transferred for an image request--the sum across each of the request's object requests. To avoid contention do it only when all object requests are complete, in rbd_img_request_complete(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:38 -07:00
Alex Elder	a5a337d438	rbd: record overall image request result If any image object request produces a non-zero result, preserve that as the result of the overall image request. If multiple objects have non-zero results, save only the first one. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:37 -07:00
Alex Elder	5cbf6f12c4	rbd: update feature bits There is a new rbd feature bit defined for "fancy striping." Add it to the ones defined in the kernel client. Change RBD_FEATURES_ALL so it represents the set of all feature bits (rather than just the ones we support). Define a new symbol RBD_FEATURES_SUPPORTED to indicate the supported ones. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:36 -07:00
Alex Elder	04017e29bb	libceph: make method call data be a separate data item Right now the data for a method call is specified via a pointer and length, and it's copied--along with the class and method name--into a pagelist data item to be sent to the osd. Instead, encode the data in a data item separate from the class and method names. This will allow large amounts of data to be supplied to methods without copying. Only rbd uses the class functionality right now, and when it really needs this it will probably need to use a page array rather than a page list. But this simple implementation demonstrates the functionality on the osd client, and that's enough for now. This resolves: http://tracker.ceph.com/issues/4104 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:35 -07:00
Alex Elder	a4ce40a9a7	libceph: combine initializing and setting osd data This ends up being a rather large patch but what it's doing is somewhat straightforward. Basically, this is replacing two calls with one. The first of the two calls is initializing a struct ceph_osd_data with data (either a page array, a page list, or a bio list); the second is setting an osd request op so it associates that data with one of the op's parameters. In place of those two will be a single function that initializes the op directly. That means we sort of fan out a set of the needed functions: - extent ops with pages data - extent ops with pagelist data - extent ops with bio list data and - class ops with page data for receiving a response We also have define another one, but it's only used internally: - class ops with pagelist data for request parameters Note that we still haven't gotten rid of the osd request's r_data_in and r_data_out fields. All the osd ops refer to them for their data. For now, these data fields are pointers assigned to the appropriate r_data_* field when these new functions are called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:23 -07:00
Alex Elder	2169238dd3	rbd: rearrange some code for consistency This patch just trivially moves around some code for consistency. In preparation for initializing osd request data fields in ceph_osdc_build_request(), I wanted to verify that rbd did in fact call that immediately before it called ceph_osdc_start_request(). It was true (although image requests are built in a group and then started as a group). But I made the changes here just to make it more obvious, by making all of the calls follow a common sequence: osd_req_op_<optype>_init(); ceph_osd_data_<type>_init() osd_req_op_<optype>_<datafield>() rbd_osd_req_format() ... ret = rbd_obj_request_submit() I moved the initialization of the callback for image object requests into rbd_img_request_fill_bio(), again, for consistency. To avoid a forward reference, I moved the definition of rbd_img_obj_callback() up in the file. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:18 -07:00
Alex Elder	44cd188d48	rbd: separate initialization of osd data The osd data for a request is currently initialized inside rbd_osd_req_create(), but that assumes an object request's data belongs in the osd request's data in or data out field. There are only three places where requests with data are set up, and it turns out it's easier to call just the osd data init routines directly there rather than handling it in rbd_osd_req_create(). (The real motivation here is moving toward getting rid of the osd request in and out data fields.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:17 -07:00

1 2 3 4 5 ...

461 Commits