Commit Graph

726 Commits

Author SHA1 Message Date
Alex Elder d23a4b3fd6 rbd: fix safety of rbd_put_client()
The rbd_client structure uses a kref to arrange for cleaning up and
freeing an instance when its last reference is dropped.  The cleanup
routine is rbd_client_release(), and one of the things it does is
delete the rbd_client from rbd_client_list.  It acquires node_lock
to do so, but the way it is done is still not safe.

The problem is that when attempting to reuse an existing rbd_client,
the structure found might already be in the process of getting
destroyed and cleaned up.

Here's the scenario, with "CLIENT" representing an existing
rbd_client that's involved in the race:

 Thread on CPU A                | Thread on CPU B
 ---------------                | ---------------
 rbd_put_client(CLIENT)         | rbd_get_client()
   kref_put()                   |   (acquires node_lock)
     kref->refcount becomes 0   |   __rbd_client_find() returns CLIENT
     calls rbd_client_release() |   kref_get(&CLIENT->kref);
                                |   (releases node_lock)
       (acquires node_lock)     |
       deletes CLIENT from list | ...and starts using CLIENT...
       (releases node_lock)     |
       and frees CLIENT         | <-- but CLIENT gets freed here

Fix this by having rbd_put_client() acquire node_lock.  The result
could still be improved, but at least it avoids this problem.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-02 12:56:59 -08:00
Alex Elder 97bb59a03d rbd: fix a memory leak in rbd_get_client()
If an existing rbd client is found to be suitable for use in
rbd_get_client(), the rbd_options structure is not being
freed as it should.  Fix that.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-02 12:49:27 -08:00
Alex Elder 0e805a1d85 rbd: initialize snap_rwsem in rbd_add()
New rbd device structures get initialized in rbd_add().  Many of
the fields rely on being initially zero-filled.  However we lockdep
was noticing that the rw_semaphore embedded in the header field
was not getting properly initialized.  Fix that.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-12 11:00:50 -08:00
Josh Durgin 51703306b3 rbd: remove buggy rollback functionality
This doesn't interact with resizing well, since it doesn't set the
size of the device to the size at the snapshot. It's also an expensive
operation to be synchronous. Rollback can still be done with the
userspace rbd tool.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-12-07 10:46:19 -08:00
Josh Durgin 81e759fbf7 rbd: return an error when an invalid header is read
This protects against opening future rbd images that have incompatible format changes.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-12-07 10:46:10 -08:00
Linus Torvalds 97d2eb13a0 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-28 16:42:18 -07:00
Sage Weil 6ab00d465a libceph: create messenger with client
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Jiri Kosina e060c38434 Merge branch 'master' into for-next
Fast-forward merge with Linus to be able to merge patches
based on more recent version of the tree.
2011-09-15 15:08:18 +02:00
Justin P. Mattock 699324871f treewide: remove extra semicolons from various parts of the kernel
This is a resend from the original, changing the title from PATCH to
RFC(since this is a review for commit, and I should have put that the first go around).
and also removing some of the commit's with ia64 and bash since it is significant.
let me know if I might have missed anything etc..

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:50:49 +02:00
Josh Durgin 029bcbd8b0 rbd: set blk_queue request sizes to object size
This improves performance since more requests can be merged.

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
2011-07-26 11:29:35 -07:00
Yehuda Sadeh 79e3057c4c rbd: cancel watch request when releasing the device
We were missing this cleanup, so when a device was released
the osd didn't clean up its watchers list, so following notifications
could be slow as osd needed to timeout on the client.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-07-26 11:29:04 -07:00
Sage Weil 9db4b3e327 rbd: handle online resize of underlying rbd image
If we get a notification that the image header has changed, check for
a change in the image size.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 11:52:08 -07:00
Sage Weil aedfec59ee rbd: use snprintf for disk->disk_name
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 11:52:03 -07:00
Sage Weil 916d4d6727 rbd: cleanup: make kfree match kmalloc
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-24 11:52:01 -07:00
Sage Weil 13143d2d1c rbd: warn on update_snaps failure on notify
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 11:25:05 -07:00
Yehuda Sadeh 1fec70932d rbd: fix split bio handling
The rbd driver currently splits bios when they span an object boundary.
However, the blk_end_request expects the completions to roll up the results
in block device order, and the split rbd/ceph ops can complete in any
order.  This patch adds a struct rbd_req_coll to track completion of split
requests and ensures that the results are passed back up to the block layer
in order.

This fixes errors where the file system gets completion of a read operation
that spans an object boundary before the data has actually arrived.  The
bug is easily reproduced with iozone with a working set larger than
available RAM.

Reported-by: Fyodor Ustinov <ufm@ufm.su>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-13 13:52:57 -07:00
Sage Weil 11f770027b rbd: fix leak of ops struct
The ops vector must be freed by the rbd_do_request caller.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-12 20:59:14 -07:00
Sage Weil 4ad12621e4 libceph: fix ceph_osdc_alloc_request error checks
ceph_osdc_alloc_request returns NULL on failure.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03 09:28:13 -07:00
Yehuda Sadeh 59c2be1e4d rbd: use watch/notify for changes in rbd header
Send notifications when we change the rbd header (e.g. create a snapshot)
and wait for such notifications.  This allows synchronizing the snapshot
creation between different rbd clients/rools.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22 11:33:56 -07:00
Yehuda Sadeh 766fc43973 rbd: fix cleanup when trying to mount inexistent image
Previously we didn't clean up the sysfs entry that was just
created.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:18 -08:00
Yehuda Sadeh dfc5606dc5 rbd: replace the rbd sysfs interface
The new interface creates directories per mapped image
and under each it creates a subdir per available snapshot.
This allows keeping a cleaner interface within the sysfs
guidelines. The ABI documentation was updated too.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-01 15:53:22 -08:00
Dan Carpenter 85b5aaa624 rbd: passing wrong variable to bvec_kunmap_irq()
We should be passing "buf" here insead of "bv".  This is tricky because
it's not the same as kmap() and kunmap().  GCC does warn about it if you
compile on i386 with CONFIG_HIGHMEM.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:25 -07:00
Dan Carpenter b8d0638a98 rbd: null vs ERR_PTR
ceph_alloc_page_vector() returns ERR_PTR(-ENOMEM) on errors.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:24 -07:00
Yehuda Sadeh f4cf3deef4 block: rbd: removing unnecessary test
rbd_get_segment() can't return a negative value, we don't need to check
the return output.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:38:20 -07:00
Vasiliy Kulikov 28f259b7cd block: rbd: fixed may leaks
rbd_client_create() doesn't free rbdc, this leads to many leaks.

seg_len in rbd_do_op() is unsigned, so (seg_len < 0) makes no sense.
Also if fixed check fails then seg_name is leaked.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20 15:38:19 -07:00
Yehuda Sadeh 602adf4002 rbd: introduce rados block device (rbd), based on libceph
The rados block device (rbd), based on osdblk, creates a block device
that is backed by objects stored in the Ceph distributed object storage
cluster.  Each device consists of a single metadata object and data
striped over many data objects.

The rbd driver supports read-only snapshots.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:38:13 -07:00