OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Guo Chao	b7a1da695f	loopdev: ignore negative offset when calculate loop device size Negative offset may cause loop device size larger than backing file size. $ fallocate -l 1M a $ losetup --offset 0xffffffffffff0000 /dev/loop0 a $ blockdev --getsize64 /dev/loop0 1114112 $ ls -l a -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a $ cat /dev/loop0 cat: /dev/loop0: Input/output error It makes no sense to do that. Only apply offset when it's positive. Fix a typo in the comment by the way. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-22 10:43:22 +01:00
Guo Chao	b1a6650406	loopdev: remove an user triggerable oops When loopdev is built as module and we pass an invalid parameter, loop_init() will return directly without deregister misc device, which will cause an oops when insert loop module next time because we left some garbage in the misc device list. Test case: sudo modprobe loop max_part=1024 (failed due to invalid parameter) sudo modprobe loop (oops) Clean up nicely to avoid such oops. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-22 10:43:22 +01:00
Guo Chao	7b0576a3d8	loopdev: move common code into loop_figure_size() Update block device size in accord with gendisk size and let userspace know the change in loop_figure_size(). This is a clean up to remove common code of loop_figure_size()'s two callers. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-22 10:43:22 +01:00
Guo Chao	541c742a75	loopdev: update block device size in loop_set_status() Loop device driver sometimes fails to impose the size limit on the device. Keep issuing following two commands: losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file blockdev --getsize64 /dev/loop0 blockdev reports file size instead of sizelimit several out of 100 times. The problems are: - losetup set up the device in two ioctl: LOOP_SET_FD and LOOP_SET_STATUS64. - LOOP_SET_STATUS64 only update size of gendisk. Block device size will be updated lazily when device comes to use. If udev rushes in between the two ioctl, it will bring in a block device whose size is backing file size. If the device is not released after LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size. Update block size in LOOP_SET_STATUS64 ioctl. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Reported-by: M. Hindess <hindessm@uk.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-22 10:43:22 +01:00
Guo Chao	5370019dc2	loopdev: fix a deadlock bd_mutex and lo_ctl_mutex can be held in different order. Path #1: blkdev_open blkdev_get __blkdev_get (hold bd_mutex) lo_open (hold lo_ctl_mutex) Path #2: blkdev_ioctl lo_ioctl (hold lo_ctl_mutex) lo_set_capacity (hold bd_mutex) Lockdep does not report it, because path #2 actually holds a subclass of lo_ctl_mutex. This subclass seems creep into the code by mistake. The patch author actually just mentioned it in the changelog, see commit `f028f3b2` ("loop: fix circular locking in loop_clr_fd()"), also see: http://marc.info/?l=linux-kernel&m=123806169129727&w=2 Path #2 hold bd_mutex to call bd_set_size(), I've protected it with i_mutex in a previous patch, so drop bd_mutex at this site. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: M. Hindess <hindessm@uk.ibm.com> Cc: Nikanth Karthikesan <knikanth@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-22 10:43:21 +01:00
Cong Ding	7414d4f64b	drivers/block/swim3.c: fix null pointer dereference The use of pointer fs should be after the null check. Signed-off-by: Cong Ding <dinggnu@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-22 10:42:46 +01:00
Linus Torvalds	06991c28f3	Driver core patches for 3.9-rc1 Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL If you need me to provide a merged tree to handle these resolutions, please let me know. Other than those patches, there's not much here, some minor fixes and updates. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW =yWAQ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...	2013-02-21 12:05:51 -08:00
Linus Torvalds	e177bb587e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k update from Geert Uytterhoeven. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Sort out !CONFIG_MMU_SUN3 vs. CONFIG_HAS_DMA swim: Add missing spinlock init	2013-02-20 14:27:00 -08:00
Jens Axboe	d4308febf3	Merge branch 'stable/for-jens-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.9/drivers Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.9 which has bug-fixes that did not make it in v3.8. They all are marked as material for the stable tree as well. There are two bug-fixes for the code that has been in there for some time (that is the Jan's fix and one of mine). And there are two bug-fixes for the persistent grant feature that debuted in v3.8 for xen blk[back\|front]end.	2013-02-20 08:26:06 +01:00
Alex Elder	4c7a08c83a	Merge branch 'testing' of github.com:ceph/ceph-client into into linux-3.8-ceph	2013-02-19 19:21:08 -06:00
Alex Elder	903bb32e89	libceph: drop return value from page vector copy routines The return values provided for ceph_copy_to_page_vector() and ceph_copy_from_page_vector() serve no purpose, so get rid of them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-19 19:14:05 -06:00
Alex Elder	23ed6e13b3	rbd: ignore result of ceph_copy_from_page_vector() The result of ceph_copy_from_page_vector() is simply the length argument it is provided. This is called by rbd_obj_method_sync(), which returns the result if it's non-negative. But we always either ignore or overwrite that return value. So explicitly ignore what's returned by the copy function, and have rbd_obj_method_sync() always return either a negative errno or 0. We also return the result of ceph_copy_from_page_vector() in rbd_obj_read_sync(). There we still want to return the number of bytes transferred, but we can use the value we already have in hand rather than what ceph_copy_from_page_vector() provides. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-19 19:14:05 -06:00
Alex Elder	1ceae7ef0f	rbd: prevent bytes transferred overflow In rbd_obj_read_sync(), verify the number of bytes transferred won't exceed what can be represented by a size_t before using it to indicate the number of bytes to copy to the result buffer. (The real motivation for this is to prepare for the next patch.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-19 19:14:04 -06:00
Alex Elder	fbfab53966	libceph: allow STAT osd operations Add support for CEPH_OSD_OP_STAT operations in the osd client and in rbd. This operation sends no data to the osd; everything required is encoded in identity of the target object. The result will be ENOENT if the object doesn't exist. If it does exist and no other error occurs the server returns the size and last modification time of the target object as output data (in little endian format). The size is a 64 bit unsigned and the time is ceph_timespec structure (two unsigned 32-bit integers, representing a seconds and nanoseconds value). This resolves: http://tracker.ceph.com/issues/4007 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-19 19:14:03 -06:00
Alex Elder	ef06f4d32a	rbd: add parentheses to object request iterator macros The for_each_obj_request*() macros should parenthesize their uses of the ireq parameter. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-19 19:14:02 -06:00
Roger Pau Monne	087ffecdaa	xen-blkback: use balloon pages for persistent grants With current persistent grants implementation we are not freeing the persistent grants after we disconnect the device. Since grant map operations change the mfn of the allocated page, and we can no longer pass it to __free_page without setting the mfn to a sane value, use balloon grant pages instead, as the gntdev device does. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: stable@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 15:17:21 -05:00
Konrad Rzeszutek Wilk	f84adf4921	xen-blkfront: drop the use of llist_for_each_entry_safe Replace llist_for_each_entry_safe with a while loop. llist_for_each_entry_safe can trigger a bug in GCC 4.1, so it's best to remove it and use a while loop and do the deletion manually. Specifically this bug can be triggered by hot-unplugging a disk, either by doing xm block-detach or by save/restore cycle. BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffffa0047223>] blkif_free+0x63/0x130 [xen_blkfront] The crash call trace is: ... bad_area_nosemaphore+0x13/0x20 do_page_fault+0x25e/0x4b0 page_fault+0x25/0x30 ? blkif_free+0x63/0x130 [xen_blkfront] blkfront_resume+0x46/0xa0 [xen_blkfront] xenbus_dev_resume+0x6c/0x140 pm_op+0x192/0x1b0 device_resume+0x82/0x1e0 dpm_resume+0xc9/0x1a0 dpm_resume_end+0x15/0x30 do_suspend+0x117/0x1e0 When drilling down to the assembler code, on newer GCC it does .L29: cmpq $-16, %r12 #, persistent_gnt check je .L30 #, out of the loop .L25: ... code in the loop testq %r13, %r13 # n je .L29 #, back to the top of the loop cmpq $-16, %r12 #, persistent_gnt check movq 16(%r12), %r13 # <variable>.node.next, n jne .L25 #, back to the top of the loop .L30: While on GCC 4.1, it is: L78: ... code in the loop testq %r13, %r13 # n je .L78 #, back to the top of the loop movq 16(%rbx), %r13 # <variable>.node.next, n jmp .L78 #, back to the top of the loop Which basically means that the exit loop condition instead of being: &(pos)->member != NULL; is: ; which makes the loop unbound. Since xen-blkfront is the only user of the llist_for_each_entry_safe macro remove it from llist.h. Orabug: 16263164 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 15:17:08 -05:00
Konrad Rzeszutek Wilk	01c681d4c7	xen/blkback: Don't trust the handle from the frontend. The 'handle' is the device that the request is from. For the life-time of the ring we copy it from a request to a response so that the frontend is not surprised by it. But we do not need it - when we start processing I/Os we have our own 'struct phys_req' which has only most essential information about the request. In fact the 'vbd_translate' ends up over-writing the preq.dev with a value from the backend. This assignment of preq.dev with the 'handle' value is superfluous so lets not do it. Cc: stable@vger.kernel.org Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 15:17:03 -05:00
Jan Beulich	9d092603cc	xen-blkback: do not leak mode property "be->mode" is obtained from xenbus_read(), which does a kmalloc() for the message body. The short string is never released, so do it along with freeing "be" itself, and make sure the string isn't kept when backend_changed() doesn't complete successfully (which made it desirable to slightly re-structure that function, so that the error cleanup can be done in one place). Reported-by: Olaf Hering <olaf@aepfle.de> CC: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-02-19 15:16:52 -05:00
Philip J Kelleher	c206c70924	block: IBM RamSan 70/80 driver fixes This patch includes the following driver fixes for the IBM RamSan 70/80 driver: o Changed the creg_ctrl lock from a mutex to a spinlock. o Added a count check for ioctl calls. o Removed unnecessary casting of void pointers. o Made every function static that needed to be. o Added comments to explain things more thoroughly. Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-18 21:35:59 +01:00
Alex Elder	3c663bbdcd	libceph: kill ceph_osdc_create_event() "one_shot" parameter There is only one caller of ceph_osdc_create_event(), and it provides 0 as its "one_shot" argument. Get rid of that argument and just use 0 in its place. Replace the code in handle_watch_notify() that executes if one_shot is nonzero in the event with a BUG_ON() call. While modifying "osd_client.c", give handle_watch_notify() static scope. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-18 12:20:00 -06:00
Linus Torvalds	11e7651432	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Pull sparc fixes from David Miller: "A couple small fixes for sparc including some THP brown-paper-bag material: 1) During the merging of all the THP support for various architectures, sparc missed adding a HAVE_ARCH_TRANSPARENT_HUGEPAGE to it's Kconfig, oops. 2) Sparc needs to be mindful of hugepages in get_user_pages_fast(). 3) Fix memory leak in SBUS probe, from Cong Ding. 4) The sunvdc virtual disk client driver has a test of the bitmask of vdisk server supported operations which was off by one bit" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sunvdc: Fix off-by-one in generic_request(). sparc64: Fix get_user_pages_fast() wrt. THP. sparc64: Add missing HAVE_ARCH_TRANSPARENT_HUGEPAGE. sparc: kernel/sbus.c: fix memory leakage	2013-02-15 12:05:57 -08:00
David S. Miller	f4d9605434	sunvdc: Fix off-by-one in generic_request(). The 'operations' bitmap corresponds one-for-one with the operation codes, no adjustment is necessary. Reported-by: Mark Kettenis <mark.kettenis@xs4all.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-14 11:49:01 -08:00
Jens Axboe	ec8edc764e	Merge branch 'delete-xt-disk' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux into for-3.9/drivers Paul writes: Please pull the following to get the removal of the original IBM PC-XT hard disk driver from the block layer (drivers/block/xd.c). As near as I can tell, it hasn't seen a run time fix in over a dozen years, and with drive sizes of 10-20MB, and performance of about 128kB/s maximum, it is no surprise that it has been completely unused for well over a decade. The removal was originally posted[1] well over a month ago, and since then, there has been nobody objecting to the removal, aside from someone who had mistakenly confused it with a completely different driver (hd.c)	2013-02-14 16:29:34 +01:00
Alex Elder	077413082f	rbd: add barriers near done flag operations Somehow, I missed this little item in Documentation/atomic_ops.txt: * WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! * Create and use some helper functions that include the proper memory barriers for manipulating the done field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:11 -08:00
Alex Elder	a14ea269dd	rbd: turn off interrupts for open/remove locking This commit: bc7a62ee5 rbd: prevent open for image being removed added checking for removing rbd before allowing an open, and used the same request spinlock for protecting that and updating the open count as is used for the request queue. However it used the non-irq protected version of the spinlocks. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:11 -08:00
Alex Elder	9cbb1d7268	libceph: don't require r_num_pages for bio requests There is a check in the completion path for osd requests that ensures the number of pages allocated is enough to hold the amount of incoming data expected. For bio requests coming from rbd the "number of pages" is not really meaningful (although total length would be). So stop requiring that nr_pages be supplied for bio requests. This is done by checking whether the pages pointer is null before checking the value of nr_pages. Note that this value is passed on to the messenger, but there it's only used for debugging--it's never used for validation. While here, change another spot that used r_pages in a debug message inappropriately, and also invalidate the r_con_filling_msg pointer after dropping a reference to it. This resolves: http://tracker.ceph.com/issues/3875 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:11 -08:00
Alex Elder	1e32d34cfa	rbd: don't take extra bio reference for osd client Currently, if the OSD client finds an osd request has had a bio list attached to it, it drops a reference to it (or rather, to the first entry on that list) when the request is released. The code that added that reference (i.e., the rbd client) is therefore required to take an extra reference to that first bio structure. The osd client doesn't really do anything with the bio pointer other than transfer it from the osd request structure to outgoing (for writes) and ingoing (for reads) messages. So it really isn't the right place to be taking or dropping references. Furthermore, the rbd client already holds references to all bio structures it passes to the osd client, and holds them until the request is completed. So there's no need for this extra reference whatsoever. So remove the bio_put() call in ceph_osdc_release_request(), as well as its matching bio_get() call in rbd_osd_req_create(). This change could lead to a crash if old libceph.ko was used with new rbd.ko. Add a compatibility check at rbd initialization time to avoid this possibilty. This resolves: http://tracker.ceph.com/issues/3798 and http://tracker.ceph.com/issues/3799 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:11 -08:00
Alex Elder	b82d167be6	rbd: prevent open for image being removed An open request for a mapped rbd image can arrive while removal of that mapping is underway. We need to prevent such an open request from succeeding. (It appears that Maciej Galkiewicz ran into this problem.) Define and use a "removing" flag to indicate a mapping is getting removed. Set it in the remove path after verifying nothing holds the device open. And check it in the open path before allowing the open to proceed. Acquire the rbd device's lock around each of these spots to avoid any races accessing the flags and open_count fields. This addresses: http://tracker.newdream.net/issues/3427 Reported-by: Maciej Galkiewicz <maciejgalkiewicz@ragnarson.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:10 -08:00
Alex Elder	6d292906f8	rbd: define flags field, use it for exists flag Define a new rbd device flags field, manipulated using bit operations. Replace the use of the current "exists" flag with a bit in this new "flags" field. Add a little commentary about the "exists" flag, which does not need to be manipulated atomically. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:10 -08:00
Alex Elder	8eb8756530	rbd: don't drop watch requests on completion When we register an osd request to linger, it means that request will stay around (under control of the osd client) until we've unregistered it. We do that for an rbd image's header object, and we keep a pointer to the object request associated with it. Keep a reference to the watch object request for as long as it is registered to linger. Drop it again after we've removed the linger registration. This resolves: http://tracker.ceph.com/issues/3937 (Note: this originally came about because the osd client was issuing a callback more than once. But that behavior will be changing soon, documented in tracker issue 3967.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:10 -08:00
Alex Elder	25dcf954c3	rbd: decrement obj request count when deleting Decrement the obj_request_count value when deleting an object request from its image request's list. Rearrange a few lines in the surrounding code. This resolves: http://tracker.ceph.com/issues/3940 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:10 -08:00
Alex Elder	975241afcb	rbd: track object rather than osd request for watch Switch to keeping track of the object request pointer rather than the osd request used to watch the rbd image header object. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:10 -08:00
Alex Elder	6977c3f983	rbd: unregister linger in watch sync routine Move the code that unregisters an rbd device's lingering header object watch request into rbd_dev_header_watch_sync(), so it occurs in the same function that originally sets up that request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:09 -08:00
Alex Elder	9f20e02a53	rbd: get rid of rbd_req_sync_exec() Get rid rbd_req_sync_exec() because it is no longer used. That eliminates the last use of rbd_req_sync_op(), so get rid of that too. And finally, that leaves rbd_do_request() unreferenced, so get rid of that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:09 -08:00
Alex Elder	36be9a7618	rbd: implement sync method with new code Reimplement synchronous object method calls using the new request tracking code. Use the name rbd_obj_method_sync() Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:09 -08:00
Alex Elder	cf81b60e4b	rbd: send notify ack asynchronously When we receive notification of a change to an rbd image's header object we need to refresh our information about the image (its size and snapshot context). Once we have refreshed our rbd image we need to acknowledge the notification. This acknowledgement was previously done synchronously, but there's really no need to wait for it to complete. Change it so the caller doesn't wait for the notify acknowledgement request to complete. And change the name to reflect it's no longer synchronous. This resolves: http://tracker.newdream.net/issues/3877 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:09 -08:00
Alex Elder	5ae9db81b4	rbd: get rid of rbd_req_sync_notify_ack() Get rid rbd_req_sync_notify_ack() because it is no longer used. As a result rbd_simple_req_cb() becomes unreferenced, so get rid of that too. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:09 -08:00
Alex Elder	b8d70035b3	rbd: use new code for notify ack Use the new object request tracking mechanism for handling a notify_ack request. Move the callback function below the definition of this so we don't have to do a pre-declaration. This resolves: http://tracker.newdream.net/issues/3754 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:08 -08:00
Alex Elder	ecf7a0318b	rbd: get rid of rbd_req_sync_watch() Get rid of rbd_req_sync_watch(), because it is no longer used. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:08 -08:00
Alex Elder	9969ebc5af	rbd: implement watch/unwatch with new code Implement a new function to set up or tear down a watch event for an mapped rbd image header using the new request code. Create a new object request type "nodata" to handle this. And define rbd_osd_trivial_callback() which simply marks a request done. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:08 -08:00
Alex Elder	86ea43bfcb	rbd: get rid of rbd_req_sync_read() Delete rbd_req_sync_read() is no longer used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:08 -08:00
Alex Elder	788e2df3b9	rbd: implement sync object read with new code Reimplement the synchronous read operation used for reading a version 1 header using the new request tracking code. Name the resulting function rbd_obj_read_sync() to better reflect that it's a full object operation, not an object request. To do this, implement a new OBJ_REQUEST_PAGES object request type. This implements a new mechanism to allow the caller to wait for completion for an rbd_obj_request by calling rbd_obj_request_wait(). This partially resolves: http://tracker.newdream.net/issues/3755 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:08 -08:00
Alex Elder	7d250b949a	rbd: kill rbd_req_coll and rbd_request The two remaining callers of rbd_do_request() always pass a null collection pointer, so the "coll" and "coll_index" parameters are not needed. There is no other use of that data structure, so it can be eliminated. Deleting them means there is no need to allocate a rbd_request structure for the callback function. And since that's the only use of that structure, it too can be eliminated. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:07 -08:00
Alex Elder	2250a71b59	rbd: kill rbd_rq_fn() and all other related code Now that the request function has been replaced by one using the new request management data structures the old one can go away. Deleting it makes rbd_dev_do_request() no longer needed, and deleting that makes other functions unneeded, and so on. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:07 -08:00
Alex Elder	bf0d5f503d	rbd: new request tracking code This patch fully implements the new request tracking code for rbd I/O requests. Each I/O request to an rbd image will get an rbd_image_request structure allocated to track it. This provides access to all information about the original request, as well as access to the set of one or more object requests that are initiated as a result of the image request. An rbd_obj_request structure defines a request sent to a single osd object (possibly) as part of an rbd image request. An rbd object request refers to a ceph_osd_request structure built up to represent the request; for now it will contain a single osd operation. It also provides space to hold the result status and the version of the object when the osd request completes. An rbd_obj_request structure can also stand on its own. This will be used for reading the version 1 header object, for issuing acknowledgements to event notifications, and for making object method calls. All rbd object requests now complete asynchronously with respect to the osd client--they supply a common callback routine. This resolves: http://tracker.newdream.net/issues/3741 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-02-13 18:29:07 -08:00
Jean Delvare	243eeb7890	swim: Add missing spinlock init It doesn't seem this spinlock was properly initialized. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>	2013-02-09 14:23:33 +01:00
Jens Axboe	e5e9fdaad4	rsxx: add slab.h include to dma.c kbuild test robot says: tree: git://git.kernel.dk/linux-block.git for-3.9/drivers head: `1262e24a59` commit: `8722ff8cdb` [6/8] block: IBM RamSan 70/80 device driver config: make ARCH=alpha allyesconfig All error/warnings: drivers/block/rsxx/dma.c: In function 'rsxx_complete_dma': >> drivers/block/rsxx/dma.c:251:2: error: implicit declaration of function 'kmem_cache_free' [-Werror=implicit-function-declaration] drivers/block/rsxx/dma.c: In function 'rsxx_queue_discard': >> drivers/block/rsxx/dma.c:567:2: error: implicit declaration of function 'kmem_cache_alloc' [-Werror=implicit-function-declaration] >> drivers/block/rsxx/dma.c:567:6: warning: assignment makes pointer from integer without a cast [enabled by default] drivers/block/rsxx/dma.c: In function 'rsxx_queue_dma': >> drivers/block/rsxx/dma.c:601:6: warning: assignment makes pointer from integer without a cast [enabled by default] drivers/block/rsxx/dma.c: In function 'rsxx_dma_init': >> drivers/block/rsxx/dma.c:985:2: error: implicit declaration of function 'KMEM_CACHE' [-Werror=implicit-function-declaration] >> drivers/block/rsxx/dma.c:985:29: error: 'rsxx_dma' undeclared (first use in this function) drivers/block/rsxx/dma.c:985:29: note: each undeclared identifier is reported only once for each function it appears in >> drivers/block/rsxx/dma.c:985:39: error: 'SLAB_HWCACHE_ALIGN' undeclared (first use in this function) drivers/block/rsxx/dma.c: In function 'rsxx_dma_cleanup': >> drivers/block/rsxx/dma.c:995:2: error: implicit declaration of function 'kmem_cache_destroy' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-07 10:04:21 +01:00
Heiko Carstens	1262e24a59	drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency The MTIP32XX driver calls devm_request_irq() and therefore needs a GENERIC_HARDIRQS dependency to prevent building it on s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-07 07:37:29 +01:00
Linus Torvalds	2110cf029a	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "I've got a few bits pending for 3.8 final, that I better get sent out. It's all been sitting for a while, I consider it safe. It contains: - Two bug fixes for mtip32xx, fixing a driver hang and a crash. - A few-liner protocol error fix for drbd. - A few fixes for the xen block front/back driver, fixing a potential data corruption issue. - A race fix for disk_clear_events(), causing spurious warnings. Out of the Chrome OS base. - A deadlock fix for disk_clear_events(), moving it to the a unfreezable workqueue. Also from the Chrome OS base." * 'for-linus' of git://git.kernel.dk/linux-block: drbd: fix potential protocol error and resulting disconnect/reconnect mtip32xx: fix for crash when the device surprise removed during rebuild mtip32xx: fix for driver hang after a command timeout block: prevent race/cleanup block: remove deadlock in disk_clear_events xen-blkfront: handle bvecs with partial data llist/xen-blkfront: implement safe version of llist_for_each_entry xen-blkback: implement safe iterator for the list of persistent grants	2013-02-07 08:38:33 +11:00
Stephen Rothwell	82bed4d5f8	block: remove new __devinit/exit annotations on ramsam driver Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-06 09:34:20 +01:00
josh.h.morris@us.ibm.com	8722ff8cdb	block: IBM RamSan 70/80 device driver This patch includes the device driver for the IBM RamSan family of PCI SSD flash storage cards. This driver will include support for the RamSan 70 and 80. The driver presents a block device for device I/O. Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-05 14:16:05 +01:00
Alex Elder	969e5aa3b0	Merge branch 'testing' of github.com:ceph/ceph-client into v3.8-rc5-testing	2013-01-30 07:54:34 -06:00
Greg Kroah-Hartman	422d26b6ec	Merge 3.8-rc5 into driver-core-next This resolves a gpio driver merge issue pointed out in linux-next. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-25 21:06:30 -08:00
Alex Elder	c04306471a	rbd: don't retry setting up header watch When an rbd image is initially mapped a watch event is registered so we can do something if the header object changes. The code that does this currently loops if initiating the watch request results in an ERANGE error. The osds will never return ERANGE, so there's no reason to do this loop, so get rid of it. This resolves: http://tracker.newdream.net/issues/3860 Note that the problem this loop was intended to solve is a race between collecting image header information and setting up the watch on the header object. The real fix for that problem is described here: http://tracker.newdream.net/issues/3871 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-25 17:33:37 -06:00
Alex Elder	38901e0f24	rbd: check for overflow in rbd_get_num_segments() The return type of rbd_get_num_segments() is int, but the values it operates on are u64. Although it's not likely, there's no guarantee the result won't exceed what can be respresented in an int. The function is already designed to return -ERANGE on error, so just add this possible overflow as another reason to return that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-25 17:33:14 -06:00
Alex Elder	98571b5aa7	rbd: small changes A few very minor changes to the rbd code: - RBD_MAX_OPT_LEN is unused, so get rid of it - Consolidate rbd options definitions - Make rbd_segment_name() return pointer to const char Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-25 17:32:46 -06:00
Jens Axboe	1383923d19	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus	2013-01-22 08:22:11 -07:00
Kees Cook	0cb3d9c6ba	drivers/block/paride: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Tim Waugh <tim@cyberelk.net> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-21 14:52:42 -08:00
Lars Ellenberg	2681f7f6ce	drbd: fix potential protocol error and resulting disconnect/reconnect When we notice a disk failure on the receiving side, we stop sending it new incoming writes. Depending on exact timing of various events, the same transfer log epoch could end up containing both replicated (before we noticed the failure) and local-only requests (after we noticed the failure). The sanity checks in tl_release(), called when receiving a P_BARRIER_ACK, check that the ack'ed transfer log epoch matches the expected epoch, and the number of contained writes matches the number of ack'ed writes. In this case, they counted both replicated and local-only writes, but the peer only acknowledges those it has seen. We get a mismatch, resulting in a protocol error and disconnect/reconnect cycle. Messages logged are "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n" A similar issue can also be triggered when starting a resync while having a healthy replication link, by invalidating one side, forcing a full sync, or attaching to a diskless node. Fix this by closing the current epoch if the state changes in a way that would cause the replication intent of the next write. Epochs now contain either only non-replicated, or only replicated writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2013-01-21 22:58:36 +01:00
Linus Torvalds	226364766f	Various minor fixes, but a slightly more complex one to fix the per-cpu overload problem introduced recently by kvm id changes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQ/IaJAAoJENkgDmzRrbjxOjAQAIrI9+Jo3Lsxk1v9gXeo9xn2 ST4LNv7/oW2+3NFBOkKsGVpcXe1JtGySIXyx9k+dELPa5xe4Rs4HE3pHQj/VoEx8 FKz3oUXSHkuh+paKuFXvZ2u/z0/FI99GmqHPObvGQ4iS3hTXAibzO83yYYPxwApq Zq4kof/dAcVVPLm8fGVAMPA2Rbh/WmjDfrIv8gv71QkDjtRLzcr40VIgky5cvu7V FWcBl4/DVoKkGnDPsLDhLK9QGqgBGhFIlNIcVX4Jv50DiCibOyzdjeUXYxMftoGr Rw56hHwGpPdqbRIjBkR071vIl/mlXTmxIv+d77vZNBin2MIBwAzCQXo8I1/HojCK /wKhI+RFj0J5DaDo/BTB80cmI3X2oah5sRUebW6vd9HjunhFFndg4mVeDNPa0E0+ F72xWlj79BjdIOuD06TLg6Tg2klL49nC8bUc0wrsh6onEjhd9v7Cp/X/rxi5cKYW eEv3oLkKwUHoheF9gBlpnT0Yyl/HpFe+nemblzj/ybRKnk4A5vtJqV9eZnqoOS16 lgIkKOpgXT9dzSom2EL/f4sMCeLLYC44DQwOvxNKt/BdMY0r5y8OLaJORXQGfEDF Ztvu2G8PmELxV0B3JZcGR/zOcKxpOBsrGoVn0/EQIul3A/0C0ID7i5zwJAyX6LP7 V+6vyF2eHMf10tB0rbfB =SpOo -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module fixes and a virtio block fix from Rusty Russell: "Various minor fixes, but a slightly more complex one to fix the per-cpu overload problem introduced recently by kvm id changes." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: put modules in list much earlier. module: add new state MODULE_STATE_UNFORMED. module: prevent warning when finit_module a 0 sized file virtio-blk: Don't free ida when disk is in use	2013-01-20 16:44:28 -08:00
Alex Elder	e0b49868d3	rbd: fix type of snap_id in rbd_dev_v2_snap_info() The type of the snap_id local variable is defined with the wrong byte order. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:35:55 -06:00
Alex Elder	8b84de7940	rbd: assign watch request more directly Both rbd_req_sync_op() and rbd_do_request() have a "linger" parameter, which is the address of a pointer that should refer to the osd request structure used to issue a request to an osd. Only one case ever supplies a non-null "linger" argument: an CEPH_OSD_OP_WATCH start. And in that one case it is assigned &rbd_dev->watch_request. Within rbd_do_request() (where the assignment ultimately gets made) we know the rbd_dev and therefore its watch_request field. We also know whether the op being sent is CEPH_OSD_OP_WATCH start. Stop opaquely passing down the "linger" pointer, and instead just assign the value directly inside rbd_do_request() when it's needed. This makes it unnecessary for rbd_req_sync_watch() to make arrangements to hold a value that's not available until a bit later. This more clearly separates setting up a watch request from submitting it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:59 -06:00
Alex Elder	5efea49a98	rbd: move remaining osd op setup into rbd_osd_req_op_create() The two remaining osd ops used by rbd are CEPH_OSD_OP_WATCH and CEPH_OSD_OP_NOTIFY_ACK. Move the setup of those operations into rbd_osd_req_op_create(), and get rid of rbd_create_rw_op() and rbd_destroy_op(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:59 -06:00
Alex Elder	2647ba3810	rbd: move call osd op setup into rbd_osd_req_op_create() Move the initialization of the CEPH_OSD_OP_CALL operation into rbd_osd_req_op_create(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:59 -06:00
Alex Elder	8d23bf2909	rbd: don't assign extent info in rbd_req_sync_op() Move the assignment of the extent offset and length and payload length out of rbd_req_sync_op() and into its caller in the one spot where a read (and note--no write) operation might be initiated. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	c561191813	rbd: don't assign extent info in rbd_do_request() In rbd_do_request() there's a sort of last-minute assignment of the extent offset and length and payload length for read and write operations. Move those assignments into the caller (in those spots that might initiate read or write operations) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	1821665749	rbd: don't leak rbd_req for rbd_req_sync_notify_ack() When rbd_req_sync_notify_ack() calls rbd_do_request() it supplies rbd_simple_req_cb() as its callback function. Because the callback is supplied, an rbd_req structure gets allocated and populated so it can be used by the callback. However rbd_simple_req_cb() is not freeing (or even using) the rbd_req structure, so it's getting leaked. Since rbd_simple_req_cb() has no need for the rbd_req structure, just avoid allocating one for this case. Of the three calls to rbd_do_request(), only the one from rbd_do_op() needs the rbd_req structure, and that call can be distinguished from the other two because it supplies a non-null rbd_collection pointer. So fix this leak by only allocating the rbd_req structure if a non-null "coll" value is provided to rbd_do_request(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	2e53c6c379	rbd: don't leak rbd_req on synchronous requests When rbd_do_request() is called it allocates and populates an rbd_req structure to hold information about the osd request to be sent. This is done for the benefit of the callback function (in particular, rbd_req_cb()), which uses this in processing when the request completes. Synchronous requests provide no callback function, in which case rbd_do_request() waits for the request to complete before returning. This case is not handling the needed free of the rbd_req structure like it should, so it is getting leaked. Note however that the synchronous case has no need for the rbd_req structure at all. So rather than simply freeing this structure for synchronous requests, just don't allocate it to begin with. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	907703d050	rbd: combine rbd sync watch/unwatch functions The rbd_req_sync_watch() and rbd_req_sync_unwatch() functions are nearly identical. Combine them into a single function with a flag indicating whether a watch is to be initiated or torn down. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	0903e875ca	rbd: use a common layout for each device Each osd message includes a layout structure, and for rbd it is always the same (at least for osd's in a given pool). Initialize a layout structure when an rbd_dev gets created and just copy that into osd requests for the rbd image. Replace an assertion that was done when initializing the layout structures with code that catches and handles anything that would trigger the assertion as soon as it is identified. This precludes that (bad) condition from ever occurring. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	47dba7ba26	rbd: don't bother calculating file mapping When rbd_do_request() has a request to process it initializes a ceph file layout structure and uses it to compute offsets and limits for the range of the request using ceph_calc_file_object_mapping(). The layout used is fixed, and is based on RBD_MAX_OBJ_ORDER (30). It sets the layout's object size and stripe unit to be 1 GB (2^30), and sets the stripe count to be 1. The job of ceph_calc_file_object_mapping() is to determine which of a sequence of objects will contain data covered by range, and within that object, at what offset the range starts. It also truncates the length of the range at the end of the selected object if necessary. This is needed for ceph fs, but for rbd it really serves no purpose. It does its own blocking of images into objects, echo of which is (1 << obj_order) in size, and as a result it ignores the "bno" value returned by ceph_calc_file_object_mapping(). In addition, by the point a request has reached this function, it is already destined for a single rbd object, and its length will not exceed that object's extent. Because of this, and because the mapping will result in blocking up the range using an integer multiple of the image's object order, ceph_calc_file_object_mapping() will never change the offset or length values defined by the request. In other words, this call is a big no-op for rbd data requests. There is one exception. We read the header object using this function, and in that case we will not have already limited the request size. However, the header is a single object (not a file or rbd image), and should not be broken into pieces anyway. So in fact we should not be calling ceph_calc_file_object_mapping() when operating on the header object. So... Don't call ceph_calc_file_object_mapping() in rbd_do_request(), because useless for image data and incorrect to do sofor the image header. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	e01e79273b	rbd: open code rbd_calc_raw_layout() This patch gets rid of rbd_calc_raw_layout() by simply open coding it in its one caller. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	0829661863	rbd: pull in ceph_calc_raw_layout() This is the first in a series of patches aimed at eliminating the use of ceph_calc_raw_layout() by rbd. It simply pulls in a copy of that function and renames it rbd_calc_raw_layout(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:58 -06:00
Alex Elder	30573d6803	rbd: assume single op in a request We now know that every of rbd_req_sync_op() passes an array of exactly one operation, as evidenced by all callers passing 1 as its num_op argument. So get rid of that argument, assuming a single op. Similarly, we now know that all callers of rbd_do_request() pass 1 as the num_op value, so that parameter can be eliminated as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	139b4318ad	rbd: there is really only one op Throughout the rbd code there are spots where it appears we can handle an osd request containing more than one osd request op. But that is only the way it appears. In fact, currently only one operation at a time can be supported, and supporting more than one will require much more than fleshing out the support that's there now. This patch changes names to make it perfectly clear that anywhere we're dealing with a block of ops, we're in fact dealing with exactly one of them. We'll be able to simplify some things as a result. When multiple op support is implemented, we can update things again accordingly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	ae7ca4a35b	libceph: pass num_op with ops Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are provided an array of ceph osd request operations. Rather than just passing the number of operations in the array, the caller is required append an additional zeroed operation structure to signal the end of the array. All callers know the number of operations at the time these functions are called, so drop the silly zero entry and supply that number directly. As a result, get_num_ops() is no longer needed. This also means that ceph_osdc_alloc_request() never uses its ops argument, so that can be dropped. Also rbd_create_rw_ops() no longer needs to add one to reserve room for the additional op. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	d07c09589f	rbd: pass num_op with ops array Add a num_op parameter to rbd_do_request() and rbd_req_sync_op() to indicate the number of entries in the array. The callers of these functions always know how many entries are in the array, so just pass that information down. This is in anticipation of eliminating the extra zero-filled entry in these ops arrays. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	54a5400721	libceph: don't set pages or bio in ceph_osdc_alloc_request() Only one of the two callers of ceph_osdc_alloc_request() provides page or bio data for its payload. And essentially all that function was doing with those arguments was assigning them to fields in the osd request structure. Simplify ceph_osdc_alloc_request() by having the caller take care of making those assignments Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 16:34:57 -06:00
Alex Elder	d178a9e740	libceph: don't set flags in ceph_osdc_alloc_request() The only thing ceph_osdc_alloc_request() really does with the flags value it is passed is assign it to the newly-created osd request structure. Do that in the caller instead. Both callers subsequently call ceph_osdc_build_request(), so have that function (instead of ceph_osdc_alloc_request()) issue a warning if a request comes through with neither the read nor write flags set. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:05 -06:00
Alex Elder	e75b45cf36	libceph: drop osdc from ceph_calc_raw_layout() The osdc parameter to ceph_calc_raw_layout() is not used, so get rid of it. Consequently, the corresponding parameter in calc_layout() becomes unused, so get rid of that as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:05 -06:00
Alex Elder	4d6b250bf1	libceph: drop snapid in ceph_calc_raw_layout() A snapshot id must be provided to ceph_calc_raw_layout() even though it is not needed at all for calculating the layout. Where the snapshot id is needed is when building the request message for an osd operation. Drop the snapid parameter from ceph_calc_raw_layout() and pass that value instead in ceph_osdc_build_request(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:05 -06:00
Alex Elder	0120be3c60	libceph: pass length to ceph_osdc_build_request() The len argument to ceph_osdc_build_request() is set up to be passed by address, but that function never updates its value so there's no need to do this. Tighten up the interface by passing the length directly. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:04 -06:00
Alex Elder	7c3d22cf16	rbd: don't bother setting snapid in rbd_do_request() For some reason, the snapid field of the osd request header is explicitly set to CEPH_NOSNAP in rbd_do_request(). Just a few lines later--with no code that would access this field in between--a call is made to ceph_calc_raw_layout() passing the snapid provided to rbd_do_request(), which encodes the snapid value it is provided into that field instead. In other words, there is no need to fill in CEPH_NOSNAP, and doing so suggests it might be necessary. Don't do that any more. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:03 -06:00
Alex Elder	25704ac9de	rbd: kill rbd_req_sync_op() snapc and snapid parameters The snapc and snapid parameters to rbd_req_sync_op() always take the values NULL and CEPH_NOSNAP, respectively. So just get rid of them and use those values where needed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:02 -06:00
Alex Elder	07b2391fbb	rbd: drop flags parameter from rbd_req_sync_exec() All callers of rbd_req_sync_exec() pass CEPH_OSD_FLAG_READ as their flags argument. Delete that parameter and use CEPH_OSD_FLAG_READ within the function. If we find a need to support write operations we can add it back again. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:02 -06:00
Alex Elder	4775618d92	rbd: drop snapid parameter from rbd_req_sync_read() There is only one caller of rbd_req_sync_read(), and it passes CEPH_NOSNAP as the snapshot id argument. Delete that parameter and just use CEPH_NOSNAP within the function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:01 -06:00
Alex Elder	af77f26caa	rbd: drop oid parameters from ceph_osdc_build_request() The last two parameters to ceph_osd_build_request() describe the object id, but the values passed always come from the osd request structure whose address is also provided. Get rid of those last two parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:01 -06:00
Alex Elder	0ec8ce87f3	rbd: separate layout init Pull a block of code that initializes the layout structure in an osd request into its own function so it can be reused. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:01 -06:00
Alex Elder	a7b4c65f4f	rbd: only get snap context for write requests Right now we get the snapshot context for an rbd image (under protection of the header semaphore) for every request processed. There's no need to get the snap context if we're doing a read, so avoid doing so in that case. Note that we no longer need to hold the header semaphore to check the rbd_dev's existence flag. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:00 -06:00
Alex Elder	d78b650a59	rbd: make exists flag atomic The rbd_device->exists field can be updated asynchronously, changing from set to clear if a mapped snapshot disappears from the base image's snapshot context. Currently, value of the "exists" flag is only read and modified under protection of the header semaphore, but that will change with the next patch. Making it atomic ensures this won't be a problem because the a the non-existence of device will be immediately known. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:52:00 -06:00
Alex Elder	b395e8b5b8	rbd: a little more cleanup of rbd_rq_fn() Now that a big hunk in the middle of rbd_rq_fn() has been moved into its own routine we can simplify it a little more. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:51:51 -06:00
Alex Elder	cd323ac0eb	rbd: end request on error in rbd_do_request() caller Only one of the three callers of rbd_do_request() provide a collection structure to aggregate status. If an error occurs in rbd_do_request(), have the caller take care of calling rbd_coll_end_req() if necessary in that one spot. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:33:41 -06:00
Alex Elder	8295cda7ce	rbd: encapsulate handling for a single request In rbd_rq_fn(), requests are fetched from the block layer and each request is processed, looping through the request's list of bio's until they've all been consumed. Separate the handling for a single request into its own function to make it a bit easier to see what's going on. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 15:04:47 -06:00
Alex Elder	8986cb37b1	rbd: be picky about osd request status type The result field in a ceph osd reply header is a signed 32-bit type, but rbd code often casually uses int to represent it. The following changes the types of variables that handle this result value to be "s32" instead of "int" to be completely explicit about it. Only at the point we pass that result to __blk_end_request() does the type get converted to the plain old int defined for that interface. There is almost certainly no binary impact of this change, but I prefer to show the exact size and signedness of the value since we know it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2013-01-17 14:53:20 -06:00
Alex Elder	5f29ddd4f0	rbd: standardize ceph_osd_request variable names There are spots where a ceph_osds_request pointer variable is given the name "req". Since we're dealing with (at least) three types of requests (block layer, rbd, and osd), I find this slightly distracting. Change such instances to use "osd_req" consistently to make the abstraction represented a little more obvious. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 14:53:15 -06:00
Alex Elder	725afc97c9	rbd: standardize rbd_request variable names There are two names used for items of rbd_request structure type: "req" and "req_data". The former name is also used to represent items of pointers to struct ceph_osd_request. Change all variables that have these names so they are instead called "rbd_req" consistently. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 14:53:07 -06:00
Alex Elder	935dc89f3e	rbd: add warnings to rbd_dev_probe_update_spec() Josh suggested adding warnings to this function to help users diagnose problems. Other than memory allocatino errors, there are two places where errors can be returned. Both represent problems that should have been caught earlier, and as such might well have been handled with BUG_ON() calls. But if either ever did manage to happen, it will be reported. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 14:12:46 -06:00
Alex Elder	f5400b7a0e	rbd: add a warning in bio_chain_clone_range() Add a warning in bio_chain_clone_range() to help a user determine what exactly might have led to a failure. There is only one; please say something if you disagree with the following reasoning. There are three places this can return abnormally: - Initially, if there is nothing to clone. It turns out that right now this cannot happen anyway. The test is in place because the code below it doesn't work if those conditions don't hold. As such they could be assertions but since I can return a null to indicate an error I just do that instead. I have not added a warning here because it won't happen. - While processing bio's, if none remain but there are supposed to be more bytes to clone. Here I have added a warning. - If bio_clone_range() returns a null pointer. That function will have already produced a warning (at least the first time, via WARN_ON_ONCE()) to distinguish the cause of the error. The only exception is memory exhaustion, and I'd rather not pepper the code with warnings in all those spots. So no warning is added in that place. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-01-17 14:12:31 -06:00
Alex Elder	4fb5d67139	rbd: add warning messages for missing arguments Tell the user (via dmesg) what was wrong with the arguments provided via /sys/bus/rbd/add. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>	2013-01-17 14:10:21 -06:00

1 2 3 4 5 ...

3531 Commits