OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ilya Dryomov	3f1af42ad0	libceph: enable large, variable-sized OSD requests Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:43 +01:00
Yan, Zheng	7665d85b73	libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-03-25 18:51:42 +01:00
Linus Torvalds	1d02369dba	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Final round of fixes for this merge window - some of this has come up after the initial pull request, and some of it was put in a post-merge branch before the merge window. This contains: - Fix for a bad check for an error on dma mapping in the mtip32xx driver, from Alexey Khoroshilov. - A set of fixes for lightnvm, from Javier, Matias, and Wenwei. - An NVMe completion record corruption fix from Marta, ensuring that we read things in the right order. - Two writeback fixes from Tejun, marked for stable@ as well. - A blk-mq sw queue iterator fix from Thomas, fixing an oops for sparse CPU maps. They hit this in the hot plug/unplug rework" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: avoid cqe corruption when update at the same time as read writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list() blk-mq: Use proper cpumask iterator mtip32xx: fix checks for dma mapping errors lightnvm: do not load L2P table if not supported lightnvm: do not reserve lun on l2p loading nvme: lightnvm: return ppa completion status lightnvm: add a bitmap of luns lightnvm: specify target's logical address area null_blk: add lightnvm null_blk device to the nullb_list	2016-03-24 20:00:44 -07:00
Linus Torvalds	f0691533b7	virtio/vhost: new features, performance improvements, cleanups This adds basic polling support for vhost. Reworks virtio to optionally use DMA API, fixing it on Xen. Balloon stats gained a new entry. Using the new napi_alloc_skb speeds up virtio net. virtio blk stats can now be read while another VCPU us busy inflating or deflating the balloon. Plus misc cleanups in various places. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW7qRJAAoJECgfDbjSjVRpVNoH/A7z+lZ6nooSJ9fUBtAAlwit mE1VKi8g0G6naV1NVLFVe7hPAejExGiHfR3ZUrVoenJKj2yeW/DFojFC10YR/KTe ac7Imuc+owA3UOE/QpeGBs59+EEWKTZUYt6r8HSJVwoodeosw9v2ecP/Iwhbax8H a4V3HqOADjKnHg73R9o3u+bAgA1GrGYHeK0AfhCBSTNwlPdxkvf0463HgfOpM4nl /sNoFWO3vOyekk+loIk+jpmWVIoIfG2NFzW4lPwEPkfqUBX7r0ei/NR23hIqHL7r QZ6vMj1Ew9qctUONbJu4kXjuV2Vk9NhxwbDjoJtm8plKL2hz2prJynUEogkHh2g= =VMD0 -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio/vhost updates from Michael Tsirkin: "New features, performance improvements, cleanups: - basic polling support for vhost - rework virtio to optionally use DMA API, fixing it on Xen - balloon stats gained a new entry - using the new napi_alloc_skb speeds up virtio net - virtio blk stats can now be read while another VCPU is busy inflating or deflating the balloon plus misc cleanups in various places" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb() vhost_net: basic polling support vhost: introduce vhost_vq_avail_empty() vhost: introduce vhost_has_work() virtio_balloon: Allow to resize and update the balloon stats in parallel virtio_balloon: Use a workqueue instead of "vballoon" kthread virtio/s390: size of SET_IND payload virtio/s390: use dev_to_virtio vhost: rename vhost_init_used() vhost: rename cross-endian helpers virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH vring: Use the DMA API on Xen virtio_pci: Use the DMA API if enabled virtio_mmio: Use the DMA API if enabled virtio: Add improved queue allocation API virtio_ring: Support DMA APIs vring: Introduce vring_use_dma_api() s390/dma: Allow per device dma ops alpha/dma: use common noop dma ops dma: Provide simple noop dma ops	2016-03-20 13:28:18 -07:00
Linus Torvalds	814a2bf957	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...	2016-03-18 19:26:54 -07:00
Alexey Khoroshilov	5173cb814b	mtip32xx: fix checks for dma mapping errors exec_drive_taskfile() checks for dma mapping errors by comparison returned address with zero, while pci_dma_mapping_error() should be used. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-18 18:10:59 -07:00
Wenwei Tao	3681c85dff	null_blk: add lightnvm null_blk device to the nullb_list After register null_blk devices into lightnvm, we forget to add these devices to the the nullb_list, makes them invisible to the null_blk driver. Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com> Fixes: `a514379b0c` ("null_blk: oops when initializing without lightnvm") Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-18 18:10:37 -07:00
Linus Torvalds	237045fc3c	Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This is the block driver pull request for this merge window. It sits on top of for-4.6/core, that was just sent out. This contains: - A set of fixes for lightnvm. One from Alan, fixing an overflow, and the rest from the usual suspects, Javier and Matias. - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd for correct usage of the signed 64-bit divider. - A set of bug fixes for the Micron mtip32xx, from Asai. - A fix for the brd discard handling from Bart. - Update the maintainers entry for cciss, since that hardware has transferred ownership. - Three bug fixes for bcache from Eric Wheeler. - Set of fixes for xen-blk{back,front} from Jan and Konrad. - Removal of the cpqarray driver. It has been disabled in Kconfig since 2013, and we were initially scheduled to remove it in 3.15. - Various updates and fixes for NVMe, with the most important being: - Removal of the per-device NVMe thread, replacing that with a watchdog timer instead. From Christoph. - Exposing the namespace WWID through sysfs, from Keith. - Set of cleanups from Ming Lin. - Logging the controller device name instead of the underlying PCI device name, from Sagi. - And a bunch of fixes and optimizations from the usual suspects in this area" * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits) NVMe: Expose ns wwid through single sysfs entry drivers:block: cpqarray clean up brd: Fix discard request processing cpqarray: remove it from the kernel cciss: update MAINTAINERS NVMe: Remove unused sq_head read in completion path bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: cleaned up error handling around register_cache() bcache: fix race of writeback thread starting before complete initialization NVMe: Create discard zero quirk white list nbd: use correct div_s64 helper mtip32xx: remove unneeded variable in mtip_cmd_timeout() lightnvm: generalize rrpc ppa calculations lightnvm: remove struct nvm_dev->total_blocks lightnvm: rename ->nr_pages to ->nr_sects lightnvm: update closed list outside of intr context xen/blback: Fit the important information of the thread in 17 characters lightnvm: fold get bb tbl when using dual/quad plane mode lightnvm: fix up nonsensical configure overrun checking xen-blkback: advertise indirect segment support earlier ...	2016-03-18 17:13:31 -07:00
Joonsoo Kim	fe896d1878	mm: introduce page reference manipulation functions The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Linus Torvalds	70477371dc	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...	2016-03-17 11:22:54 -07:00
Arnd Bergmann	dec63a4dec	paride: make 'verbose' parameter an 'int' again gcc-6.0 found an ancient bug in the paride driver, which had a "module_param(verbose, bool, 0);" since before 2.6.12, but actually uses it to accept '0', '1' or '2' as arguments: drivers/block/paride/pd.c: In function 'pd_init_dev_parms': drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare] #define DBMSG(msg) ((verbose>1)?(msg):NULL) In 2012, Rusty did a cleanup patch that also changed the type of the variable to 'bool', which introduced what is now a gcc warning. This changes the type back to 'int' and adapts the module_param() line instead, so it should work as documented in case anyone ever cares about running the ancient driver with debugging. Fixes: `90ab5ee941` ("module_param: make bool parameters really bool (drivers & misc)") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Rusty Russell <rusty@rustcorp.com.au> Cc: Tim Waugh <tim@cyberelk.net> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Jens Axboe <axboe@fb.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Valentin Rothberg	98347a7d8a	drivers:block: cpqarray clean up Commit `d436641439` ("cpqarray: remove it from the kernel") removes the Kconfig option BLK_CPQ_DA and cpqarray. Remove the dead build rule in the Makefile. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-15 15:59:47 -07:00
Bart Van Assche	5e4298be45	brd: Fix discard request processing Avoid that discard requests with size => PAGE_SIZE fail with -EIO. Refuse discard requests if the discard size is not a multiple of the page size. Fixes: `2dbe549576` ("brd: Refuse improperly aligned discard requests") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robert Elliot <elliott@hp.com> Cc: stable <stable@vger.kernel.org> # v4.4+ Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-15 14:10:29 -07:00
Jens Axboe	d436641439	cpqarray: remove it from the kernel We disabled the ability to enable this driver back in October of 2013, we should be able to safely remove it at this point. The initial goal was to remove it in 3.15, so now is the time. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-14 09:06:01 -06:00
Arnd Bergmann	5e454c67fc	nbd: use correct div_s64 helper The do_div() macro now checks its arguments for the correct type, and refuses anything other than u64, so we get a warning about nbd_ioctl passing in an loff_t: drivers/block/nbd.c: In function '__nbd_ioctl': drivers/block/nbd.c:757:77: error: comparison of distinct pointer types lacks a cast [-Werror] This changes the nbd code to use div_s64() instead, which takes a signed argument. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `37091fdd83` ("nbd: Create size change events for userspace") Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-04 17:20:12 -07:00
Jens Axboe	90beb2e7a0	mtip32xx: remove unneeded variable in mtip_cmd_timeout() We always return BLK_EH_RESET_TIMER, so no point in storing that in an integer. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-04 08:15:48 -07:00
Konrad Rzeszutek Wilk	fa3184b898	xen/blback: Fit the important information of the thread in 17 characters The processes names are truncated to 17, while we had the length of the process as name 20 - which meant that while we filled it out with various details - the last 3 characters (which had the queue number) never surfaced to the user-space. To simplify this and be able to fit the device name, domain id, and the queue number we remove the 'blkback' from the name. Prior to this patch the device name is "blkback.<domid>.<name>" for example: blkback.8.xvda, blkback.11.hda. With the multiqueue block backend we add "-%d" for the queue. But sadly this is already way past the limit so it gets stripped. Possible solution had been identified by Ian: http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html " If you are pressed for space then the "xvd" is probably a bit redundant in a string which starts blkbk. The guest may not even call the device xvdN (iirc BSD has another prefix) any how, so having blkback say so seems of limited use anyway. Since this seems to not include a partition number how does this work in the split partition scheme? (i.e. one where the guest is given xvda1 and xvda2 rather than xvda with a partition table) [It will be 'blkback.8.xvda1', and 'blkback.11.xvda2'] Perhaps something derived from one of the schemes in http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a better fit? After a bit of discussion (see http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html) we settled on dropping the "blback" part. This will make it possible to have the <domid>.<name>-<queue>: [1.xvda-0] [1.xvda-1] And we enough space to make it go up to: [32100.xvdfg9-5] Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-03-03 14:45:54 -07:00
Jan Beulich	5a7058450c	xen-blkback: advertise indirect segment support earlier There's no reason to defer this until the connect phase, and in fact there are frontend implementations expecting this to be available earlier. Move it into the probe function. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-03-03 14:45:53 -07:00
Jan Beulich	14e710fe78	xen-blkfront: rename indirect descriptor parameter "max" is rather ambiguous and carries pretty little meaning, the more that there are also "max_queues" and "max_ring_page_order". Make this "max_indirect_segments" instead, and at once change the type from int to uint (to match the respective variable's type). Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-03-03 14:45:53 -07:00
Asai Thambi SP	008e56d200	mtip32xx: Cleanup queued requests after surprise removal Fail all pending requests after surprise removal of a drive. Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:44 -07:00
Asai Thambi SP	abb0ccd185	mtip32xx: Implement timeout handler Added timeout handler. Replaced blk_mq_end_request() with blk_mq_complete_request() to avoid double completion of a request. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:44 -07:00
Asai Thambi SP	aae4a03386	mtip32xx: Handle FTL rebuild failure state during device initialization Allow device initialization to finish gracefully when it is in FTL rebuild failure state. Also, recover device out of this state after successfully secure erasing it. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Asai Thambi SP	51c6570eb9	mtip32xx: Handle safe removal during IO Flush inflight IOs using fsync_bdev() when the device is safely removed. Also, block further IOs in device open function. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Asai Thambi SP	59cf70e236	mtip32xx: Fix for rmmod crash when drive is in FTL rebuild When FTL rebuild is in progress, alloc_disk() initializes the disk but device node will be created by add_disk() only after successful completion of FTL rebuild. So, skip deletion of device node in removal path when FTL rebuild is in progress. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Asai Thambi SP	d8a18d2d8f	mtip32xx: Avoid issuing standby immediate cmd during FTL rebuild Prevent standby immediate command from being issued in remove, suspend and shutdown paths, while drive is in FTL rebuild process. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Asai Thambi SP	5b7e0a8ac8	mtip32xx: Print exact time when an internal command is interrupted Print exact time when an internal command is interrupted. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Asai Thambi SP	e35b94738a	mtip32xx: Remove unwanted code from taskfile error handler Remove setting and clearing MTIP_PF_EH_ACTIVE_BIT flag in mtip_handle_tfe() as they are redundant. Also avoid waking up service thread from mtip_handle_tfe() because it is already woken up in case of taskfile error. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Asai Thambi SP	cfc05bd313	mtip32xx: Fix broken service thread handling Service thread does not detect the need for taskfile error hanlding. Fixed the flag condition to process taskfile error. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 09:08:43 -07:00
Michael S. Tsirkin	592002f55e	virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH Latest virtio spec says the feature bit name is VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_WCE is the legacy name. virtio blk header says exactly the reverse - fix that and update driver code to match. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-03-02 17:01:59 +02:00
Markus Pargmann	37091fdd83	nbd: Create size change events for userspace The userspace needs to know when nbd devices are ready for use. Currently no events are created for the userspace which doesn't work for systemd. See the discussion here: https://github.com/systemd/systemd/pull/358 This patch uses a central point to setup the nbd-internal sizes. A ioctl to set a size does not lead to a visible size change. The size of the block device will be kept at 0 until nbd is connected. As soon as it connects, the size will be changed to the real value and a uevent is created. When disconnecting, the blockdevice is set to 0 size and another uevent is generated. Signed-off-by: Markus Pargmann <mpa@pengutronix.de>	2016-02-15 10:35:47 +01:00
Matias Bjørling	a514379b0c	null_blk: oops when initializing without lightnvm If the LightNVM subsystem is not compiled into the kernel, and the null_blk device driver requests lightnvm to be initialized. The call to nvm_register fails and the null_add_dev function cleans up the initialization. However, at this point the null block device has already been added to the nullb_list and thus a second cleanup will occur when the function has returned, that leads to a double call to blk_cleanup_queue. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-11 08:56:09 -07:00
Jiri Kosina	09954bad44	floppy: refactor open() flags handling In case /dev/fdX is open with O_NDELAY / O_NONBLOCK, floppy_open() immediately succeeds, without performing any further media / controller preparations. That's "correct" wrt. the NODELAY flag, but is hardly correct wrt. the rest of the floppy driver, that is not really O_NONBLOCK ready, at all. Therefore it's not too surprising, that subsequent attempts to work with the filedescriptor produce bad results. Namely, syzkaller tool has been able to livelock mmap() on the returned fd to keep waiting on the page unlock bit forever. Quite frankly, I have trouble defining what non-blocking behavior would be for floppies. Is waiting ages for the driver to actually succeed reading a sector blocking operation? Is waiting for drive motor to start blocking operation? How about in case of virtualized floppies? One option would be returning EWOULDBLOCK in case O_NDLEAY / O_NONBLOCK is being passed to open(). That has a theoretical potential of breaking some arcane and archaic userspace though. Let's take a more conservative aproach, and accept the O_NDLEAY flag, and let the driver behave as usual. While at it, clean up a bit handling of !(mode & (FMODE_READ\|FMODE_WRITE)) case and return EINVAL instead of succeeding as well. Spotted by syzkaller tool. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-02-06 23:00:22 +01:00
Dan Streetman	da6ccaaa79	nbd: ratelimit error msgs after socket close Make the "Attempted send on closed socket" error messages generated in nbd_request_handler() ratelimited. When the nbd socket is shutdown, the nbd_request_handler() function emits an error message for every request remaining in its queue. If the queue is large, this will spam a large amount of messages to the log. There's no need for a separate error message for each request, so this patch ratelimits it. In the specific case this was found, the system was virtual and the error messages were logged to the serial port, which overwhelmed it. Fixes: `4d48a542b4` ("nbd: fix I/O hang on disconnected nbds") Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>	2016-02-05 08:55:15 +01:00
Markus Pargmann	d02cf53107	nbd: Move flag parsing to a function nbd changes properties of the blockdevice depending on flags that were received. This patch moves this flag parsing into a separate function nbd_parse_flags(). Signed-off-by: Markus Pargmann <mpa@pengutronix.de>	2016-02-05 08:52:33 +01:00
Markus Pargmann	0e4f0f6f63	nbd: Cleanup reset of nbd and bdev after a disconnect Group all variables that are reset after a disconnect into reset functions. This patch adds two of these functions, nbd_reset() and nbd_bdev_reset(). Signed-off-by: Markus Pargmann <mpa@pengutronix.de>	2016-02-05 08:52:32 +01:00
Markus Pargmann	1f7b5cf1be	nbd: Timeouts are not user requested disconnects It may be useful to know in the client that a connection timed out. The current code returns success for a timeout. This patch reports the error code -ETIMEDOUT for a timeout. Signed-off-by: Markus Pargmann <mpa@pengutronix.de>	2016-02-05 08:52:31 +01:00
Markus Pargmann	23272a6754	nbd: Remove signal usage As discussed on the mailing list, the usage of signals for timeout handling has a lot of potential issues. The nbd driver used for some time signals for timeouts. These signals where able to get the threads out of the blocking socket operations. This patch removes all signal usage and uses a socket shutdown instead. The socket descriptor itself is cleared later when the whole nbd device is closed. The tasks_lock is removed as we do not depend on this anymore. Instead a new lock for the socket is introduced so we can safely work with the socket in the timeout handler outside of the two main threads. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-02-05 08:52:25 +01:00
Matias Bjørling	bf64318564	lightnvm: allow to force mm initialization System block allows the device to initialize with its configured media manager. The system blocks is written to disk, and read again when media manager is determined. For this to work, the backend must store the data. Device drivers, such as null_blk, does not have any backend storage. This patch allows the media manager to be initialized without a storage backend. It also fix incorrect configuration of capabilities in null_blk, as it does not support get/set bad block interface. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-04 09:19:45 -07:00
Markus Pargmann	27ea43fe2a	nbd: Fix debugfs error handling Static checker complains about the implemented error handling. It is indeed wrong. We don't care about the return values of created debugfs files. We only have to check the return values of created dirs for NULL pointer. If we use a null pointer as parent directory for files, this may lead to debugfs files in wrong places. Signed-off-by: Markus Pargmann <mpa@pengutronix.de>	2016-02-03 11:02:56 +01:00
Jens Axboe	90b90d06db	Merge branch 'for-4.5/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/linux-block into for-linus Locking fix from Jiri	2016-02-01 09:08:39 -07:00
Jiri Kosina	a0c80efe59	floppy: fix lock_fdc() signal handling floppy_revalidate() doesn't perform any error handling on lock_fdc() result. lock_fdc() might actually be interrupted by a signal (it waits for fdc becoming non-busy interruptibly). In such case, floppy_revalidate() proceeds as if it had claimed the lock, but it fact it doesn't. In case of multiple threads trying to open("/dev/fdX"), this leads to serious corruptions all over the place, because all of a sudden there is no critical section protection (that'd otherwise be guaranteed by locked fd) whatsoever. While at this, fix the fact that the 'interruptible' parameter to lock_fdc() doesn't make any sense whatsoever, because we always wait interruptibly anyway. Most of the lock_fdc() callsites do properly handle error (and propagate EINTR), but floppy_revalidate() and floppy_check_events() don't. Fix this. Spotted by 'syzkaller' tool. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-02-01 11:19:17 +01:00
Jens Axboe	ed0ae43c9d	Merge branch 'stable/for-jens-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus	2016-01-30 22:04:52 -07:00
Bob Liu	3db70a8532	xen/blkfront: realloc ring info in blkif_resume Need to reallocate ring info in the resume path, because info->rinfo was freed in blkif_free(). And 'multi-queue-max-queues' backend reports may have been changed. Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-29 22:02:34 -05:00
Herbert Xu	9534d67195	drbd: Use shash and ahash This patch replaces uses of the long obsolete hash interface with either shash (for non-SG users) or ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-01-27 20:36:08 +08:00
Herbert Xu	84a2c9319d	block: cryptoloop - Use new skcipher interface This patch replaces uses of blkcipher with the new skcipher interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-01-27 20:35:43 +08:00
Linus Torvalds	00e3f5cc30	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "The two main changes are aio support in CephFS, and a series that fixes several issues in the authentication key timeout/renewal code. On top of that are a variety of cleanups and minor bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: remove outdated comment libceph: kill off ceph_x_ticket_handler::validity libceph: invalidate AUTH in addition to a service ticket libceph: fix authorizer invalidation, take 2 libceph: clear messenger auth_retry flag if we fault libceph: fix ceph_msg_revoke() libceph: use list_for_each_entry_safe ceph: use i_size_{read,write} to get/set i_size ceph: re-send AIO write request when getting -EOLDSNAP error ceph: Asynchronous IO support ceph: Avoid to propagate the invalid page point ceph: fix double page_unlock() in page_mkwrite() rbd: delete an unnecessary check before rbd_dev_destroy() libceph: use list_next_entry instead of list_entry_next ceph: ceph_frag_contains_value can be boolean ceph: remove unused functions in ceph_frag.h	2016-01-24 12:34:13 -08:00
Linus Torvalds	cc673757e2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull final vfs updates from Al Viro: - The ->i_mutex wrappers (with small prereq in lustre) - a fix for too early freeing of symlink bodies on shmem (they need to be RCU-delayed) (-stable fodder) - followup to dedupe stuff merged this cycle * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: abort dedupe loop if fatal signals are pending make sure that freeing shmem fast symlinks is RCU-delayed wrappers for ->i_mutex access lustre: remove unused declaration	2016-01-23 12:24:56 -08:00
Tetsuo Handa	1d5cfdb076	tree wide: use kvfree() than conditional kfree()/vfree() There are many locations that do if (memory_was_allocated_by_vmalloc) vfree(ptr); else kfree(ptr); but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory using is_vmalloc_addr(). Unless callers have special reasons, we can replace this branch with kvfree(). Please check and reply if you found problems. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: David Rientjes <rientjes@google.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Boris Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-22 17:02:18 -08:00
Al Viro	5955102c99	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-22 18:04:28 -05:00
Linus Torvalds	0a13daedf7	Merge branch 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block Pull lightnvm fixes and updates from Jens Axboe: "This should have been part of the drivers branch, but it arrived a bit late and wasn't based on the official core block driver branch. So they got a small scolding, but got a pass since it's still new. Hence it's in a separate branch. This is mostly pure fixes, contained to lightnvm/, and minor feature additions" * 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block: (26 commits) lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM lightnvm: introduce factory reset lightnvm: use system block for mm initialization lightnvm: introduce ioctl to initialize device lightnvm: core on-disk initialization lightnvm: introduce mlc lower page table mappings lightnvm: add mccap support lightnvm: manage open and closed blocks separately lightnvm: fix missing grown bad block type lightnvm: reference rrpc lun in rrpc block lightnvm: introduce nvm_submit_ppa lightnvm: move rq->error to nvm_rq->error lightnvm: support multiple ppas in nvm_erase_ppa lightnvm: move the pages per block check out of the loop lightnvm: sectors first in ppa list lightnvm: fix locking and mempool in rrpc_lun_gc lightnvm: put block back to gc list on its reclaim fail lightnvm: check bi_error in gc lightnvm: return the get_bb_tbl return value lightnvm: refactor end_io functions for sync ...	2016-01-21 19:01:55 -08:00
Linus Torvalds	641203549a	Merge branch 'for-4.5/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This is the block driver pull request for 4.5, with the exception of NVMe, which is in a separate branch and will be posted after this one. This pull request contains: - A set of bcache stability fixes, which have been acked by Kent. These have been used and tested for more than a year by the community, so it's about time that they got in. - A set of drbd updates from the drbd team (Andreas, Lars, Philipp) and Markus Elfring, Oleg Drokin. - A set of fixes for xen blkback/front from the usual suspects, (Bob, Konrad) as well as community based fixes from Kiri, Julien, and Peng. - A 2038 time fix for sx8 from Shraddha, with a fix from me. - A small mtip32xx cleanup from Zhu Yanjun. - A null_blk division fix from Arnd" * 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits) null_blk: use sector_div instead of do_div mtip32xx: restrict variables visible in current code module xen/blkfront: Fix crash if backend doesn't follow the right states. xen/blkback: Fix two memory leaks. xen/blkback: make st_ statistics per ring xen/blkfront: Handle non-indirect grant with 64KB pages xen-blkfront: Introduce blkif_ring_get_request xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() xen/blkback: Free resources if connect_ring failed. xen/blocks: Return -EXX instead of -1 xen/blkback: make pool of persistent grants and free pages per-queue xen/blkback: get the number of hardware queues/rings from blkfront xen/blkback: pseudo support for multi hardware queues/rings xen/blkback: separate ring information out of struct xen_blkif xen/blkfront: correct setting for xen_blkif_max_ring_order xen/blkfront: make persistent grants pool per-queue xen/blkfront: Remove duplicate setting of ->xbdev. xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors. xen/blkfront: negotiate number of queues/rings to be used with backend xen/blkfront: split per device io_lock ...	2016-01-21 18:19:38 -08:00
Markus Elfring	1761b22966	rbd: delete an unnecessary check before rbd_dev_destroy() The rbd_dev_destroy() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2016-01-21 19:36:07 +01:00
Linus Torvalds	7c24d9f3b2	Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: "We don't have a lot of core changes this time around, it's mostly in drivers, which will come in a subsequent pull. The cores changes include: - blk-mq - Prep patch from Christoph, changing blk_mq_alloc_request() to take flags instead of just using gfp_t for sleep/nosleep. - Doc patch from me, clarifying the difference between legacy and blk-mq for timer usage. - Fixes from Raghavendra for memory-less numa nodes, and a reuse of CPU masks. - Cleanup from Geliang Tang, using offset_in_page() instead of open coding it. - From Ilya, rename request_queue slab to it reflects what it holds, and a fix for proper use of bdgrab/put. - A real fix for the split across stripe boundaries from Keith. We yanked a broken version of this from 4.4-rc final, this one works. - From Mike Krinkin, emit a trace message when we split. - From Wei Tang, two small cleanups, not explicitly clearing memory that is already cleared" * 'for-4.5/core' of git://git.kernel.dk/linux-block: block: use bd{grab,put}() instead of open-coding block: split bios to max possible length block: add call to split trace point blk-mq: Avoid memoryless numa node encoded in hctx numa_node blk-mq: Reuse hardware context cpumask for tags blk-mq: add a flags parameter to blk_mq_alloc_request Revert "blk-flush: Queue through IO scheduler when flush not required" block: clarify blk_add_timer() use case for blk-mq bio: use offset_in_page macro block: do not initialise statics to 0 or NULL block: do not initialise globals to 0 or NULL block: rename request_queue slab cache	2016-01-19 15:03:34 -08:00
Dan Williams	34c0fd540e	mm, dax, pmem: introduce pfn_t For the purpose of communicating the optional presence of a 'struct page' for the pfn returned from ->direct_access(), introduce a type that encapsulates a page-frame-number plus flags. These flags contain the historical "page_link" encoding for a scatterlist entry, but can also denote "device memory". Where "device memory" is a set of pfns that are not part of the kernel's linear mapping by default, but are accessed via the same memory controller as ram. The motivation for this new type is large capacity persistent memory that needs struct page entries in the 'memmap' to support 3rd party DMA (i.e. O_DIRECT I/O with a persistent memory source/target). However, we also need it in support of maintaining a list of mapped inodes which need to be unmapped at driver teardown or freeze_bdev() time. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave@sr71.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Jerome Marchand	17ec4cd985	zram: don't call idr_remove() from zram_remove() The use of idr_remove() is forbidden in the callback functions of idr_for_each(). It is therefore unsafe to call idr_remove in zram_remove(). This patch moves the call to idr_remove() from zram_remove() to hot_remove_store(). In the detroy_devices() path, idrs are removed by idr_destroy(). This solves an use-after-free detected by KASan. [akpm@linux-foundation.org: fix coding stype, per Sergey] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Linus Torvalds	875fc4f5dd	Merge branch 'akpm' (patches from Andrew) Merge first patch-bomb from Andrew Morton: - A few hotfixes which missed 4.4 becasue I was asleep. cc'ed to -stable - A few misc fixes - OCFS2 updates - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) zsmalloc: reorganize struct size_class to pack 4 bytes hole mm/zbud.c: use list_last_entry() instead of list_tail_entry() zram/zcomp: do not zero out zcomp private pages zram: pass gfp from zcomp frontend to backend zram: try vmalloc() after kmalloc() zram/zcomp: use GFP_NOIO to allocate streams mm: add tracepoint for scanning pages drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 mm/page_isolation: use macro to judge the alignment mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() mm: rework virtual memory accounting include/linux/memblock.h: fix ordering of 'flags' argument in comments mm: move lru_to_page to mm_inline.h Documentation/filesystems: describe the shared memory usage/accounting memory-hotplug: don't BUG() in register_memory_resource() hugetlb: make mm and fs code explicitly non-modular mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() mm: make sure isolate_lru_page() is never called for tail page vmstat: make vmstat_updater deferrable again and shut down on idle ...	2016-01-15 11:41:44 -08:00
Sergey Senozhatsky	e02d238c98	zram/zcomp: do not zero out zcomp private pages Do not __GFP_ZERO allocated zcomp ->private pages. We keep allocated streams around and use them for read/write requests, so we supply a zeroed out ->private to compression algorithm as a scratch buffer only once -- the first time we use that stream. For the rest of IO requests served by this stream ->private usually contains some temporarily data from the previous requests. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 11:40:52 -08:00
Minchan Kim	75d8947a36	zram: pass gfp from zcomp frontend to backend Each zcomp backend uses own gfp flag but it's pointless because the context they could be called is driven by upper layer(ie, zcomp frontend). As well, zcomp frondend could call them in different context. One context(ie, zram init part) is it should be better to make sure successful allocation other context(ie, further stream allocation part for accelarating I/O speed) is just optional so let's pass gfp down from driver (ie, zcomp frontend) like normal MM convention. [sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps] Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 11:40:51 -08:00
Kyeongdon Kim	d913897aba	zram: try vmalloc() after kmalloc() When we're using LZ4 multi compression streams for zram swap, we found out page allocation failure message in system running test. That was not only once, but a few(2 - 5 times per test). Also, some failure cases were continually occurring to try allocation order 3. In order to make parallel compression private data, we should call kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order 2/3 size memory to allocate in that time, page allocation fails. This patch makes to use vmalloc() as fallback of kmalloc(), this prevents page alloc failure warning. After using this, we never found warning message in running test, also It could reduce process startup latency about 60-120ms in each case. For reference a call trace : Binder_1: page allocation failure: order:3, mode:0x10c0d0 CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20 Call trace: dump_backtrace+0x0/0x270 show_stack+0x10/0x1c dump_stack+0x1c/0x28 warn_alloc_failed+0xfc/0x11c __alloc_pages_nodemask+0x724/0x7f0 __get_free_pages+0x14/0x5c kmalloc_order_trace+0x38/0xd8 zcomp_lz4_create+0x2c/0x38 zcomp_strm_alloc+0x34/0x78 zcomp_strm_multi_find+0x124/0x1ec zcomp_strm_find+0xc/0x18 zram_bvec_rw+0x2fc/0x780 zram_make_request+0x25c/0x2d4 generic_make_request+0x80/0xbc submit_bio+0xa4/0x15c __swap_writepage+0x218/0x230 swap_writepage+0x3c/0x4c shrink_page_list+0x51c/0x8d0 shrink_inactive_list+0x3f8/0x60c shrink_lruvec+0x33c/0x4cc shrink_zone+0x3c/0x100 try_to_free_pages+0x2b8/0x54c __alloc_pages_nodemask+0x514/0x7f0 __get_free_pages+0x14/0x5c proc_info_read+0x50/0xe4 vfs_read+0xa0/0x12c SyS_read+0x44/0x74 DMA: 33974kB (MC) 268kB (RC) 016kB 032kB 064kB 0128kB 0256kB 0512kB 01024kB 02048kB 0*4096kB = 13796kB [minchan@kernel.org: change vmalloc gfp and adding comment about gfp] [sergey.senozhatsky@gmail.com: tweak comments and styles] Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 11:40:51 -08:00
Sergey Senozhatsky	3d5fe03a3e	zram/zcomp: use GFP_NOIO to allocate streams We can end up allocating a new compression stream with GFP_KERNEL from within the IO path, which may result is nested (recursive) IO operations. That can introduce problems if the IO path in question is a reclaimer, holding some locks that will deadlock nested IOs. Allocate streams and working memory using GFP_NOIO flag, forbidding recursive IO and FS operations. An example: inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes: (jbd2_handle){+.+.?.}, at: start_this_handle+0x4ca/0x555 {IN-RECLAIM_FS-W} state was registered at: __lock_acquire+0x8da/0x117b lock_acquire+0x10c/0x1a7 start_this_handle+0x52d/0x555 jbd2__journal_start+0xb4/0x237 __ext4_journal_start_sb+0x108/0x17e ext4_dirty_inode+0x32/0x61 __mark_inode_dirty+0x16b/0x60c iput+0x11e/0x274 __dentry_kill+0x148/0x1b8 shrink_dentry_list+0x274/0x44a prune_dcache_sb+0x4a/0x55 super_cache_scan+0xfc/0x176 shrink_slab.part.14.constprop.25+0x2a2/0x4d3 shrink_zone+0x74/0x140 kswapd+0x6b7/0x930 kthread+0x107/0x10f ret_from_fork+0x3f/0x70 irq event stamp: 138297 hardirqs last enabled at (138297): debug_check_no_locks_freed+0x113/0x12f hardirqs last disabled at (138296): debug_check_no_locks_freed+0x33/0x12f softirqs last enabled at (137818): __do_softirq+0x2d3/0x3e9 softirqs last disabled at (137813): irq_exit+0x41/0x95 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(jbd2_handle); <Interrupt> lock(jbd2_handle); * DEADLOCK * 5 locks held by git/20158: #0: (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b #1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3 #2: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b #3: (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b #4: (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555 stack backtrace: CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211 Call Trace: dump_stack+0x4c/0x6e mark_lock+0x384/0x56d mark_held_locks+0x5f/0x76 lockdep_trace_alloc+0xb2/0xb5 kmem_cache_alloc_trace+0x32/0x1e2 zcomp_strm_alloc+0x25/0x73 [zram] zcomp_strm_multi_find+0xe7/0x173 [zram] zcomp_strm_find+0xc/0xe [zram] zram_bvec_rw+0x2ca/0x7e0 [zram] zram_make_request+0x1fa/0x301 [zram] generic_make_request+0x9c/0xdb submit_bio+0xf7/0x120 ext4_io_submit+0x2e/0x43 ext4_bio_write_page+0x1b7/0x300 mpage_submit_page+0x60/0x77 mpage_map_and_submit_buffers+0x10f/0x21d ext4_writepages+0xc8c/0xe1b do_writepages+0x23/0x2c __filemap_fdatawrite_range+0x84/0x8b filemap_flush+0x1c/0x1e ext4_alloc_da_blocks+0xb8/0x117 ext4_rename+0x132/0x6dc ? mark_held_locks+0x5f/0x76 ext4_rename2+0x29/0x2b vfs_rename+0x540/0x636 SyS_renameat2+0x359/0x44d SyS_rename+0x1e/0x20 entry_SYSCALL_64_fastpath+0x12/0x6f [minchan@kernel.org: add stable mark] Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 11:40:51 -08:00
Linus Torvalds	7d1fc01afc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: floppy: make local variable non-static exynos: fixes an incorrect header guard dt-bindings: fixes some incorrect header guards cpufreq-dt: correct dead link in documentation cpufreq: ARM big LITTLE: correct dead link in documentation treewide: Fix typos in printk Documentation: filesystem: Fix typo in fs/eventfd.c fs/super.c: use && instead of & for warn_on condition Documentation: fix sysfs-ptp lib: scatterlist: fix Kconfig description	2016-01-14 17:04:19 -08:00
Linus Torvalds	1289ace5b4	SCSI misc on 20160113 This pull includes driver updates from the usual suspects (bfa, arcmsr, scsi_dh_alua, lpfc, storvsc, cxlflash). The major change is the addition of the hisi_sas driver, which is an ARM platform device for SAS. The other change of note is an enormous style transformation to the atp870u driver (which is our worst written SCSI driver). Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJWluqqAAoJEDeqqVYsXL0McuwH/1oqvFOagsvoDcwDyNAUR/eW VAH454ndIJ0eSXORNfA7ko3ZQKa53x1WN9eKr+RHI7lpGCjwBz2MjnvQsnKISvXp K0owkJTcAAF+Wdq7rdNlm1VlQHuLvG8TMTnno+NY3CtxCR2yiRWlctkNkjr0rWUv leXJkXZSThkkiY/rEDZZXee8Ajwac87QT+ELEqCT2HueGZD+J8s59JpsOtZdt6Bj n94ydOuct8hF3Xt3pdu1oDRpWpoJIyjHtYhdrvzSiKKBHTWtuq1oN0Cwnp0qtEDD X3K1Mr0yBmAjTOsK+y+bZnJ1y7qJLLt5ZHmVixkzFWujXPNbrIsyYkV5eI432XA= =ggNi -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull first round of SCSI updates from James Bottomley: "This includes driver updates from the usual suspects (bfa, arcmsr, scsi_dh_alua, lpfc, storvsc, cxlflash). The major change is the addition of the hisi_sas driver, which is an ARM platform device for SAS. The other change of note is an enormous style transformation to the atp870u driver (which is our worst written SCSI driver)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (169 commits) cxlflash: Enable device id for future IBM CXL adapter cxlflash: Resolve oops in wait_port_offline cxlflash: Fix to resolve cmd leak after host reset cxlflash: Removed driver date print cxlflash: Fix to avoid virtual LUN failover failure cxlflash: Fix to escalate LINK_RESET also on port 1 storvsc: Tighten up the interrupt path storvsc: Refactor the code in storvsc_channel_init() storvsc: Properly support Fibre Channel devices storvsc: Fix a bug in the layout of the hv_fc_wwn_packet mvsas: Add SGPIO support to Marvell 94xx mpt3sas: A correction in unmap_resources hpsa: Add box and bay information for enclosure devices hpsa: Change SAS transport devices to bus 0. hpsa: fix path_info_show cciss: print max outstanding commands as a hex value scsi_debug: Increase the reported optimal transfer length lpfc: Update version to 11.0.0.10 for upstream patch set lpfc: Use kzalloc instead of kmalloc lpfc: Delete unnecessary checks before the function call "mempool_destroy" ...	2016-01-13 19:37:36 -08:00
Arnd Bergmann	e93d12ae3b	null_blk: use sector_div instead of do_div Dividing a sector_t number should be done using sector_div rather than do_div to optimize the 32-bit sector_t case, and with the latest do_div optimizations, we now get a compile-time warning for this: arch/arm/include/asm/div64.h:32:95: note: expected 'uint64_t * {aka long long unsigned int }' but argument is of type 'sector_t {aka long unsigned int *}' drivers/block/null_blk.c:521:81: warning: comparison of distinct pointer types lacks a cast This changes the newly added code to use sector_div. It is a simplified version of the original patch, as Linus Torvalds pointed out that we should not be using an expensive division function in the first place. This version was suggested by Matias Bjorling. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matias Bjorling <m@bjorling.me> Fixes: `b2b7e00148` ("null_blk: register as a LightNVM device") Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-13 15:10:34 -07:00
Jens Axboe	038a75afc5	Merge branch 'stable/for-jens-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-4.5/drivers Konrad writes: The pull is based on converting the backend driver into an multiqueue driver and exposing more than one queue to the frontend. As such we had to modify the frontend and also fix a bunch of bugs around this. The original work is based on Arianna Avanzini's work as an OPW intern. Bob took over the work and had been massaging it for quite some time. Also included are are features to 64KB page support for ARM and various bug-fixes.	2016-01-13 08:20:36 -07:00
Matias Bjørling	91276162de	lightnvm: refactor end_io functions for sync To implement sync I/O support within the LightNVM core, the end_io functions are refactored to take an end_io function pointer instead of testing for initialized media manager, followed by calling its end_io function. Sync I/O can then be implemented using a callback that signal I/O completion. This is similar to the logic found in blk_to_execute_io(). By implementing it this way, the underlying device I/Os submission logic is abstracted away from core, targets, and media managers. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 08:21:16 -07:00
Al Viro	263a3df18f	nbd: use ->compat_ioctl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-08 21:20:32 -05:00
Al Viro	6108209c4a	Merge branch 'for-linus' into work.misc	2016-01-08 21:20:11 -05:00
Zhu Yanjun	9e35fdcb9c	mtip32xx: restrict variables visible in current code module The modified variables are only used in the file mtip32xx.c. As such, the static keyword is inserted to define that object to be only visible to the current code module during compilation. Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-08 11:47:53 -07:00
James Bottomley	abaee091a1	Merge branch 'jejb-scsi' into misc	2016-01-07 15:51:13 -08:00
Al Viro	820351f05b	rsxx: don't open-code memdup_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-06 08:25:24 -05:00
Al Viro	8ed6010d50	mtip32xx: don't open-code memdup_user() [folded a fix by Dan Carpenter] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-06 08:24:39 -05:00
Colin Ian King	a8036dfba9	cciss: print max outstanding commands as a hex value The max outstanding commands is being printed with a 0x prefix to suggest it is a hex value, when in fact the integer decimal %d format specifier is being used and this is a bit confusing. Use %x instead to match the proceeding 0x prefix. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-01-04 19:45:01 -05:00
Konrad Rzeszutek Wilk	c31ecf6c12	xen/blkfront: Fix crash if backend doesn't follow the right states. We have split the setting up of all the resources in two steps: 1) talk_to_blkback - which figures out the num_ring_pages (from the default value of zero), sets up shadow and so 2) blkfront_connect - does the real part of filling out the internal structures. The problem is if we bypass the 1) step and go straight to 2) and call blkfront_setup_indirect where we use the macro BLK_RING_SIZE - which returns an negative value (because sz is zero - since num_ring_pages is zero - since it has never been set). We can fix this by making sure that we always have called talk_to_blkback before going to blkfront_connect. Or we could set in blkfront_probe info->nr_ring_pages = 1 to have a default value. But that looks odd - as we haven't actually negotiated any ring size. This patch changes XenbusStateConnected state to detect if we haven't done the initial handshake - and if so continue on as if were in XenbusStateInitWait state. We also roll the error recovery (freeing the structure) into talk_to_blkback error path - which is safe since that function is only called from blkback_changed. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:26 -05:00
Bob Liu	93bb277f97	xen/blkback: Fix two memory leaks. This patch fixs two memleaks: backtrace: [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50 [<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0 [<ffffffff81534028>] xen_blkbk_probe+0x58/0x230 [<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130 [<ffffffff81511716>] driver_probe_device+0x166/0x2c0 [<ffffffff815119bc>] __device_attach_driver+0xac/0xb0 [<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90 [<ffffffff81511ab7>] __device_attach+0xc7/0x120 [<ffffffff81511b23>] device_initial_probe+0x13/0x20 [<ffffffff8151059a>] bus_probe_device+0x9a/0xb0 [<ffffffff8150f0a1>] device_add+0x3b1/0x5c0 [<ffffffff8150f47e>] device_register+0x1e/0x30 [<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170 [<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0 [<ffffffff8146b1bb>] backend_changed+0x1b/0x20 [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160 unreferenced object 0xffff880007ba8ef8 (size 224): backtrace: [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50 [<ffffffff81205c73>] __kmalloc+0xd3/0x1e0 [<ffffffff81534d87>] frontend_changed+0x2c7/0x580 [<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0 [<ffffffff8146b2c0>] frontend_changed+0x10/0x20 [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160 [<ffffffff810d3e97>] kthread+0xd7/0xf0 [<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70 [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff8800048dcd38 (size 224): The first leak is caused by not put() the be->blkif reference which we had gotten in xen_blkif_alloc(), while the second is us not freeing blkif->rings in the right place. Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:26 -05:00
Bob Liu	db6fbc1067	xen/blkback: make st_ statistics per ring Make st_* statistics per ring and the VBD sysfs would iterate over all the rings. Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings are torn down, so it's safe. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Aligned the variables on the same column.	2016-01-04 12:21:25 -05:00
Julien Grall	6cc5683390	xen/blkfront: Handle non-indirect grant with 64KB pages The minimal size of request in the block framework is always PAGE_SIZE. It means that when 64KB guest is support, the request will at least be 64KB. Although, if the backend doesn't support indirect descriptor (such as QDISK in QEMU), a ring request is only able to accommodate 11 segments of 4KB (i.e 44KB). The current frontend is assuming that an I/O request will always fit in a ring request. This is not true any more when using 64KB page granularity and will therefore crash during boot. On ARM64, the ABI is completely neutral to the page granularity used by the domU. The guest has the choice between different page granularity supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB). This can't be enforced by the hypervisor and therefore it's possible to run guests using different page granularity. So we can't mandate the block backend to support indirect descriptor when the frontend is using 64KB page granularity and have to fix it properly in the frontend. The solution exposed below is based on modifying directly the frontend guest rather than asking the block framework to support smaller size (i.e < PAGE_SIZE). This is because the change is the block framework are not trivial as everything seems to relying on a struct page (see [1]). Although, it may be possible that someone succeed to do it in the future and we would therefore be able to use it. Given that a block request may not fit in a single ring request, a second request is introduced for the data that cannot fit in the first one. This means that the second ring request should never be used on Linux if the page size is smaller than 44KB. To achieve the support of the extra ring request, the block queue size is divided by two. Therefore, the ring will always contain enough space to accommodate 2 ring requests. While this will reduce the overall performance, it will make the implementation more contained. The way forward to get better performance is to implement in the backend either indirect descriptor or multiple grants ring. Note that the parameters blk_queue_max_ helpers haven't been updated. The block code will set the mimimum size supported and we may be able to support directly any change in the block framework that lower down the minimal size of a request. [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:25 -05:00
Julien Grall	2e073969d5	xen-blkfront: Introduce blkif_ring_get_request The code to get a request is always the same. Therefore we can factorize it in a single function. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:24 -05:00
Jiri Kosina	a6e7af1288	xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of every attempt to purge the LRU. This operation can't ever succeed though, as the kthread hasn't marked itself as freezable. Before (hopefully eventually) kthread freezing gets converted to fileystem freezing, we'd rather mark xen_blkif_schedule() freezable (as it can generate I/O during suspend). Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:24 -05:00
Konrad Rzeszutek Wilk	2d0382fac1	xen/blkback: Free resources if connect_ring failed. With the multi-queue support we could fail at setting up some of the rings and fail the connection. That meant that all resources tied to rings[0..n-1] (where n is the ring that failed to be setup). Eventually the frontend will switch to the states and we will call xen_blkif_disconnect. However we do not want to be at the mercy of the frontend deciding when to change states. This allows us to do the cleanup right away and freeing resources. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:07 -05:00
Konrad Rzeszutek Wilk	bde21f73b9	xen/blocks: Return -EXX instead of -1 Lets return sensible values instead of -1. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:07 -05:00
Bob Liu	d4bf0065b7	xen/blkback: make pool of persistent grants and free pages per-queue Make pool of persistent grants and free pages per-queue/ring instead of per-device to get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Results: iops1: After patch "xen/blkfront: make persistent grants per-queue". iops2: After this patch. Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops1(k): 810 1230(~20%) 1024(~20%) 850(~20%) Iops2(k): 810 1410(~35%) 1354(~75%) 1440(~100%) With 4 queues after this commit we can get ~75% increase in IOPS, and performance won't drop if increasing queue numbers. Please find the respective chart in this link: https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0 Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:06 -05:00
Bob Liu	d62d860003	xen/blkback: get the number of hardware queues/rings from blkfront Backend advertises "multi-queue-max-queues" to front, also get the negotiated number from "multi-queue-num-queues" written by blkfront. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:06 -05:00
Konrad Rzeszutek Wilk	2fb1ef4f12	xen/blkback: pseudo support for multi hardware queues/rings Preparatory patch for multiple hardware queues (rings). The number of rings is unconditionally set to 1, larger number will be enabled in "xen/blkback: get the number of hardware queues/rings from blkfront". Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Align variables in the structures.	2016-01-04 12:21:05 -05:00
Bob Liu	597957000a	xen/blkback: separate ring information out of struct xen_blkif Split per ring information to an new structure "xen_blkif_ring", so that one vbd device can be associated with one or more rings/hardware queues. Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we may have multi backend threads. This patch is a preparation for supporting multi hardware queues/rings. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Align the variables in the structure.	2016-01-04 12:21:05 -05:00
Peng Fan	45fc82642e	xen/blkfront: correct setting for xen_blkif_max_ring_order According to this piece code: " pr_info("Invalid max_ring_order (%d), will use default max: %d.\n", xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER); " if xen_blkif_max_ring_order is bigger that XENBUS_MAX_RING_GRANT_ORDER, need to set xen_blkif_max_ring_order using XENBUS_MAX_RING_GRANT_ORDER, but not 0. Signed-off-by: Peng Fan <van.freenix@gmail.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:04 -05:00
Bob Liu	73716df7da	xen/blkfront: make persistent grants pool per-queue Make persistent grants per-queue/ring instead of per-device, so that we can drop the 'dev_lock' and get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops patched(k): 810 1230(~20%) 1024(~20%) 850(~20%) Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:03 -05:00
Bob Liu	75f070b396	xen/blkfront: Remove duplicate setting of ->xbdev. We do the same exact operations a bit earlier in the function. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:03 -05:00
Konrad Rzeszutek Wilk	6f03a7ff89	xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:02 -05:00
Bob Liu	28d949bcc2	xen/blkfront: negotiate number of queues/rings to be used with backend The max number of hardware queues for xen/blkfront is set by parameter 'max_queues'(default 4), while it is also capped by the max value that the xen/blkback exposes through XenStore key 'multi-queue-max-queues'. The negotiated number is the smaller one and would be written back to xenstore as "multi-queue-num-queues", blkback needs to read this negotiated number. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:02 -05:00
Bob Liu	11659569f7	xen/blkfront: split per device io_lock After patch "xen/blkfront: separate per ring information out of device info", per-ring data is protected by a per-device lock ('io_lock'). This is not a good way and will effect the scalability, so introduce a per-ring lock ('ring_lock'). The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and ->persistent_gnts_c which are shared by all rings. Note that in 'blkfront_probe' the 'blkfront_info' is setup via kzalloc so setting ->persistent_gnts_c to zero is not needed. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:21:01 -05:00
Bob Liu	3df0e50599	xen/blkfront: pseudo support for multi hardware queues/rings Preparatory patch for multiple hardware queues (rings). The number of rings is unconditionally set to 1, larger number will be enabled in patch "xen/blkfront: negotiate number of queues/rings to be used with backend" so as to make review easier. Note that blkfront_gather_backend_features does not call blkfront_setup_indirect anymore (as that needs to be done per ring). That means that in blkif_recover/blkif_connect we have to do it in a loop (bounded by nr_rings). Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 12:20:57 -05:00
Al Viro	e4e85bb091	cciss: switch to memdup_user_nul() all we do to buffer is strncmp()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-04 10:27:50 -05:00
Bob Liu	81f3516157	xen/blkfront: separate per ring information out of device info Split per ring information to a new structure "blkfront_ring_info". A ring is the representation of a hardware queue, every vbd device can associate with one or more rings depending on how many hardware queues/rings to be used. This patch is a preparation for supporting real multi hardware queues/rings. We also add a backpointer to 'struct blkfront_info' (dev_info) which is not needed (we could use containers_of) but further patch ("xen/blkfront: pseudo support for multi hardware queues/rings") will make allocation of 'blkfront_ring_info' dynamic. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2016-01-04 09:56:02 -05:00
Jens Axboe	48cc661e7f	null_blk: use async queue restart helper If null_blk is run in NULL_IRQ_TIMER mode and with queue_mode NULL_Q_RQ, we need to restart the queue from the hrtimer interrupt. We can't directly invoke the request_fn from that context, so punt the queue run to async kblockd context. Tested-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-28 13:07:09 -07:00
Jens Axboe	39fc8830eb	sx8: use real time for the command seconds Commit `8182503df1` used monotonic time, but if the adapter is using the seconds for logging entries, then we'll get duplicate entries if the system is rebooted. Use real time instead. Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: `8182503df1` ("block: sx8.c: Replace timeval with ktime_t") Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-23 08:42:59 -07:00
Linus Torvalds	24bc3ea5df	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "Three small fixes for 4.4 final. Specifically: - The segment issue fix from Junichi, where the old IO path does a bio limit split before potentially bouncing the pages. We need to do that in the right order, to ensure that limitations are met. - A NVMe surprise removal IO hang fix from Keith. - A use-after-free in null_blk, introduced by a previous patch in this series. From Mike Krinkin" * 'for-linus' of git://git.kernel.dk/linux-block: null_blk: fix use-after-free error block: ensure to split after potentially bouncing a bio NVMe: IO ending fixes on surprise removal	2015-12-22 16:00:25 -08:00
Mike Krinkin	e827120146	null_blk: fix use-after-free error blk_end_request_all may free request, so we need to save request_queue pointer before blk_end_request_all call. The problem was introduced in commit `cf8ecc5a84` ("null_blk: guarantee device restart in all irq modes") and causes general protection fault with slab poisoning enabled. Fixes: `cf8ecc5a84` ("null_blk: guarantee device restart in all irq modes") Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 10:42:48 -07:00
Shraddha Barke	8182503df1	block: sx8.c: Replace timeval with ktime_t 32-bit systems using 'struct timeval' will break in the year 2038, in order to avoid that replace the code with more appropriate types. This patch replaces timeval with 64 bit ktime_t which is y2038 safe. Since st->timestamp is only interested in seconds, directly using time64_t here. Function ktime_get_seconds is used since it uses monotonic instead of real time and thus will not cause overflow. Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 10:19:46 -07:00
Linus Torvalds	3273cba195	xen: bug fixes for 4.4-rc5 - XSA-155 security fixes to backend drivers. - XSA-157 security fixes to pciback. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJWdDrXAAoJEFxbo/MsZsTR3N0H/0Lvz6MWBARCje7livbz7nqE PS0Bea+2yAfNhCDDiDlpV0lor8qlyfWDF6lGhLjItldAzahag3ZDKDf1Z/lcQvhf 3MwFOcOVZE8lLtvLT6LGnPuehi1Mfdi1Qk1/zQhPhsq6+FLPLT2y+whmBihp8mMh C12f7KRg5r3U7eZXNB6MEtGA0RFrOp0lBdvsiZx3qyVLpezj9mIe0NueQqwY3QCS xQ0fILp/x2EnZNZuzgghFTPRxMAx5ReOezgn9Rzvq4aThD+irz1y6ghkYN4rG2s2 tyYOTqBnjJEJEQ+wmYMhnfCwVvDffztG+uI9hqN31QFJiNB0xsjSWFCkDAWchiU= =Argz -----END PGP SIGNATURE----- Merge tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - XSA-155 security fixes to backend drivers. - XSA-157 security fixes to pciback. * tag 'for-linus-4.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: fix up cleanup path when alloc fails xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. xen/pciback: For XEN_PCI_OP_disable_msi[\|x] only disable if device has MSI(X) enabled. xen/pciback: Do not install an IRQ handler for MSI interrupts. xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled xen/pciback: Save xen_pci_op commands before processing it xen-scsiback: safely copy requests xen-blkback: read from indirect descriptors only once xen-blkback: only read request operation from shared ring once xen-netback: use RING_COPY_REQUEST() throughout xen-netback: don't use last request to determine minimum Tx credit xen: Add RING_COPY_REQUEST() xen/x86/pvh: Use HVM's flush_tlb_others op xen: Resume PMU from non-atomic context xen/events/fifo: Consume unprocessed events when a CPU dies	2015-12-18 12:24:52 -08:00
Roger Pau Monné	1877914910	xen-blkback: read from indirect descriptors only once Since indirect descriptors are in memory shared with the frontend, the frontend could alter the first_sect and last_sect values after they have been validated but before they are recorded in the request. This may result in I/O requests that overflow the foreign page, possibly overwriting local pages when the I/O request is executed. When parsing indirect descriptors, only read first_sect and last_sect once. This is part of XSA155. CC: stable@vger.kernel.org Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2015-12-18 10:00:37 -05:00

1 2 3 4 5 ...

5163 Commits