OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
James Bottomley	0917ac4f53	Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes	2017-03-29 10:10:30 -04:00
peter chang	bf33f87dd0	scsi: sg: check length passed to SG_NEXT_CMD_LEN The user can control the size of the next command passed along, but the value passed to the ioctl isn't checked against the usable max command size. Cc: <stable@vger.kernel.org> Signed-off-by: Peter Chang <dpf@google.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-03-16 19:46:33 -04:00
Dave Jiang	11bac80004	mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:54 -08:00
Linus Torvalds	772c8f6f3b	for-4.11/linus-merge-signed -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJYqeb8AAoJEPfTWPspceCmB3UP/3UtcPrzEm8w2cxB9MaWhZN3 J+jiwlO4vaqhm2HVzQtoJqfaqRlud/iDx5cIXE2S7FnIM54ZKs3CANbKu8X+b1zm eJije3zMI8A8qyftigbz6a/Y2kWE4ZqFEc9WU5CWawfTl3ImCVUi8+F5X0wOLU/h r50zAQOEyURH4G5usNl9q0olF6FonJ82AcYm1iJ0QP2wYWZRJauC0rRn8IT93tyK bZPHnGKdkd7km8yi3zr2GNWOfuZZuA0HWAaF4qfrHPZQ883gITFAUIlFb1f+2TNl DkQzRrBB2wPWPnlbfb9KejMkvL94hflzsLb5rHt835DyVXFRyjxsgyAI8A+LPGSz vqZ3rsbWj6H4F9z2CkZ+T+AP/ZSWDNjwc0RXPm9HYdR5CDeTxIUVvnFQ44YNsmTv Xd5BKrUJ2oKegAxQG6zcuFx23p8JzhT70l+mNrMdtyeKnDD9FRdDvhKG9AHeTipn o/DnGivhS3UMQoQ7D68KOO+kuhLDeo7my5XGsnjzMO/iHqg++7IP2HyYYs/Ba4qZ cYaCtSDQW71Zt0vsqa6dvPuXBveu4h8Qh8R7uAGjSGS9IAFFb4Cab2tiUdISE6PE YnMWzY+G6pT8imlLVOL5/QFuo2Q4pUsaL0AHpXMCN9TZnQtbqXa8eqwnKnQ0m2KN 7ut0IYYEPaYUX5xFn1K6 =z7AL -----END PGP SIGNATURE----- Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: - blk-mq scheduling framework from me and Omar, with a port of the deadline scheduler for this framework. A port of BFQ from Paolo is in the works, and should be ready for 4.12. - Various fixups and improvements to the above scheduling framework from Omar, Paolo, Bart, me, others. - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar. This allows us to export more information that helps debug hangs or performance issues, without cluttering or abusing the sysfs API. - Fixes for the sbitmap code, the scalable bitmap code that was migrated from blk-mq, from Omar. - Removal of the BLOCK_PC support in struct request, and refactoring of carrying SCSI payloads in the block layer. This cleans up the code nicely, and enables us to kill the SCSI specific parts of struct request, shrinking it down nicely. From Christoph mainly, with help from Hannes. - Support for ranged discard requests and discard merging, also from Christoph. - Support for OPAL in the block layer, and for NVMe as well. Mainly from Scott Bauer, with fixes/updates from various others folks. - Error code fixup for gdrom from Christophe. - cciss pci irq allocation cleanup from Christoph. - Making the cdrom device operations read only, from Kees Cook. - Fixes for duplicate bdi registrations and bdi/queue life time problems from Jan and Dan. - Set of fixes and updates for lightnvm, from Matias and Javier. - A few fixes for nbd from Josef, using idr to name devices and a workqueue deadlock fix on receive. Also marks Josef as the current maintainer of nbd. - Fix from Josef, overwriting queue settings when the number of hardware queues is updated for a blk-mq device. - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO aborted, if we didn't end up aborting it. - SG gap merging fix from Ming Lei for block. - Loop fix also from Ming, fixing a race and crash between setting loop status and IO. - Two block race fixes from Tahsin, fixing request list iteration and fixing a race between device registration and udev device add notifiations. - Double free fix from cgroup writeback, from Tejun. - Another double free fix in blkcg, from Hou Tao. - Partition overflow fix for EFI from Alden Tondettar. * tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits) nvme: Check for Security send/recv support before issuing commands. block/sed-opal: allocate struct opal_dev dynamically block/sed-opal: tone down not supported warnings block: don't defer flushes on blk-mq + scheduling blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers blk-mq: don't special case flush inserts for blk-mq-sched blk-mq-sched: don't add flushes to the head of requeue queue blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not block: do not allow updates through sysfs until registration completes lightnvm: set default lun range when no luns are specified lightnvm: fix off-by-one error on target initialization Maintainers: Modify SED list from nvme to block Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN uapi: sed-opal fix IOW for activate lsp to use correct struct cdrom: Make device operations read-only elevator: fix loading wrong elevator type for blk-mq devices cciss: switch to pci_irq_alloc_vectors block/loop: fix race between I/O and set_status blk-mq-sched: don't hold queue_lock when calling exit_icq block: set make_request_fn manually in blk_mq_update_nr_hw_queues ...	2017-02-21 10:57:33 -08:00
Al Viro	137d01df51	Fix missing sanity check in /dev/sg What happens is that a write to /dev/sg is given a request with non-zero ->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing to an array full of empty iovecs. Having write permission to /dev/sg shouldn't be equivalent to the ability to trigger BUG_ON() while holding spinlocks... Found by Dmitry Vyukov and syzkaller. [ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the underlying issue. - Linus ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-19 09:54:31 -08:00
Christoph Hellwig	aebf526b53	block: fold cmd_type into the REQ_OP_ space Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-31 14:00:44 -07:00
Christoph Hellwig	82ed4db499	block: split scsi_request out of struct request And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-27 15:08:35 -07:00
Al Viro	128394eff3	sg_write()/bsg_write() is not fit to be called under KERNEL_DS Both damn things interpret userland pointers embedded into the payload; worse, they are actually traversing those. Leaving aside the bad API design, this is very much _not_ safe to call with KERNEL_DS. Bail out early if that happens. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-22 23:03:42 -05:00
Paul Burton	f8630bd7e2	scsi: sg: Use mult_frac, drop MULDIV macro The MULDIV macro is essentially a duplicate of the more standard mult_frac macro. Replace use of MULDIV with mult_frac & drop the duplication. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-30 22:18:59 -04:00
Paul Burton	b9b6e80ad3	scsi: sg: Avoid overflow when USER_HZ > HZ Calculating the maximum timeout that a user can set via the SG_SET_TIMEOUT ioctl involves multiplying INT_MAX by USER_HZ/HZ. If USER_HZ is larger than HZ then this results in an overflow when performed as a 32 bit integer calculation, resulting in compiler warnings such as the following: drivers/scsi/sg.c: In function 'sg_ioctl': drivers/scsi/sg.c:91:67: warning: integer overflow in expression [-Woverflow] #define MULDIV(X,MUL,DIV) ((((X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)) ^ drivers/scsi/sg.c:887:14: note: in expansion of macro 'MULDIV' if (val >= MULDIV (INT_MAX, USER_HZ, HZ)) ^ drivers/scsi/sg.c:91:67: warning: integer overflow in expression [-Woverflow] #define MULDIV(X,MUL,DIV) ((((X % DIV) * MUL) / DIV) + ((X / DIV) * MUL)) ^ drivers/scsi/sg.c:888:13: note: in expansion of macro 'MULDIV' val = MULDIV (INT_MAX, USER_HZ, HZ); ^ Avoid this overflow by performing the (constant) arithmetic on 64 bit integers, which ensures that overflow from multiplying the 32 bit values cannot occur. When converting the result back to a 32 bit integer use min_t to ensure that we don't simply truncate a value beyond INT_MAX to a 32 bit integer, but instead use INT_MAX where the result was larger than it. As the values are all compile time constant the 64 bit arithmetic should have no runtime cost. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-08-30 22:18:10 -04:00
Douglas Gilbert	5ecee0a3ee	sg: fix dxferp in from_to case One of the strange things that the original sg driver did was let the user provide both a data-out buffer (it followed the sg_header+cdb) _and_ specify a reply length greater than zero. What happened was that the user data-out buffer was copied into some kernel buffers and then the mid level was told a read type operation would take place with the data from the device overwriting the same kernel buffers. The user would then read those kernel buffers back into the user space. From what I can tell, the above action was broken by commit `fad7f01e61` ("sg: set dxferp to NULL for READ with the older SG interface") in 2008 and syzkaller found that out recently. Make sure that a user space pointer is passed through when data follows the sg_header structure and command. Fix the abnormal case when a non-zero reply_len is also given. Fixes: `fad7f01e61` Cc: <stable@vger.kernel.org> #v2.6.28+ Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2016-03-09 20:41:04 -05:00
Kirill A. Shutemov	461c7fa126	drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration Reduced testcase: #include <fcntl.h> #include <unistd.h> #include <sys/mman.h> #include <numaif.h> #define SIZE 0x2000 int main() { int fd; void *p; fd = open("/dev/sg0", O_RDWR); p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE \| MAP_LOCKED, fd, 0); mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE); return 0; } We shouldn't try to migrate pages in sg VMA as we don't have a way to update Sg_scatter_hold::pages accordingly from mm core. Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not migratable. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: David Rientjes <rientjes@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Shiraz Hashim <shashim@codeaurora.org> Cc: Hugh Dickins <hughd@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-02-03 08:28:43 -08:00
Calvin Owens	f3951a3709	sg: Fix double-free when drives detach during SG_IO In sg_common_write(), we free the block request and return -ENODEV if the device is detached in the middle of the SG_IO ioctl(). Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we end up freeing rq->cmd in the already free rq object, and then free the object itself out from under the current user. This ends up corrupting random memory via the list_head on the rq object. The most common crash trace I saw is this: ------------[ cut here ]------------ kernel BUG at block/blk-core.c:1420! Call Trace: [<ffffffff81281eab>] blk_put_request+0x5b/0x80 [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg] [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg] [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70 [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg] [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg] [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520 [<ffffffff81258967>] ? file_has_perm+0x97/0xb0 [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0 [<ffffffff81602afb>] tracesys+0xdd/0xe2 RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0 The solution is straightforward: just set srp->rq to NULL in the failure branch so that sg_finish_rem_req() doesn't attempt to re-free it. Additionally, since sg_rq_end_io() will never be called on the object when this happens, we need to free memory backing ->cmd if it isn't embedded in the object itself. KASAN was extremely helpful in finding the root cause of this bug. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2015-11-02 23:51:25 -05:00
Al Viro	fdc81f45e9	sg_start_req(): use import_iovec() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:27:14 -04:00
Al Viro	451a2886b6	sg_start_req(): make sure that there's not too many elements in iovec unfortunately, allowing an arbitrary 16bit value means a possibility of overflow in the calculation of total number of pages in bio_map_user_iov() - we rely on there being no more than PAGE_SIZE members of sum in the first loop there. If that sum wraps around, we end up allocating too small array of pointers to pages and it's easy to overflow it in the second loop. X-Coverup: TINC (and there's no lumber cartel either) Cc: stable@vger.kernel.org # way, way back Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-11 22:27:13 -04:00
Al Viro	c0fec3a98b	Merge branch 'iocb' into for-next	2015-04-11 22:24:41 -04:00
Christoph Hellwig	e2e40f2c1e	fs: move struct kiocb to fs.h struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-25 20:28:11 -04:00
Linus Torvalds	c8c6c9ba39	SCSI misc on 20150221 This is a short patch set representing a couple of left overs from the merge window (debug leftover removal and MAINTAINER changes) plus one merge window regression (the local workqueue for hpsa) and a set of bug fixes for several issues (two for scsi-mq and the rest an assortment of long standing stuff, all cc'd to stable). Signed-off-by: James Bottomley <JBottomley@Parallels.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJU6UPVAAoJEDeqqVYsXL0MsjcIAKRGhJQf8PAprBC/vByJcysJ 91VnXQcJb7Ypqicj6rpkRNX+5UpehLcWIVL0E1Q4KHdirvQv3b6icXhGmntyZdYZ URlhqDxKo9+Z+tNoeqVPNenSvVSAlfMNBRXfTo+oo1hpPUz5VrySmpmgEOuJrzXF qb1FMnRXebIFIo60QUA/7n+3zDBFZXW/IBY5lLO9/v7+fTe8wh5qNvXvf7DiOJ56 qPkWNpJC5vDyOHwTHYK+aM8kl5/x777DU/sx5ajitlyrH1cD9d69Zjj70IKo3P7G Y5dQA14kRnLJc5xnwBztHguESwGTnDCSti1owg0CvJWUZlcjxYkY/iXd8rAMGWc= =P5NR -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull misc SCSI patches from James Bottomley: "This is a short patch set representing a couple of left overs from the merge window (debug removal and MAINTAINER changes). Plus one merge window regression (the local workqueue for hpsa) and a set of bug fixes for several issues (two for scsi-mq and the rest an assortment of long standing stuff, all cc'd to stable)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: sg: fix EWOULDBLOCK errors with scsi-mq sg: fix unkillable I/O wait deadlock with scsi-mq sg: fix read() error reporting wd719x: add missing .module to wd719x_template hpsa: correct compiler warnings introduced by hpsa-add-local-workqueue patch fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit. fcoe: Transition maintainership to Vasu am53c974: remove left-over debugging code	2015-02-21 19:16:42 -08:00
Tony Battersby	7772855a99	sg: fix EWOULDBLOCK errors with scsi-mq With scsi-mq enabled, userspace programs can get unexpected EWOULDBLOCK (a.k.a. EAGAIN) errors when submitting commands to the SCSI generic driver. Fix by calling blk_get_request() with GFP_KERNEL instead of GFP_ATOMIC. Note: to avoid introducing a potential deadlock, this patch should be applied after the patch titled "sg: fix unkillable I/O wait deadlock with scsi-mq". Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2015-02-17 06:57:54 -08:00
Tony Battersby	7568615c10	sg: fix unkillable I/O wait deadlock with scsi-mq When using the write()/read() interface for submitting commands, the SCSI generic driver does not call blk_put_request() on a completed SCSI command until userspace calls read() to get the command completion. Since scsi-mq uses a fixed number of preallocated requests, this makes it possible for userspace to exhaust the entire preallocated supply of requests. For places in the kernel that call blk_get_request() with GFP_KERNEL, this can cause the calling process to deadlock in a permanent unkillable I/O wait in blk_get_request() -> ... -> bt_get(). For places in the kernel that call blk_get_request() with GFP_ATOMIC, this can cause blk_get_request() always to return -EWOULDBLOCK. Note that these problems happen only if scsi-mq is enabled. Prevent the problems by calling blk_put_request() as soon as the SCSI command completes instead of waiting for userspace to call read(). Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2015-02-17 06:55:32 -08:00
Tony Battersby	3b524a683a	sg: fix read() error reporting Fix SCSI generic read() incorrectly returning success after detecting an error. Cc: <stable@vger.kernel.org> Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2015-02-15 10:36:55 -08:00
Linus Torvalds	3e12cefbe1	Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block Pull core block IO changes from Jens Axboe: "This contains: - A series from Christoph that cleans up and refactors various parts of the REQ_BLOCK_PC handling. Contributions in that series from Dongsu Park and Kent Overstreet as well. - CFQ: - A bug fix for cfq for realtime IO scheduling from Jeff Moyer. - A stable patch fixing a potential crash in CFQ in OOM situations. From Konstantin Khlebnikov. - blk-mq: - Add support for tag allocation policies, from Shaohua. This is a prep patch enabling libata (and other SCSI parts) to use the blk-mq tagging, instead of rolling their own. - Various little tweaks from Keith and Mike, in preparation for DM blk-mq support. - Minor little fixes or tweaks from me. - A double free error fix from Tony Battersby. - The partition 4k issue fixes from Matthew and Boaz. - Add support for zero+unprovision for blkdev_issue_zeroout() from Martin" * 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits) block: remove unused function blk_bio_map_sg block: handle the null_mapped flag correctly in blk_rq_map_user_iov blk-mq: fix double-free in error path block: prevent request-to-request merging with gaps if not allowed blk-mq: make blk_mq_run_queues() static dm: fix multipath regression due to initializing wrong request cfq-iosched: handle failure of cfq group allocation block: Quiesce zeroout wrapper block: rewrite and split __bio_copy_iov() block: merge __bio_map_user_iov into bio_map_user_iov block: merge __bio_map_kern into bio_map_kern block: pass iov_iter to the BLOCK_PC mapping functions block: add a helper to free bio bounce buffer pages block: use blk_rq_map_user_iov to implement blk_rq_map_user block: simplify bio_map_kern block: mark blk-mq devices as stackable block: keep established cmd_flags when cloning into a blk-mq request block: add blk-mq support to blk_insert_cloned_request() block: require blk_rq_prep_clone() be given an initialized clone request blk-mq: add tag allocation policy ...	2015-02-12 14:13:23 -08:00
Kent Overstreet	26e49cfc7e	block: pass iov_iter to the BLOCK_PC mapping functions Make use of a new interface provided by iov_iter, backed by scatter-gather list of iovec, instead of the old interface based on sg_iovec. Also use iov_iter_advance() instead of manual iteration. This commit should contain only literal replacements, without functional changes. Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com> [hch: fixed to do a deep clone of the iov_iter, and to properly use the iov_iter direction] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-02-05 09:30:40 -07:00
Bart Van Assche	5af2e38242	sg: remove an unused variable The 'data_dir' variable is not used in sg_common_write(), hence remove this variable. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 09:57:44 -08:00
Christoph Hellwig	906d15fbd2	scsi: split scsi_nonblockable_ioctl The calling conventions for this function are bad as it could return -ENODEV both for a device not currently online and a not recognized ioctl. Add a new scsi_ioctl_block_when_processing_errors function that wraps scsi_block_when_processing_errors with the a special case for the SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself in scsi_ioctl. All callers of scsi_ioctl now must call the above helper to check for the EH state, so that the ioctl handler itself doesn't have to. Reported-by: Robert Elliott <Elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-11-12 11:16:11 +01:00
Christoph Hellwig	176aa9d6ee	scsi: refactor scsi_reset_provider handling Pull the common code from the two callers into the function, and rename it to scsi_ioctl_reset. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-11-12 11:16:10 +01:00
Hannes Reinecke	d811b848eb	scsi: use sdev as argument for sense code printing We should be using the standard dev_printk() variants for sense code printing. [hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen] [hch: folded bracing fix from Dan Carpenter] Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:15:58 +01:00
Hannes Reinecke	22e0d99415	scsi: introduce sdev_prefix_printk() Like scmd_printk(), but the device name is passed in as a string. Can be used by eg ULDs which do not have access to the scsi_cmnd structure. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:15:57 +01:00
Douglas Gilbert	26cf591e6d	scsi: add SG_SCSI_RESET_NO_ESCALATE flag to SG_SCSI_RESET ioctl Further to a January 2013 thread titled: "[PATCH] SG_SCSI_RESET ioctl should only perform requested operation" by Jeremy Linton a patch (v3) is presented that expands the existing ioctl to include "no_escalate" versions to the existing resets. This requires no changes to SCSI low level drivers (LLDs); it adds several more finely tuned reset options to the user space. For example: /* This call remains the same, with the same escalating semantics * if the device (LU) reset fail. That is: on failure to try a * target reset and if that fails, try a bus reset, and if that fails * try a host (i.e. LLD) reset. / val = SG_SCSI_RESET_DEVICE; res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val); / What follows is a new option introduced by this patch series. Only * a device reset is attempted. If that fails then an appropriate * error code is provided. N.B. There is no reset escalation. */ val = SG_SCSI_RESET_DEVICE \| SG_SCSI_RESET_NO_ESCALATE; res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val); Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Jeremy Linton <jlinton@tributary.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:15:54 +01:00
Joe Lawrence	a492f07545	block,scsi: fixup blk_get_request dead queue scenarios The blk_get_request function may fail in low-memory conditions or during device removal (even if __GFP_WAIT is set). To distinguish between these errors, modify the blk_get_request call stack to return the appropriate ERR_PTR. Verify that all callers check the return status and consider IS_ERR instead of a simple NULL pointer check. For consistency, make a similar change to the blk_mq_alloc_request leg of blk_get_request. It may fail if the queue is dead, or the caller was unwilling to wait. Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd] Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd] Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-08-28 10:03:46 -06:00
Christoph Hellwig	71e75c97f9	scsi: convert device_busy to atomic_t Avoid taking the queue_lock to check the per-device queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Unlike the host and target busy counters this doesn't allow us to avoid the queue_lock in the request_fn due to the way the interface works, but it'll allow us to prepare for using the blk-mq code, which doesn't use the queue_lock at all, and it at least avoids a queue_lock round trip in scsi_device_unbusy, which is still important given how busy the queue_lock is. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:43:45 -04:00
Hannes Reinecke	95e159d6dd	scsi: Implement sg_printk() Update the sg driver to use dev_printk() variants instead of plain printk(); this will prefix logging messages with the appropriate device. Signed-off-by: Hannes Reinecke <hare@suse.de> Acked-by: Doug Gilbert <dgilbert@interlog.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:40 +02:00
Hannes Reinecke	9cb78c16f5	scsi: use 64-bit LUNs The SCSI standard defines 64-bit values for LUNs, and large arrays employing large or hierarchical LUN numbers become more and more common. So update the linux SCSI stack to use 64-bit LUN numbers. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:37 +02:00
Douglas Gilbert	cc833acbee	sg: O_EXCL and other lock handling This addresses a problem reported by Vaughan Cao concerning the correctness of the O_EXCL logic in the sg driver. POSIX doesn't defined O_EXCL semantics on devices but "allow only one open file descriptor at a time per sg device" is a rough definition. The sg driver's semantics have been to wait on an open() when O_NONBLOCK is not given and there are O_EXCL headwinds. Nasty things can happen during that wait such as the device being detached (removed). So multiple locks are reworked in this patch making it large and hard to break down into digestible bits. This patch is against Linus's current git repository which doesn't include any sg patches sent in the last few weeks. Hence this patch touches as little as possible that it doesn't need to and strips out most SCSI_LOG_TIMEOUT() changes in v3 because Hannes said he was going to rework all that stuff. The sg3_utils package has several test programs written to test this patch. See examples/sg_tst_excl*.cpp . Not all the locks and flags in sg have been re-worked in this patch, notably sg_request::done . That can wait for a follow-up patch if this one meets with approval. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-07-17 22:07:34 +02:00
Douglas Gilbert	16070cc189	sg: add SG_FLAG_Q_AT_TAIL flag When the SG_IO ioctl was copied into the block layer and later into the bsg driver, subtle differences emerged. One difference is the way injected commands are queued through the block layer (i.e. this is not SCSI device queueing nor SATA NCQ). Summarizing: - SG_IO in the block layer: blk_exec*(at_head=false) - sg SG_IO: at_head=true - bsg SG_IO: at_head=true Some time ago Boaz Harrosh introduced a sg v4 flag called BSG_FLAG_Q_AT_TAIL to override the bsg driver default. This patch does the equivalent for the sg driver. ChangeLog: Introduce SG_FLAG_Q_AT_TAIL flag to cause commands to be injected into the block layer with at_head=false. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:34 +02:00
Douglas Gilbert	65c26a0f39	sg: relax 16 byte cdb restriction - remove the 16 byte CDB (SCSI command) length limit from the sg driver by handling longer CDBs the same way as the bsg driver. Remove comment from sg.h public interface about the cmd_len field being limited to 16 bytes. - remove some dead code caused by this change - cleanup comment block at the top of sg.h, fix urls Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:33 +02:00
Akinobu Mita	46f69e6a6b	sg: prevent integer overflow when converting from sectors to bytes This prevents integer overflow when converting the request queue's max_sectors from sectors to bytes. However, this is a preparation for extending the data type of max_sectors in struct Scsi_Host and scsi_host_template. So, it is impossible to happen this integer overflow for now, because SCSI low-level drivers can not specify max_sectors greater than 0xffff due to the data type limitation. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:29 +02:00
Jens Axboe	f27b087b81	block: add blk_rq_set_block_pc() With the optimizations around not clearing the full request at alloc time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC up to the user allocating the request. Add a blk_rq_set_block_pc() that sets the command type to REQ_TYPE_BLOCK_PC, and properly initializes the members associated with this type of request. Update callers to use this function instead of manipulating rq->cmd_type directly. Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed attempt. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-06-06 07:57:37 -06:00
James Bottomley	065b4a2f59	[SCSI] Revert "sg: use rwsem to solve race during exclusive open" This reverts commit `15b06f9a02`. This is one of four patches that was causing this bug [ 205.372823] ================================================ [ 205.372901] [ BUG: lock held when returning to user space! ] [ 205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted [ 205.373055] ------------------------------------------------ [ 205.373132] megarc.bin/5283 is leaving the kernel with locks still held! [ 205.373212] 1 lock held by megarc.bin/5283: [ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0 Cc: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-10-25 10:59:54 +01:00
James Bottomley	98481ff0bb	[SCSI] Revert "sg: no need sg_open_exclusive_lock" This reverts commit `00b2d9d6d0`. This is one of four patches that was causing this bug [ 205.372823] ================================================ [ 205.372901] [ BUG: lock held when returning to user space! ] [ 205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted [ 205.373055] ------------------------------------------------ [ 205.373132] megarc.bin/5283 is leaving the kernel with locks still held! [ 205.373212] 1 lock held by megarc.bin/5283: [ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0 Cc: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-10-25 10:59:32 +01:00
James Bottomley	bafc8ad82d	[SCSI] Revert "sg: checking sdp->detached isn't protected when open" This reverts commit `e32c9e6300`. This is one of four patches that was causing this bug [ 205.372823] ================================================ [ 205.372901] [ BUG: lock held when returning to user space! ] [ 205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted [ 205.373055] ------------------------------------------------ [ 205.373132] megarc.bin/5283 is leaving the kernel with locks still held! [ 205.373212] 1 lock held by megarc.bin/5283: [ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0 Cc: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-10-25 10:59:02 +01:00
James Bottomley	c0d3b9c29e	[SCSI] Revert "sg: push file descriptor list locking down to per-device locking" This reverts commit `1f962ebcdf`. This is one of four patches that was causing this bug [ 205.372823] ================================================ [ 205.372901] [ BUG: lock held when returning to user space! ] [ 205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted [ 205.373055] ------------------------------------------------ [ 205.373132] megarc.bin/5283 is leaving the kernel with locks still held! [ 205.373212] 1 lock held by megarc.bin/5283: [ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0 Cc: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-10-25 10:58:07 +01:00
Vaughan Cao	1f962ebcdf	[SCSI] sg: push file descriptor list locking down to per-device locking Push file descriptor list locking down to per-device locking. Let sg_index_lock only protect device lookup. sdp->detached is also set and checked with this lock held. Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-09-03 07:28:10 -07:00
Vaughan Cao	e32c9e6300	[SCSI] sg: checking sdp->detached isn't protected when open @detached is set under the protection of sg_index_lock. Without getting the lock, new sfp will be added during sg removal and there is no chance for it to be picked out. So check with sg_index_lock held in sg_add_sfp(). Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-09-03 07:28:09 -07:00
Vaughan Cao	00b2d9d6d0	[SCSI] sg: no need sg_open_exclusive_lock Open exclusive check is protected by o_sem, no need sg_open_exclusive_lock. @exclude is used to record which type of rwsem we are holding. Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-09-03 07:28:09 -07:00
Vaughan Cao	15b06f9a02	[SCSI] sg: use rwsem to solve race during exclusive open A race condition may happen if two threads are both trying to open the same sg with O_EXCL simultaneously. It's possible that they both find fsds list is empty and get_exclude(sdp) returns 0, then they both call set_exclude() and break out from wait_event_interruptible and resume open. Now use rwsem to protect this process. Exclusive open gets write lock and others get read lock. The lock will be held until file descriptor is closed. This also leads 'exclude' only a status rather than a check mark. Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-09-03 07:28:09 -07:00
Kent Overstreet	a27bb332c0	aio: don't include aio.h in sched.h Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-07 20:16:25 -07:00
Tejun Heo	b98c52b572	scsi: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:18 -08:00
Konstantin Khlebnikov	314e51b985	mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: \| effect \| alternative flags -+------------------------+--------------------------------------------- 1\| account as reserved_vm \| VM_IO 2\| skip in core dump \| VM_IO, VM_DONTDUMP 3\| do not merge or expand \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4\| do not mlock \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND \| VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO\|VM_DONTEXPAND\|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND \| VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:19 +09:00
Jörn Engel	18b8ba6cfb	[SCSI] sg: constify sg_proc_leaf_arr Signed-off-by: Joern Engel <joern@logfs.org> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-05-17 10:08:57 +01:00

1 2 3 4

197 Commits