Commit Graph

191 Commits

Author SHA1 Message Date
Christoph Hellwig 82ed4db499 block: split scsi_request out of struct request
And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data.  To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27 15:08:35 -07:00
Al Viro 128394eff3 sg_write()/bsg_write() is not fit to be called under KERNEL_DS
Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those.  Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22 23:03:42 -05:00
Paul Burton f8630bd7e2 scsi: sg: Use mult_frac, drop MULDIV macro
The MULDIV macro is essentially a duplicate of the more standard
mult_frac macro. Replace use of MULDIV with mult_frac & drop the
duplication.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-30 22:18:59 -04:00
Paul Burton b9b6e80ad3 scsi: sg: Avoid overflow when USER_HZ > HZ
Calculating the maximum timeout that a user can set via the
SG_SET_TIMEOUT ioctl involves multiplying INT_MAX by USER_HZ/HZ. If
USER_HZ is larger than HZ then this results in an overflow when
performed as a 32 bit integer calculation, resulting in compiler
warnings such as the following:

  drivers/scsi/sg.c: In function 'sg_ioctl':
  drivers/scsi/sg.c:91:67: warning: integer overflow in expression [-Woverflow]
   #define MULDIV(X,MUL,DIV) ((((X % DIV) * MUL) / DIV) + ((X / DIV) * MUL))
                                                                     ^
  drivers/scsi/sg.c:887:14: note: in expansion of macro 'MULDIV'
     if (val >= MULDIV (INT_MAX, USER_HZ, HZ))
                ^
  drivers/scsi/sg.c:91:67: warning: integer overflow in expression [-Woverflow]
   #define MULDIV(X,MUL,DIV) ((((X % DIV) * MUL) / DIV) + ((X / DIV) * MUL))
                                                                     ^
  drivers/scsi/sg.c:888:13: note: in expansion of macro 'MULDIV'
         val = MULDIV (INT_MAX, USER_HZ, HZ);
               ^

Avoid this overflow by performing the (constant) arithmetic on 64 bit
integers, which ensures that overflow from multiplying the 32 bit values
cannot occur. When converting the result back to a 32 bit integer use
min_t to ensure that we don't simply truncate a value beyond INT_MAX to
a 32 bit integer, but instead use INT_MAX where the result was larger
than it. As the values are all compile time constant the 64 bit
arithmetic should have no runtime cost.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-30 22:18:10 -04:00
Douglas Gilbert 5ecee0a3ee sg: fix dxferp in from_to case
One of the strange things that the original sg driver did was let the
user provide both a data-out buffer (it followed the sg_header+cdb)
_and_ specify a reply length greater than zero. What happened was that
the user data-out buffer was copied into some kernel buffers and then
the mid level was told a read type operation would take place with the
data from the device overwriting the same kernel buffers. The user would
then read those kernel buffers back into the user space.

From what I can tell, the above action was broken by commit fad7f01e61
("sg: set dxferp to NULL for READ with the older SG interface") in 2008
and syzkaller found that out recently.

Make sure that a user space pointer is passed through when data follows
the sg_header structure and command.  Fix the abnormal case when a
non-zero reply_len is also given.

Fixes: fad7f01e61
Cc: <stable@vger.kernel.org> #v2.6.28+
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-09 20:41:04 -05:00
Kirill A. Shutemov 461c7fa126 drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
Reduced testcase:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <numaif.h>

    #define SIZE 0x2000

    int main()
    {
        int fd;
        void *p;

        fd = open("/dev/sg0", O_RDWR);
        p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE | MAP_LOCKED, fd, 0);
        mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE);
        return 0;
    }

We shouldn't try to migrate pages in sg VMA as we don't have a way to
update Sg_scatter_hold::pages accordingly from mm core.

Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not
migratable.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Shiraz Hashim <shashim@codeaurora.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 08:28:43 -08:00
Calvin Owens f3951a3709 sg: Fix double-free when drives detach during SG_IO
In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().

Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.

This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:

  ------------[ cut here ]------------
  kernel BUG at block/blk-core.c:1420!
  Call Trace:
  [<ffffffff81281eab>] blk_put_request+0x5b/0x80
  [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
  [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
  [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
  [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
  [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
  [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
  [<ffffffff81258967>] ? file_has_perm+0x97/0xb0
  [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
  [<ffffffff81602afb>] tracesys+0xdd/0xe2
    RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0

The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.

Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.

KASAN was extremely helpful in finding the root cause of this bug.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-02 23:51:25 -05:00
Al Viro fdc81f45e9 sg_start_req(): use import_iovec()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:27:14 -04:00
Al Viro 451a2886b6 sg_start_req(): make sure that there's not too many elements in iovec
unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there.  If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.

X-Coverup: TINC (and there's no lumber cartel either)
Cc: stable@vger.kernel.org # way, way back
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:27:13 -04:00
Al Viro c0fec3a98b Merge branch 'iocb' into for-next 2015-04-11 22:24:41 -04:00
Christoph Hellwig e2e40f2c1e fs: move struct kiocb to fs.h
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25 20:28:11 -04:00
Linus Torvalds c8c6c9ba39 SCSI misc on 20150221
This is a short patch set representing a couple of left overs from the merge
 window (debug leftover removal and MAINTAINER changes) plus one merge window
 regression (the local workqueue for hpsa) and a set of bug fixes for several
 issues (two for scsi-mq and the rest an assortment of long standing stuff, all
 cc'd to stable).
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJU6UPVAAoJEDeqqVYsXL0MsjcIAKRGhJQf8PAprBC/vByJcysJ
 91VnXQcJb7Ypqicj6rpkRNX+5UpehLcWIVL0E1Q4KHdirvQv3b6icXhGmntyZdYZ
 URlhqDxKo9+Z+tNoeqVPNenSvVSAlfMNBRXfTo+oo1hpPUz5VrySmpmgEOuJrzXF
 qb1FMnRXebIFIo60QUA/7n+3zDBFZXW/IBY5lLO9/v7+fTe8wh5qNvXvf7DiOJ56
 qPkWNpJC5vDyOHwTHYK+aM8kl5/x777DU/sx5ajitlyrH1cD9d69Zjj70IKo3P7G
 Y5dQA14kRnLJc5xnwBztHguESwGTnDCSti1owg0CvJWUZlcjxYkY/iXd8rAMGWc=
 =P5NR
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull misc SCSI patches from James Bottomley:
 "This is a short patch set representing a couple of left overs from the
  merge window (debug removal and MAINTAINER changes).

  Plus one merge window regression (the local workqueue for hpsa) and a
  set of bug fixes for several issues (two for scsi-mq and the rest an
  assortment of long standing stuff, all cc'd to stable)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  sg: fix EWOULDBLOCK errors with scsi-mq
  sg: fix unkillable I/O wait deadlock with scsi-mq
  sg: fix read() error reporting
  wd719x: add missing .module to wd719x_template
  hpsa: correct compiler warnings introduced by hpsa-add-local-workqueue patch
  fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit.
  fcoe: Transition maintainership to Vasu
  am53c974: remove left-over debugging code
2015-02-21 19:16:42 -08:00
Tony Battersby 7772855a99 sg: fix EWOULDBLOCK errors with scsi-mq
With scsi-mq enabled, userspace programs can get unexpected EWOULDBLOCK
(a.k.a. EAGAIN) errors when submitting commands to the SCSI generic
driver.  Fix by calling blk_get_request() with GFP_KERNEL instead of
GFP_ATOMIC.

Note: to avoid introducing a potential deadlock, this patch should be
applied after the patch titled "sg: fix unkillable I/O wait deadlock
with scsi-mq".

Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2015-02-17 06:57:54 -08:00
Tony Battersby 7568615c10 sg: fix unkillable I/O wait deadlock with scsi-mq
When using the write()/read() interface for submitting commands, the
SCSI generic driver does not call blk_put_request() on a completed SCSI
command until userspace calls read() to get the command completion.
Since scsi-mq uses a fixed number of preallocated requests, this makes
it possible for userspace to exhaust the entire preallocated supply of
requests.  For places in the kernel that call blk_get_request() with
GFP_KERNEL, this can cause the calling process to deadlock in a
permanent unkillable I/O wait in blk_get_request() -> ... -> bt_get().
For places in the kernel that call blk_get_request() with GFP_ATOMIC,
this can cause blk_get_request() always to return -EWOULDBLOCK.  Note
that these problems happen only if scsi-mq is enabled.  Prevent the
problems by calling blk_put_request() as soon as the SCSI command
completes instead of waiting for userspace to call read().

Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2015-02-17 06:55:32 -08:00
Tony Battersby 3b524a683a sg: fix read() error reporting
Fix SCSI generic read() incorrectly returning success after detecting an
error.

Cc: <stable@vger.kernel.org>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2015-02-15 10:36:55 -08:00
Linus Torvalds 3e12cefbe1 Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "This contains:

   - A series from Christoph that cleans up and refactors various parts
     of the REQ_BLOCK_PC handling.  Contributions in that series from
     Dongsu Park and Kent Overstreet as well.

   - CFQ:
        - A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
        - A stable patch fixing a potential crash in CFQ in OOM
          situations.  From Konstantin Khlebnikov.

   - blk-mq:
        - Add support for tag allocation policies, from Shaohua. This is
          a prep patch enabling libata (and other SCSI parts) to use the
          blk-mq tagging, instead of rolling their own.
        - Various little tweaks from Keith and Mike, in preparation for
          DM blk-mq support.
        - Minor little fixes or tweaks from me.
        - A double free error fix from Tony Battersby.

   - The partition 4k issue fixes from Matthew and Boaz.

   - Add support for zero+unprovision for blkdev_issue_zeroout() from
     Martin"

* 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
  block: remove unused function blk_bio_map_sg
  block: handle the null_mapped flag correctly in blk_rq_map_user_iov
  blk-mq: fix double-free in error path
  block: prevent request-to-request merging with gaps if not allowed
  blk-mq: make blk_mq_run_queues() static
  dm: fix multipath regression due to initializing wrong request
  cfq-iosched: handle failure of cfq group allocation
  block: Quiesce zeroout wrapper
  block: rewrite and split __bio_copy_iov()
  block: merge __bio_map_user_iov into bio_map_user_iov
  block: merge __bio_map_kern into bio_map_kern
  block: pass iov_iter to the BLOCK_PC mapping functions
  block: add a helper to free bio bounce buffer pages
  block: use blk_rq_map_user_iov to implement blk_rq_map_user
  block: simplify bio_map_kern
  block: mark blk-mq devices as stackable
  block: keep established cmd_flags when cloning into a blk-mq request
  block: add blk-mq support to blk_insert_cloned_request()
  block: require blk_rq_prep_clone() be given an initialized clone request
  blk-mq: add tag allocation policy
  ...
2015-02-12 14:13:23 -08:00
Kent Overstreet 26e49cfc7e block: pass iov_iter to the BLOCK_PC mapping functions
Make use of a new interface provided by iov_iter, backed by
scatter-gather list of iovec, instead of the old interface based on
sg_iovec. Also use iov_iter_advance() instead of manual iteration.

This commit should contain only literal replacements, without
functional changes.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
[hch: fixed to do a deep clone of the iov_iter, and to properly use
      the iov_iter direction]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-02-05 09:30:40 -07:00
Bart Van Assche 5af2e38242 sg: remove an unused variable
The 'data_dir' variable is not used in sg_common_write(), hence
remove this variable.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 09:57:44 -08:00
Christoph Hellwig 906d15fbd2 scsi: split scsi_nonblockable_ioctl
The calling conventions for this function are bad as it could return
-ENODEV both for a device not currently online and a not recognized ioctl.

Add a new scsi_ioctl_block_when_processing_errors function that wraps
scsi_block_when_processing_errors with the a special case for the
SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself
in scsi_ioctl.  All callers of scsi_ioctl now must call the above helper
to check for the EH state, so that the ioctl handler itself doesn't
have to.

Reported-by: Robert Elliott <Elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:16:11 +01:00
Christoph Hellwig 176aa9d6ee scsi: refactor scsi_reset_provider handling
Pull the common code from the two callers into the function,
and rename it to scsi_ioctl_reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:16:10 +01:00
Hannes Reinecke d811b848eb scsi: use sdev as argument for sense code printing
We should be using the standard dev_printk() variants for
sense code printing.

[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:58 +01:00
Hannes Reinecke 22e0d99415 scsi: introduce sdev_prefix_printk()
Like scmd_printk(), but the device name is passed in as
a string. Can be used by eg ULDs which do not have access
to the scsi_cmnd structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:57 +01:00
Douglas Gilbert 26cf591e6d scsi: add SG_SCSI_RESET_NO_ESCALATE flag to SG_SCSI_RESET ioctl
Further to a January 2013 thread titled: "[PATCH] SG_SCSI_RESET ioctl
should only perform requested operation" by Jeremy Linton a patch (v3)
is presented that expands the existing ioctl to include "no_escalate"
versions to the existing resets. This requires no changes to SCSI low
level drivers (LLDs); it adds several more finely tuned reset options
to the user space. For example:

   /* This call remains the same, with the same escalating semantics
    * if the device (LU) reset fail. That is: on failure to try a
    * target reset and if that fails, try a bus reset, and if that fails
    * try a host (i.e. LLD) reset. */
   val = SG_SCSI_RESET_DEVICE;
   res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val);

   /* What follows is a new option introduced by this patch series. Only
    * a device reset is attempted. If that fails then an appropriate
    * error code is provided. N.B. There is no reset escalation. */
   val = SG_SCSI_RESET_DEVICE | SG_SCSI_RESET_NO_ESCALATE;
   res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val);

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Jeremy Linton <jlinton@tributary.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:54 +01:00
Joe Lawrence a492f07545 block,scsi: fixup blk_get_request dead queue scenarios
The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.

For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request.  It may fail if the queue is dead, or the caller was
unwilling to wait.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-28 10:03:46 -06:00
Christoph Hellwig 71e75c97f9 scsi: convert device_busy to atomic_t
Avoid taking the queue_lock to check the per-device queue limit.  Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock round trip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:43:45 -04:00
Hannes Reinecke 95e159d6dd scsi: Implement sg_printk()
Update the sg driver to use dev_printk() variants instead of
plain printk(); this will prefix logging messages with the
appropriate device.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Doug Gilbert <dgilbert@interlog.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:40 +02:00
Hannes Reinecke 9cb78c16f5 scsi: use 64-bit LUNs
The SCSI standard defines 64-bit values for LUNs, and large arrays
employing large or hierarchical LUN numbers become more and more
common.

So update the linux SCSI stack to use 64-bit LUN numbers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:37 +02:00
Douglas Gilbert cc833acbee sg: O_EXCL and other lock handling
This addresses a problem reported by Vaughan Cao concerning
the correctness of the O_EXCL logic in the sg driver. POSIX
doesn't defined O_EXCL semantics on devices but "allow only
one open file descriptor at a time per sg device" is a rough
definition. The sg driver's semantics have been to wait
on an open() when O_NONBLOCK is not given and there are
O_EXCL headwinds. Nasty things can happen during that wait
such as the device being detached (removed). So multiple
locks are reworked in this patch making it large and hard
to break down into digestible bits.

This patch is against Linus's current git repository which
doesn't include any sg patches sent in the last few weeks.
Hence this patch touches as little as possible that it
doesn't need to and strips out most SCSI_LOG_TIMEOUT()
changes in v3 because Hannes said he was going to rework all
that stuff.

The sg3_utils package has several test programs written to
test this patch. See examples/sg_tst_excl*.cpp .

Not all the locks and flags in sg have been re-worked in
this patch, notably sg_request::done . That can wait for
a follow-up patch if this one meets with approval.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:07:34 +02:00
Douglas Gilbert 16070cc189 sg: add SG_FLAG_Q_AT_TAIL flag
When the SG_IO ioctl was copied into the block layer and
later into the bsg driver, subtle differences emerged.

One difference is the way injected commands are queued through
the block layer (i.e. this is not SCSI device queueing nor SATA
NCQ). Summarizing:
   - SG_IO in the block layer: blk_exec*(at_head=false)
   - sg SG_IO: at_head=true
   - bsg SG_IO: at_head=true

Some time ago Boaz Harrosh introduced a sg v4 flag called
BSG_FLAG_Q_AT_TAIL to override the bsg driver default.
This patch does the equivalent for the sg driver.

ChangeLog:
     Introduce SG_FLAG_Q_AT_TAIL flag to cause commands
     to be injected into the block layer with
     at_head=false.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:34 +02:00
Douglas Gilbert 65c26a0f39 sg: relax 16 byte cdb restriction
- remove the 16 byte CDB (SCSI command) length limit from the sg driver
   by handling longer CDBs the same way as the bsg driver. Remove comment
   from sg.h public interface about the cmd_len field being limited to 16
   bytes.
 - remove some dead code caused by this change
 - cleanup comment block at the top of sg.h, fix urls

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:33 +02:00
Akinobu Mita 46f69e6a6b sg: prevent integer overflow when converting from sectors to bytes
This prevents integer overflow when converting the request queue's
max_sectors from sectors to bytes.  However, this is a preparation for
extending the data type of max_sectors in struct Scsi_Host and
scsi_host_template.  So, it is impossible to happen this integer
overflow for now, because SCSI low-level drivers can not specify
max_sectors greater than 0xffff due to the data type limitation.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:29 +02:00
Jens Axboe f27b087b81 block: add blk_rq_set_block_pc()
With the optimizations around not clearing the full request at alloc
time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
up to the user allocating the request.

Add a blk_rq_set_block_pc() that sets the command type to
REQ_TYPE_BLOCK_PC, and properly initializes the members associated
with this type of request. Update callers to use this function instead
of manipulating rq->cmd_type directly.

Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
attempt.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-06-06 07:57:37 -06:00
James Bottomley 065b4a2f59 [SCSI] Revert "sg: use rwsem to solve race during exclusive open"
This reverts commit 15b06f9a02.

This is one of four patches that was causing this bug

[  205.372823] ================================================
[  205.372901] [ BUG: lock held when returning to user space! ]
[  205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted
[  205.373055] ------------------------------------------------
[  205.373132] megarc.bin/5283 is leaving the kernel with locks still held!
[  205.373212] 1 lock held by megarc.bin/5283:
[  205.373285]  #0:  (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0

Cc: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 10:59:54 +01:00
James Bottomley 98481ff0bb [SCSI] Revert "sg: no need sg_open_exclusive_lock"
This reverts commit 00b2d9d6d0.

This is one of four patches that was causing this bug

[  205.372823] ================================================
[  205.372901] [ BUG: lock held when returning to user space! ]
[  205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted
[  205.373055] ------------------------------------------------
[  205.373132] megarc.bin/5283 is leaving the kernel with locks still held!
[  205.373212] 1 lock held by megarc.bin/5283:
[  205.373285]  #0:  (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0

Cc: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 10:59:32 +01:00
James Bottomley bafc8ad82d [SCSI] Revert "sg: checking sdp->detached isn't protected when open"
This reverts commit e32c9e6300.

This is one of four patches that was causing this bug

[  205.372823] ================================================
[  205.372901] [ BUG: lock held when returning to user space! ]
[  205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted
[  205.373055] ------------------------------------------------
[  205.373132] megarc.bin/5283 is leaving the kernel with locks still held!
[  205.373212] 1 lock held by megarc.bin/5283:
[  205.373285]  #0:  (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0

Cc: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 10:59:02 +01:00
James Bottomley c0d3b9c29e [SCSI] Revert "sg: push file descriptor list locking down to per-device locking"
This reverts commit 1f962ebcdf.

This is one of four patches that was causing this bug

[  205.372823] ================================================
[  205.372901] [ BUG: lock held when returning to user space! ]
[  205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted
[  205.373055] ------------------------------------------------
[  205.373132] megarc.bin/5283 is leaving the kernel with locks still held!
[  205.373212] 1 lock held by megarc.bin/5283:
[  205.373285]  #0:  (&sdp->o_sem){.+.+..}, at: [<ffffffff8161e650>] sg_open+0x3a0/0x4d0

Cc: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 10:58:07 +01:00
Vaughan Cao 1f962ebcdf [SCSI] sg: push file descriptor list locking down to per-device locking
Push file descriptor list locking down to per-device locking. Let sg_index_lock
only protect device lookup.
sdp->detached is also set and checked with this lock held.

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-03 07:28:10 -07:00
Vaughan Cao e32c9e6300 [SCSI] sg: checking sdp->detached isn't protected when open
@detached is set under the protection of sg_index_lock. Without getting the
lock, new sfp will be added during sg removal and there is no chance for it
to be picked out. So check with sg_index_lock held in sg_add_sfp().

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-03 07:28:09 -07:00
Vaughan Cao 00b2d9d6d0 [SCSI] sg: no need sg_open_exclusive_lock
Open exclusive check is protected by o_sem, no need sg_open_exclusive_lock.
@exclude is used to record which type of rwsem we are holding.

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-03 07:28:09 -07:00
Vaughan Cao 15b06f9a02 [SCSI] sg: use rwsem to solve race during exclusive open
A race condition may happen if two threads are both trying to open the same sg
with O_EXCL simultaneously. It's possible that they both find fsds list is
empty and get_exclude(sdp) returns 0, then they both call set_exclude() and
break out from wait_event_interruptible and resume open.

Now use rwsem to protect this process. Exclusive open gets write lock and
others get read lock. The lock will be held until file descriptor is closed.
This also leads 'exclude' only a status rather than a check mark.

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-03 07:28:09 -07:00
Kent Overstreet a27bb332c0 aio: don't include aio.h in sched.h
Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 20:16:25 -07:00
Tejun Heo b98c52b572 scsi: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:18 -08:00
Konstantin Khlebnikov 314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Jörn Engel 18b8ba6cfb [SCSI] sg: constify sg_proc_leaf_arr
Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:57 +01:00
Jörn Engel 37b9d1e001 [SCSI] sg: remove sg_mutex
With the exception of the detached field, sg_mutex no longer adds any
locking.  detached handling has been broken before and is still broken
and this patch does not seem to make things worse than they were to
begin with.

However, I have observed cases of tasks being blocked for >200s waiting
for sg_mutex.  So the removal clearly adds value for very little cost.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:57 +01:00
Jörn Engel 035d12e658 [SCSI] sg: completely protect sfds
sfds is protected by sg_index_lock - except for sg_open(), where it
isn't.  Change that and add some documentation.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:56 +01:00
Jörn Engel b499e5249e [SCSI] sg: protect sdp->exclude
Changes since v1: set_exclude now returns the new value, which gets
rid of the comma expression and the operator precedence bug.  Thanks
to Douglas for spotting it.

sdp->exclude was previously protected by the BKL.  The sg_mutex, which
replaced the BKL, only semi-protected it, as it was missing from
sg_release() and sg_proc_seq_show_debug().  Take an explicit spinlock
for it.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:55 +01:00
Jörn Engel 6acddc5e91 [SCSI] sg: prevent unwoken sleep
srp->done is protected by sfp->rq_list_lock everywhere, except for this
one case.  Result can be that the wake-up happens before the cacheline
with the changed srp->done has arrived, so the waiter can go back to
sleep and never be woken up again.

The wait_event_interruptible() means that anyone trying to debug this
unlikely race will likely notice everything working fine again, as the
next signal will unwedge things.  Evil.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:54 +01:00
Jörn Engel ebaf466be5 [SCSI] sg: remove closed flag
After sg_release() has been called, noone should be able to actually use
that filedescriptor anymore.  So if closed ever made a difference in the
past five years or so, it would have meant a bug.  Remove it.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
[jejb: fix up checkpatch warnings]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:53 +01:00
Jörn Engel 3f0c6aba0b [SCSI] sg: use wait_event_interruptible()
Afaics the use of __wait_event_interruptible() as opposed to
wait_event_interruptible() is purely historic.  So let's follow the rest
of the kernel and check the condition before prepare_to_wait() - and
also make the code a bit nicer.

Signed-off-by: Joern Engel <joern@logfs.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 10:08:53 +01:00