The current BSG design tries to shoe-horn the transport-specific
passthrough commands into the overall framework for SCSI passthrough
requests. This has a couple problems:
- each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag
despite not dealing with SCSI commands at all. Because of that these
queues could also incorrectly accept SCSI commands from in-kernel
users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl.
- the real SCSI bsg queues also incorrectly accept bsg requests of the
BSG_SUB_PROTOCOL_SCSI_TRANSPORT type
- the bsg transport code is almost unredable because it tries to reuse
different SCSI concepts for its own purpose.
This patch instead adds a new bsg_ops structure to handle the two cases
differently, and thus solves all of the above problems. Another side
effect is that the bsg-lib queues also don't need to embedd a
struct scsi_request anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The SCSI PRE-FETCH (10 or 16) command is present both on hard disks
and some SSDs. It is useful when the address of the next block(s) to
be read is known but it is not following the LBA of the current READ
(so read-ahead won't help). It returns two "good" SCSI Status values.
If the requested blocks have fitted (or will most likely fit (when
the IMMED bit is set)) into the disk's cache, it returns CONDITION
MET. If it didn't (or will not) fit then it returns GOOD status.
The goal of this patch is to stop the SCSI subsystem treating the
CONDITION MET SCSI status as an error. The current state makes the
PRE-FETCH command effectively unusable via pass-throughs.
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch has been generated as follows:
for verb in set_unlocked clear_unlocked set clear; do
replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
$(git grep -lw queue_flag_${verb} drivers block/bsg*)
done
Except for protecting all queue flag changes with the queue lock
this patch does not change any functionality.
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch is mostly fixes for driver specific issues (nine of them)
and the storvsc performance improvement with interrupt handling which
was dropped from the previous fixes pull request. We also have two
regressions: one is a double call_rcu() in ATA error handling and the
other is a missed conversion to BLK_STS_OK in
__scsi_error_from_host_byte().
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWp7/dCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSYAAP0RiYSR
oq3yLesh09tx0u+w2He8ylpdVizIzNMTjNE+BAD9GqeQWEvNaoGPUwNeMFJuwawX
hNjOttF1YOfFlsuoG94=
=9e1j
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is mostly fixes for driver specific issues (nine of them) and the
storvsc performance improvement with interrupt handling which was
dropped from the previous fixes pull request.
We also have two regressions: one is a double call_rcu() in ATA error
handling and the other is a missed conversion to BLK_STS_OK in
__scsi_error_from_host_byte()"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Fix kernel crash during port toggle
scsi: qla2xxx: Fix FC-NVMe LUN discovery
scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()
scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops
scsi: qla2xxx: ensure async flags are reset correctly
scsi: qla2xxx: do not check login_state if no loop id is assigned
scsi: qla2xxx: Fixup locking for session deletion
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
scsi: mpt3sas: wait for and flush running commands on shutdown/unload
scsi: mpt3sas: fix oops in error handlers after shutdown/unload
scsi: storvsc: Spread interrupts when picking a channel for I/O requests
scsi: megaraid_sas: Do not use 32-bit atomic request descriptor for Ventura controllers
In scsi core, __scsi_queue_insert should just put request back on the
queue and retry using the same command as before. However, for blk-mq,
scsi_mq_requeue_cmd is employed here which will unprepare the
request. To align with the semantics of __scsi_queue_insert, use
blk_mq_requeue_request with kick_requeue_list == true and put the
reference of scsi_device.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When converting __scsi_error_from_host_byte() to BLK_STS error codes the
case DID_OK was forgotten, resulting in it always returning an error.
Fixes: 2a842acab1 ("block: introduce new block status code type")
Cc: Doug Gilbert <dgilbert@interlog.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Avoid that the recently introduced call_rcu() call in the SCSI core
triggers a double call_rcu() call.
Reported-by: Natanael Copa <ncopa@alpinelinux.org>
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
Fixes: 3bd6f43f5c ("scsi: core: Ensure that the SCSI error handler gets woken up")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Natanael Copa <ncopa@alpinelinux.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Alexandre Oliva <oliva@gnu.org>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make scsi_test_unit_ready() send at most as many TURs as specified in
the 'retries' argument instead of retries * (retries + 1) / 2.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJadzbSAAoJEPfTWPspceCmt5QP/jo6MSsNVevAQOE75Jje+qa/
aF/BjHBdUmmI5WtPrtoz4igaJou7M2U0s8jdsc3c7uMw8dGTKc6ujIquSEn0wevY
faJPTjWzLum3y50gwRHcrHCQIlxOe5/f9rJevW4+q76aMP3aWKjO4bgBExH+2XnA
CaT+6d40skYt20Sy428H0yhVdDAMiQYXTeg4SssWQY9AvJSSiW7ax+vmP3r5BKpV
dXHggwgzqDuMwLZG80Tfg4GHGv5qisIrqLOCxtXNYHDNb/aDmbTFTO2jPgobT8gW
N2kWxsOkBayUdPw6Nt2Wlm4toQgR5GJGH04LH2vI5p4dp4Grvx/aFGvUbT7+sN1u
g/mmqsUUnYuO5AJ8XY2s2F7ezaT6v9x8BbLHuA2vz0r5GsdFVXctZ/bXgQqkmh9i
KLtfyOPldlczclVEuKL4xai1aXLcoBzDwyLxzbFp3+eAlhcgoSqxnMsE4fCJblCU
dfShDChu1SbBD6dyGx8sol9cT48RFj2tBtpfcYxFW/NJJOQoh9FTqPQetYQxQ72c
TadEf40hmw5Q2l0Hu5pwVbKHWUP0wn0VznkAOfT4VV1ysk93oExMbjgS2qh16xEZ
oQwFDQMk3D8BXI9VwH8gUUnypkhcooMhznxSC3BQxjGn/R+byp7QEPvxSEZz/4nD
BaBSbyAU5cpof+Eaqs4B
=qeDb
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"Most of this is fixes and not new code/features:
- skd fix from Arnd, fixing a build error dependent on sla allocator
type.
- blk-mq scheduler discard merging fixes, one from me and one from
Keith. This fixes a segment miscalculation for blk-mq-sched, where
we mistakenly think two segments are physically contigious even
though the request isn't carrying real data. Also fixes a bio-to-rq
merge case.
- Don't re-set a bit on the buffer_head flags, if it's already set.
This can cause scalability concerns on bigger machines and
workloads. From Kemi Wang.
- Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
distuingish between a local (device related) resource starvation
and a global one. The latter might happen without IO being in
flight, so it has to be handled a bit differently. From Ming"
* tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
block: skd: fix incorrect linux/slab_def.h inclusion
buffer: Avoid setting buffer bits that are already set
blk-mq-sched: Enable merging discard bio into request
blk-mq: fix discard merge with scheduler attached
blk-mq: introduce BLK_STS_DEV_RESOURCE
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass all
hardened usercopy checks since these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over the
next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
JgOmUnQNJWCTwUUw5AS1
=tzmJ
-----END PGP SIGNATURE-----
Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy whitelisting from Kees Cook:
"Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs.
To further restrict what memory is available for copying, this creates
a way to whitelist specific areas of a given slab cache object for
copying to/from userspace, allowing much finer granularity of access
control.
Slab caches that are never exposed to userspace can declare no
whitelist for their objects, thereby keeping them unavailable to
userspace via dynamic copy operations. (Note, an implicit form of
whitelisting is the use of constant sizes in usercopy operations and
get_user()/put_user(); these bypass all hardened usercopy checks since
these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over
the next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage"
* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
lkdtm: Update usercopy tests for whitelisting
usercopy: Restrict non-usercopy caches to size 0
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
kvm: whitelist struct kvm_vcpu_arch
arm: Implement thread_struct whitelist for hardened usercopy
arm64: Implement thread_struct whitelist for hardened usercopy
x86: Implement thread_struct whitelist for hardened usercopy
fork: Provide usercopy whitelisting for task_struct
fork: Define usercopy region in thread_stack slab caches
fork: Define usercopy region in mm_struct slab caches
net: Restrict unwhitelisted proto caches to size 0
sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
sctp: Define usercopy region in SCTP proto slab cache
caif: Define usercopy region in caif proto slab cache
ip: Define usercopy region in IP proto slab cache
net: Define usercopy region in struct proto slab cache
scsi: Define usercopy region in scsi_sense_cache slab cache
cifs: Define usercopy region in cifs_request slab cache
vxfs: Define usercopy region in vxfs_inode slab cache
ufs: Define usercopy region in ufs_inode_cache slab cache
...
This is mostly updates of the usual driver suspects: arcmsr,
scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
hisi_sas. We also have a rework of the libsas hotplug handling to
make it more robust, a slew of 32 bit time conversions and fixes, and
a host of the usual minor updates and style changes. The biggest
potential for regressions is the libsas hotplug changes, but so far
they seem stable under testing.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp
MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB
WvyBieQs9qRit+2czd4=
=gJMf
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual driver suspects: arcmsr,
scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
hisi_sas.
We also have a rework of the libsas hotplug handling to make it more
robust, a slew of 32 bit time conversions and fixes, and a host of the
usual minor updates and style changes. The biggest potential for
regressions is the libsas hotplug changes, but so far they seem stable
under testing"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits)
scsi: qla2xxx: Fix logo flag for qlt_free_session_done()
scsi: arcmsr: avoid do_gettimeofday
scsi: core: Add VENDOR_SPECIFIC sense code definitions
scsi: qedi: Drop cqe response during connection recovery
scsi: fas216: fix sense buffer initialization
scsi: ibmvfc: Remove unneeded semicolons
scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
scsi: hisi_sas: directly attached disk LED feature for v2 hw
scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw
scsi: megaraid_sas: NVMe passthrough command support
scsi: megaraid: use ktime_get_real for firmware time
scsi: fnic: use 64-bit timestamps
scsi: qedf: Fix error return code in __qedf_probe()
scsi: devinfo: fix format of the device list
scsi: qla2xxx: Update driver version to 10.00.00.05-k
scsi: qla2xxx: Add XCB counters to debugfs
scsi: qla2xxx: Fix queue ID for async abort with Multiqueue
scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event()
scsi: qla2xxx: Fix warning during port_name debug print
scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
...
This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.
Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.
If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:
1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();
2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
completed via blk_mq_sched_restart()
3) if SCHED_RESTART is set concurently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.
One invariant is that queue will be rerun if SCHED_RESTART is set.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
contained in the scsi_sense_cache slab cache, need to be copied to/from
userspace.
cache object allocation:
drivers/scsi/scsi_lib.c:
scsi_select_sense_cache(...):
return ... ? scsi_sense_isadma_cache : scsi_sense_cache
scsi_alloc_sense_buffer(...):
return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);
scsi_init_request(...):
...
cmd->sense_buffer = scsi_alloc_sense_buffer(...);
...
cmd->req.sense = cmd->sense_buffer
example usage trace:
block/scsi_ioctl.c:
(inline from sg_io)
blk_complete_sghdr_rq(...):
struct scsi_request *req = scsi_req(rq);
...
copy_to_user(..., req->sense, len)
scsi_cmd_ioctl(...):
sg_io(...);
In support of usercopy hardening, this patch defines a region in
the scsi_sense_cache slab cache in which userspace copy operations
are allowed.
This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.
Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
This patch does not change any functionality but makes the SCSI core
source code slightly easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 651a013649 ("scsi: scsi_transport_sas: switch to bsg-lib for
SMP passthrough") removed the only call to scsi_initialize_rq() from
outside the SCSI core. Hence unexport scsi_initialize_rq().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_eh_scmd_add() is called concurrently with
scsi_host_queue_ready() while shost->host_blocked > 0 then it can
happen that neither function wakes up the SCSI error handler. Fix
this by making every function that decreases the host_busy counter
wake up the error handler if necessary and by protecting the
host_failed checks with the SCSI host lock.
Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
References: https://marc.info/?l=linux-kernel&m=150461610630736
Fixes: commit 7466501608 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Before commit 0df21c86bd ("scsi: implement .get_budget and .put_budget
for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
commit 0df21c86bd is introduced, queue won't be run any more under
this situation.
IO hang is observed when timeout happened, and this patch fixes the IO
hang issue by running queue after delay in scsi_dev_queue_ready, just
like non-mq. This issue can be triggered by the following script[1].
There is another issue which can be covered by running idle queue: when
.get_budget() is called on request coming from hctx->dispatch_list, if
one request just completes during .get_budget(), we can't depend on
SCSI's restart to make progress any more. This patch fixes the race too.
With this patch, we basically recover to previous behaviour (before
commit 0df21c86bd) of handling idle queue when running out of
resource.
[1] script for test/verify SCSI timeout
rmmod scsi_debug
modprobe scsi_debug max_queue=1
DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
echo "using scsi device $DEVICE"
echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
echo "temporary write through" >$DISK_DIR/cache_type
echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
echo none > /sys/block/$DEVICE/queue/scheduler
dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
sleep 5
echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
wait
echo "SUCCESS"
Fixes: 0df21c86bd ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In non-coherent DMA mode, kernel uses cache flushing operations to
maintain I/O coherency, so scsi's block queue should be aligned to the
value returned by dma_get_cache_alignment(). Otherwise, If a DMA buffer
and a kernel structure share a same cache line, and if the kernel
structure has dirty data, cache_invalidate (no writeback) will cause
data corruption.
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
[hch: rebased and updated the comment and changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes).
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaCxtCAAoJEAVr7HOZEZN4d9EQAI+OHP6ss6zjKKC21c9jNPcH
NhLrNv37gHg/LA2VXeUEL9RGUjCGLIUrI4HsrxzkFAMLKP4TkshMs8/2RvczY+Sa
VpayPqVybEKLIS6ipQyM1SLIQff2nvtDVcN/T+8z1lkk45TrbA6ZGuwUwd2aJyEA
2V2wtg51ObnL0Nr9QPPll0JrtL1AnCZyRlu9XrwTZuuSBZwk93opIuuvbZm/3dVg
Ir4GSS4Y+PuHIfu4cxqdsPMdzRdY9I2me1YiE4jeFSn1/VTAjL4HBz7fO9eITT42
VhXSpDz1XvFsa9dJ0ubkqoALpJzCfOcBw+EuGvSydLEvOBoEVwMccdfaD9lT1zc5
L9e1Z5qqJoq7hTA6xTXCYfWG73I9HYvljtmc8yudKHhADOdnSTUXhaO6uBF0RNqD
OxPSA1RZwRx3c6lDOcK6BTtvLAkTEuYKdrWSKJi0w+QXJAyQ6etqbmsKpmPdRim7
Z4ZSpJFro2gyo9gcdJO0ykTG+z3U7Z/ay1sNgnuprsv+eU/QjUdlAPl18o79EkRf
H54zZggZ4wC6q/cFVVt4Vx+V+oqIeu38s7NDXS9UltLoTZPm2EzDW6pXd/38Z4Tf
a1oBAUET8kYLC90P8sVZxUIHZjITlpgDbyE2Lq00PMYXhk8S4IxF0aMN5RvVqzUv
+7N2HrHkSSgG1nhw1t+E
=3O85
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: lpfc: Fix hard lock up NMI in els timeout handling.
scsi: mpt3sas: remove a stray KERN_INFO
scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event()
scsi: aacraid: use timespec64 instead of timeval
scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions
scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()
scsi: mpt3sas: fix dma_addr_t casts
scsi: be2iscsi: Use kasprintf
scsi: storvsc: Avoid excessive host scan on controller change
scsi: lpfc: fix kzalloc-simple.cocci warnings
scsi: mpt3sas: Update mpt3sas driver version.
scsi: mpt3sas: Fix sparse warnings
scsi: mpt3sas: Fix nvme drives checking for tlr.
scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info
scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives.
scsi: mpt3sas: scan and add nvme device after controller reset
scsi: mpt3sas: Set NVMe device queue depth as 128
scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware.
scsi: mpt3sas: API's to remove nvme drive from sml
scsi: mpt3sas: API 's to support NVMe drive addition to SML
...
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.
It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why while
the hibernation image is being written or restored that SCSI devices
are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
to be submitted by deferring allocation of non-preempt requests for
devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after each resume the
fio job was still running:
for ((i=0; i<10; i++)); do
(
cd /sys/block/md0/md &&
while true; do
[ "$(<sync_action)" = "idle" ] && echo check > sync_action
sleep 1
done
) &
pids=($!)
for d in /sys/class/block/sd*[a-z]; do
bdev=${d#/sys/class/block/}
hcil=$(readlink "$d/device")
hcil=${hcil#../../../}
echo 4 > "$d/queue/nr_requests"
echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
--rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \
--iodepth_batch=1 --thread --loops=$((2**31)) &
pids+=($!)
done
sleep 1
echo "$(date) Hibernating ..." >>hibernate-test-log.txt
systemctl hibernate
sleep 10
kill "${pids[@]}"
echo idle > /sys/block/md0/md/sync_action
wait
echo "$(date) Done." >>hibernate-test-log.txt
done
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert blk_get_request(q, op, __GFP_RECLAIM) into
blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not
change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ]
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 8a97712e53.
This commit added a call to sysfs_notify() from within
scsi_device_set_state(), which in turn turns out to make libata very
unhappy, because ata_eh_detach_dev() does
spin_lock_irqsave(ap->lock, flags);
..
if (ata_scsi_offline_dev(dev)) {
dev->flags |= ATA_DFLAG_DETACHED;
ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
}
and ata_scsi_offline_dev() then does that scsi_device_set_state() to set
it offline.
So now we called sysfs_notify() from within a spinlocked region, which
really doesn't work. The 0day robot reported this as:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
because sysfs_notify() ends up calling kernfs_find_and_get_ns() which
then does mutex_lock(&kernfs_mutex)..
The pollability of the device state isn't critical, so revert this all
for now, and maybe we'll do it differently in the future.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is enough to just check if we can get the budget via .get_budget().
And we don't need to deal with device state change in .get_budget().
For SCSI, one issue to be fixed is that we have to call
scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails
to handle the request. And it isn't enough to simply call
blk_mq_end_request() to do that if this request is marked as
RQF_DONTPREP.
Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is very expensive to atomic_inc/atomic_dec the host wide counter of
host->busy_count, and it should have been avoided via blk-mq's mechanism
of getting driver tag, which uses the more efficient way of sbitmap queue.
Also we don't check atomic_read(&sdev->device_busy) in scsi_mq_get_budget()
and don't run queue if the counter becomes zero, so IO hang may be caused
if all requests are completed just before the current SCSI device
is added to shost->starved_list.
Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq)
Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We need to tell blk-mq to reserve resources before queuing one request,
so implement these two callbacks. Then blk-mq can avoid to dequeue
request too early, and IO merging can be improved a lot.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the following patch, we will implement scsi_get_budget()
which need to call scsi_prep_state_check() when rq isn't
dequeued yet.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The legacy block layer handles requests as follows:
- If the prep function returns BLKPREP_OK, let blk_peek_request()
return the pointer to that request.
- If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED
flag and retry calling the prep function later.
- If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end
the request.
In none of these cases it is correct to clear the SCMD_INITIALIZED
flag from inside scsi_prep_fn(). Since scsi_prep_fn() already
guarantees that scsi_init_command() will be called once even if
scsi_prep_fn() is called multiple times, remove the code that clears
SCMD_INITIALIZED from scsi_prep_fn().
The scsi-mq code handles requests as follows:
- If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag
and submit the request to the SCSI LLD.
- If scsi_mq_prep_fn() returns BLKPREP_DEFER, call
blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE.
- If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call
scsi_mq_uninit_cmd() and let the blk-mq core end the request.
In none of these cases scsi_mq_prep_fn() should clear the
SCMD_INITIALIZED flag. Hence remove the code from scsi_mq_prep_fn()
function that clears that flag.
This patch avoids that the following warning is triggered when using
the legacy block layer:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220
CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1
task: ffff91c147a4b800 task.stack: ffffb282c37b8000
RIP: 0010:scsi_end_request+0x1de/0x220
Call Trace:
<IRQ>
scsi_io_completion+0x204/0x5e0
scsi_finish_command+0xce/0xe0
scsi_softirq_done+0x126/0x130
blk_done_softirq+0x6e/0x80
__do_softirq+0xcf/0x2a8
irq_exit+0xab/0xb0
do_IRQ+0x7b/0xc0
common_interrupt+0x90/0x90
</IRQ>
RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10
__test_set_page_writeback+0xc7/0x2c0
__block_write_full_page+0x158/0x3b0
block_write_full_page+0xc4/0xd0
blkdev_writepage+0x13/0x20
__writepage+0x12/0x40
write_cache_pages+0x204/0x500
generic_writepages+0x48/0x70
blkdev_writepages+0x9/0x10
do_writepages+0x34/0xc0
__filemap_fdatawrite_range+0x6c/0x90
file_write_and_wait_range+0x31/0x90
blkdev_fsync+0x16/0x40
vfs_fsync_range+0x44/0xa0
do_fsync+0x38/0x60
SyS_fsync+0xb/0x10
entry_SYSCALL_64_fastpath+0x13/0x94
---[ end trace 86e8ef85a4a6c1d1 ]---
Fixes: commit 64104f7032 ("scsi: Call scsi_initialize_rq() for filesystem requests")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As per SAM there is a status precedence, with any sense code 29/XX
taking second place just after an ACA ACTIVE status. Additionally, each
target might prefer to not queue any unit attention conditions, but just
report one. Due to the above, this will be that one with the highest
precedence. This results in the sense code 29/XX effectively
overwriting any other unit attention. Hence we should report the
power-on reset to userland so that it can take appropriate action.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix comments style (use kernel-doc style) and content to clarify some
functions. Also fix some functions signature indentation and remove a
useless blank line in sd_zbc_read_zones().
No functional change is introduced by this patch.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
One of the two scsi-mq functions that requeue a request unprepares a
request before requeueing (scsi_io_completion()) but the other function
not (__scsi_queue_insert()). Make sure that a request is unprepared
before requeuing it.
Fixes: commit d285203cf6 ("scsi: add support for a blk-mq based I/O path.")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Requests are unprepared and reprepared when being requeued. Avoid that
requeuing resets .jiffies_at_alloc and .retries by initializing these
two member variables from inside scsi_initialize_rq() and by preserving
both member variables when preparing a request. This patch affects the
requeuing behavior of both the legacy scsi and the scsi-mq code paths.
Reported-by: Brian King <brking@linux.vnet.ibm.com>
References: https://lkml.org/lkml/2017/8/18/923 ("Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a pass-through request is submitted then blk_get_request()
initializes that request by calling scsi_initialize_rq(). Also call this
function for filesystem requests. Introduce CMD_INITIALIZED to keep
track of whether or not a request has already been initialized.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce struct scsi_vpd for the VPD page length, data and the RCU head
that will be used to free the VPD data. Use kfree_rcu() instead of
kfree() to free VPD data. Move the VPD buffer pointer check inside the
RCU read lock in the sysfs code. Only annotate pointers that are shared
across threads with __rcu. Use rcu_dereference() when dereferencing an
RCU pointer. This patch suppresses about twenty sparse complaints about
the vpd_pg8[03] pointers. This patch also fixes a race condition, namely
that updating of the VPD pointers and length variables in struct
scsi_device was not atomic with reference to the code reading these
variables. See also "Does the update code tolerate concurrent accesses?"
in Documentation/RCU/checklist.txt.
Fixes: commit 09e2b0b146 ("scsi: rescan VPD attributes")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The kerneldoc comment for scsi_initialize_rq() neglected to document the
"rq" parameter, leading to this docs build warning:
./drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq'
Document the parameter and make the build slightly quieter.
[mkp: used wording suggested by Bart]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function returns '0' if successful; with the original comment
the function doesn't have a way to indicate success ...
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart van Assche <bvanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since commit e9c787e65c ("scsi: allocate scsi_cmnd structures as
part of struct request") struct request and struct scsi_cmnd are
adjacent. This means that there is now an alternative to reading
req->special to convert a pointer to a prepared request into a
SCSI command pointer, namely by using blk_mq_rq_to_pdu(). Make
this change where appropriate. Although this patch does not
change any functionality, it slightly improves performance and
slightly improves readability.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename several functions to make it easy to see which queue type a
function is intended for.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While the 'state' attribute can (and will) change occasionally,
calling 'poll()' or 'select()' on it fails as sysfs is never
notified that the state has changed.
With this patch calling 'poll()' or 'select()' will work
properly.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rework scsi_internal_device_unblock_nowait() into using a switch
statement. No functional changes.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual suspects: lpfc, qla2xxx, bnx2fc,
qedf, hpsa, hisi_sas, smartpqi, cxlflash, aacraid, csiostor along with
a host of minor and miscellaneous changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZW7JMAAoJEAVr7HOZEZN4IM4P/AqBtvH+6Lo1Eb+3A/HnHskK
hIxVxgBxaw3fhW5AegDfVCvrdVTTEkCB/g5CIKN8NCWEx6OGmCX0Lu6lnjld9BOZ
cTlPtzNwFGlgHrz34LwCc3vlc5ovMpTQBrpGAQpGGWoAZIP+c3ilEihIYTEMNCsN
dmjI71AigDE5g6X1OT361IJ1gydkjfG41IcRe/jlMtEgRNdy3B2JVIdATL89Pw4b
0uZO3uUTn8EGEKUdyJZCNpie7sGZv8u2LhA+Znby2C4h3bwWNV/d0p7ped4xrQY5
yVpZEUbYVdcOOYBgeBJlfwOhvjRQTdxeK4d7W9XTb+AQf30F3DgSepdMCdf3BjVt
KnQvBOTxyidB8xsCL46wlxxNew3qoUtaKoY88WUOOnnJwU5U7hlRtPkf/eO2i5QF
+k7fxUpFfkBTS4I2gXnyGWpmSoxwJerd0knojSOjrjJcAlcgM65+pocUAea/0Dpr
SsfL2sTb12gk6bkF9UlRv8/4aSsWYb92WW1nbTt2nFRXncPNN5Qzc3lGj//36O+b
2bka+aSKVAFoNAnQ1pUE8EJxSboy5q7y4509iZzO/Fom+pVuzBROm5fmrpcOE5dP
IjW7gqSFB6578tnNiK049rrrPja5wkUa+Ptc8s0FjPAVyIDrp2RN+f2nljOBBhW8
3Z1nXMG0eFqvb5taLtfZ
=D9QX
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, bnx2fc,
qedf, hpsa, hisi_sas, smartpqi, cxlflash, aacraid, csiostor along with
a host of minor and miscellaneous changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (276 commits)
qla2xxx: Fix NVMe entry_type for iocb packet on BE system
scsi: qla2xxx: avoid unused-function warning
scsi: snic: fix a couple of spelling mistakes/typos
scsi: qla2xxx: fix a bunch of typos and spelling mistakes
scsi: lpfc: don't double count abort errors
scsi: lpfc: spin_lock_irq() is not nestable
scsi: hisi_sas: optimise DMA slot memory
scsi: ibmvfc: constify dev_pm_ops structures.
scsi: ibmvscsi: constify dev_pm_ops structures.
scsi: cxlflash: Update debug prints in reset handlers
scsi: cxlflash: Update send_tmf() parameters
scsi: cxlflash: Avoid double free of character device
scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state
scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails.
scsi: ufs: flush eh_work when eh_work scheduled.
scsi: qla2xxx: Protect access to qpair members with qpair->qp_lock
scsi: sun_esp: fix device reference leaks
scsi: fnic: changing queue command to return result DID_IMM_RETRY when rport is init
scsi: fnic: correct speed display and add support for 25,40 and 100G
scsi: fnic: added timestamp reporting in fnic debug stats
...
Since scsi_req_init() works on a struct scsi_request, change the
argument type into struct scsi_request *.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of explicitly calling scsi_req_init() after blk_get_request(),
call that function from inside blk_get_request(). Add an
.initialize_rq_fn() callback function to the block drivers that need
it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn()
because it is too small to keep it as a separate function. Keep the
scsi_req_init() call in ide_prep_sense() because it follows a
blk_rq_init() call.
References: commit 82ed4db499 ("block: split scsi_request out of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_mq_unquiesce_queue() is used for unquiescing the
queue explicitly, so replace blk_mq_start_stopped_hw_queues()
with it.
For the scsi part, this patch takes Bart's suggestion to
switch to block quiesce/unquiesce API completely.
Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: dm-devel@redhat.com
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch reduces code duplication. There are two functional changes in
this patch:
- It causes scsi_mq_prep_fn() to clear driver-private command data, just
like the already upstream commit 1bad6c4a57 ("scsi: zero per-cmd
private driver data for each MQ I/O").
- The initialization of .prot_sdb is moved from scsi_mq_prep_fn() into
scsi_init_request().
[mkp: applied by hand]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch does not change any functionality but makes the next patch
easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Just like for the scsi-mq code path, in the single queue SCSI code path
only add commands to the per-device command list if required by the SCSI
LLD. This patch will make it easier to merge the single-queue and
multiqueue command initialization code.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a device is blocked, make __scsi_remove_device() cause it to
transition to the DEL state. This means that all the commands issued in
.shutdown() will error in the mid-layer, thus making the removal proceed
without being stopped.
This patch is a slightly modified version of a patch from James
Bottomley. This patch avoids that the following lockup occurs:
Call Trace:
schedule+0x35/0x80
schedule_timeout+0x237/0x2d0
io_schedule_timeout+0xa6/0x110
wait_for_completion_io+0xa3/0x110
blk_execute_rq+0xdf/0x120
scsi_execute+0xce/0x150 [scsi_mod]
scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
sd_sync_cache+0xa9/0x190 [sd_mod]
sd_shutdown+0x6a/0x100 [sd_mod]
sd_remove+0x64/0xc0 [sd_mod]
__device_release_driver+0x8d/0x120
device_release_driver+0x1e/0x30
bus_remove_device+0xf9/0x170
device_del+0x127/0x240
__scsi_remove_device+0xc1/0xd0 [scsi_mod]
scsi_forget_host+0x57/0x60 [scsi_mod]
scsi_remove_host+0x72/0x110 [scsi_mod]
srp_remove_work+0x8b/0x200 [ib_srp]
Reported-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Serializing SCSI device state changes avoids that two state changes can
occur concurrently, e.g. the state changes in scsi_target_block() and
__scsi_remove_device(). This serialization is essential to make patch
"Make __scsi_remove_device go straight from BLOCKED to DEL" work
reliably.
Enable this mechanism for all scsi_target_*block() callers but not for
the scsi_internal_device_unblock() calls from the mpt3sas driver because
that driver can call scsi_internal_device_unblock() from atomic context.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This will make it easier to serialize SCSI device state changes through
a mutex.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of passing a "wait" argument to scsi_internal_device_block(),
split this function into a function that waits and a function that
doesn't wait. This will make it easier to serialize SCSI device state
changes through a mutex.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dereferencing shost from scsi_exit_rq() is not safe because the SCSI
host may already have been freed when scsi_exit_rq() is called.
Increasing the shost reference count in scsi_init_rq() and dropping that
reference in scsi_exit_rq() is nontrivial since scsi_host_dev_release()
may sleep and since scsi_exit_rq() may be called from interrupt
context. Since scsi_exit_rq() only needs a single bit from shost, copy
that bit into struct scsi_cmnd.
Reported-by: Scott Bauer <scott.bauer@intel.com>
Fixes: e9c787e65c ("scsi: allocate scsi_cmnd structures as part of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Scott Bauer <scott.bauer@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the same values for use for request completion errors as the return
value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause
a requeue, and all the others are completed as-is.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.
For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.
blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
From the context where a SCSI command is submitted it is not always
possible to figure out whether or not the queue the command is
submitted to has struct scsi_request as the first member of its
private data. Hence introduce the flag QUEUE_FLAG_SCSI_PASSTHROUGH.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In lower layer driver's (LLD) scsi_host_template, the driver may
optionally ask SCSI to allocate its private driver memory for each
command, by specifying cmd_size. This memory is allocated at the end of
scsi_cmnd by SCSI. Later when SCSI queues a command, the LLD can use
scsi_cmd_priv to get to its private data.
Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In
this case, the LLD may get to stale or uninitialized data in its private
driver memory. This may result in unexpected driver and hardware
behavior.
Fix this problem by also zeroing the private driver memory before
passing them to LLD.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
CC: <stable@vger.kernel.org> # 4.11+
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that when building with W=1 the compiler complains
that __scsi_init_queue() has not been declared. See also commit
d48777a633 ("scsi: remove __scsi_alloc_queue").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull block fixes and updates from Jens Axboe:
"Some fixes and followup features/changes that should go in, in this
merge window. This contains:
- Two fixes for lightnvm from Javier, fixing problems in the new code
merge previously in this merge window.
- A fix from Jan for the backing device changes, fixing an issue in
NFS that causes a failure to mount on certain setups.
- A change from Christoph, cleaning up the blk-mq init and exit
request paths.
- Remove elevator_change(), which is now unused. From Bart.
- A fix for queue operation invocation on a dead queue, from Bart.
- A series fixing up mtip32xx for blk-mq scheduling, removing a
bandaid we previously had in place for this. From me.
- A regression fix for this series, fixing a case where we wait on
workqueue flushing from an invalid (non-blocking) context. From me.
- A fix/optimization from Ming, ensuring that we don't both quiesce
and freeze a queue at the same time.
- A fix from Peter on lock ordering for CPU hotplug. Not a real
problem right now, but will be once the CPU hotplug rework goes in.
- A series from Omar, cleaning up out blk-mq debugfs support, and
adding support for exporting info from schedulers in debugfs as
well. This is really useful in debugging stalls or livelocks. From
Omar"
* 'for-linus' of git://git.kernel.dk/linux-block: (28 commits)
mq-deadline: add debugfs attributes
kyber: add debugfs attributes
blk-mq-debugfs: allow schedulers to register debugfs attributes
blk-mq: untangle debugfs and sysfs
blk-mq: move debugfs declarations to a separate header file
blk-mq: Do not invoke queue operations on a dead queue
blk-mq-debugfs: get rid of a bunch of boilerplate
blk-mq-debugfs: rename hw queue directories from <n> to hctx<n>
blk-mq-debugfs: don't open code strstrip()
blk-mq-debugfs: error on long write to queue "state" file
blk-mq-debugfs: clean up flag definitions
blk-mq-debugfs: separate flags with |
nfs: Fix bdi handling for cloned superblocks
block/mq: Cure cpu hotplug lock inversion
lightnvm: fix bad back free on error path
lightnvm: create cmd before allocating request
blk-mq: don't use sync workqueue flushing from drivers
mtip32xx: convert internal commands to regular block infrastructure
mtip32xx: cleanup internal tag assumptions
block: don't call blk_mq_quiesce_queue() after queue is frozen
...
Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq. Except for a superflous check in
mtip32xx it was unused anyway.
Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similary to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable priotization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived it's usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
Show the SCSI CDB for pending SCSI commands in
/sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. An example
of how SCSI commands are displayed by this code:
ffff8801703245c0 {.op=READ, .cmd_flags=META PRIO, .rq_flags=DONTPREP IO_STAT STATS, .tag=14, .internal_tag=-1, .cmd=Read(10) 28 00 2a 81 1b 30 00 00 08 00}
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Our final fix before the 4.12 release (hopefully). It's an error leg
again: the fix to not bug on empty DMA transfers is returning the
wrong code and confusing the block layer.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJY/gx3AAoJEAVr7HOZEZN4ai0P/i7xr9kaKtH2trTE5d51fG1y
Jk1ec5Ri6mqjiJY4r1u6wbEu5y1bcUlsyz71QsRbOOBiC951dpAbQwKeWa2rmQWd
YLevigg3Jh2sM1WghYSgRGPZ9uz2+j5jHpHz55jTTr6WQKUixY/Ms6DG4Ya9tWVO
Tuzip4Vsga/91g27Z7HDGxxg6y0n7eEAPEYZJcmpwUl2F+zscZh3RX1YDHTU+BP9
Z7inla293PWUf4kXNP6KUT63vO5w5C0fCvqoFU5p59JyPY+nB5O0povZ9XlHw7Ez
ug0YHVOvX+1wLc7fzrhFoV0mUEutKQnLF4sBtNFrfZFkYgbYYfmNmAAVTc7CpkNS
tBVWCzq8v053HXIx2a8bP6wnztVeQhpXkmLfBSgXgTP6+ae1dH/F+3v9lA5RUwgx
FJ5XgfxAtdpiJ5tao8Deb0D9KJ5NKymy0cfY8oA5nB0Oto5RrqesO8hVmF7zoaXV
TWrPOTuFX6Zin70AwhLtZRiNirGplcEnSOU+EsBi6OBZ629pvUHNguMbR6IQRsTz
hso15Ve++UnaG/fwWBzZPqFc1fI5PlI0YtgM8dngqkc54VKIA0MmkcI2b+Fw8C4w
Ugir0rHlL+od36NI2J0vZIMqzwNwsJQ9fOGxuuKGy8fKB4JYiYnFTJWa7YjAmfle
BLfWA+xZ84jcFwvNpYMj
=tuno
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"Our final fix before the 4.12 release (hopefully).
It's an error leg again: the fix to not bug on empty DMA transfers is
returning the wrong code and confusing the block layer"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: return correct blkprep status code in case scsi_init_io() fails.
Now that all drivers that call blk_mq_complete_requests have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This passes on the scsi_cmnd result field to users of passthrough
requests. Currently we abuse req->errors for this purpose, but that
field will go away in its current form.
Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it. I didn't dare
to touch this magic, so the abuses are brought forward 1:1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When instrumenting the SCSI layer to run into the
!blk_rq_nr_phys_segments(rq) case the following warning emitted from the
block layer:
blk_peek_request: bad return=-22
This happens because since commit fd3fc0b4d7 ("scsi: don't BUG_ON()
empty DMA transfers") we return the wrong error value from
scsi_prep_fn() back to the block layer.
[mkp: silenced checkpatch]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: fd3fc0b4d7 scsi: don't BUG_ON() empty DMA transfers
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We've added a considerable amount of fixes for stalls and issues
with the blk-mq scheduling in the 4.11 series since forking
off the for-4.12/block branch. We need to do improvements on
top of that for 4.12, so pull in the previous fixes to make
our lives easier going forward.
Signed-off-by: Jens Axboe <axboe@fb.com>
If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.
commit 52d7f1b5c2 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.
Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.
Fixes: commit 52d7f1b5c2 ("blk-mq: Avoid that requeueing starts stopped queues")
Fixes: commit 7e79dadce2 ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.
But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of bloating the generic struct request with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Constify all instances of blk_mq_ops, as they are never modified.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This is the set of stuff that didn't quite make the initial pull and a
set of fixes for stuff which did. The new stuff is basically lpfc
(nvme), qedi and aacraid. The fixes cover a lot of previously
submitted stuff, the most important of which probably covers some of
the failing irq vectors allocation and other fallout from having the
SCSI command allocated as part of the block allocation functions.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYug/3AAoJEAVr7HOZEZN4bRwP/RvAW/NWlvs812gUeHNahUHx
HF3EtY5YHneZev1L4fa1Myd8cNcB3gic53e1WvQhN5f+LczeI7iXHH8xIQK/B4GT
OI5W7ZCGlhIamgsU8tUFGrtIOM6N/LIMZPMsaE62dcev0jC0Db4p/5yv5bUg1yuL
JOLWqIaTYTx7b5FwR2cXKv7KGYpXonvcoVUDV6s8aMYToJhJiQ6JtWmjaAA+/OJ7
HTcqNX/KSWQBG3ERjE3A/rGBFxPR9KPq8kpqVJlw39KEN9tMjXb0QAGvAeauNye2
xaIjV41tXH1Ys3aoh6ZU+TpzN3HLlH/gegRe8671fUwqu/RnSzFC8Eidoddzgqne
z5BF0Dy36FA5ol2HTFEwS5WM1gRM+z6YGnbhpE/XfyTL2sLHRynSXKxb02IdjRNi
gCdV89AL8xdfPPFJjsx4hGWUJhalW/RwuPayAupePKGQHePabHa19McI2ssg0WBg
OC6uuEk7vT95xuTZVJlrwXTJPnc9dxWJwzh21oEnvMfWM0VMW06EKT4AcX8FOsjL
RswiYp9wl8JDmfNxyJnQJj0+QXeIE/gh35vBQCGsLVKY4+xzbi3yZfb1sZB0XOum
yLK2LFQ+Ltk1TaPPSZsgXQZfodot0dFJPTnWr+whOfY1kepNlFRwQHltXnCWzgRs
QG//WHsk0u1Mj2gehwmQ
=e+6v
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is the set of stuff that didn't quite make the initial pull and a
set of fixes for stuff which did.
The new stuff is basically lpfc (nvme), qedi and aacraid. The fixes
cover a lot of previously submitted stuff, the most important of which
probably covers some of the failing irq vectors allocation and other
fallout from having the SCSI command allocated as part of the block
allocation functions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (59 commits)
scsi: qedi: Fix memory leak in tmf response processing.
scsi: aacraid: remove redundant zero check on ret
scsi: lpfc: use proper format string for dma_addr_t
scsi: lpfc: use div_u64 for 64-bit division
scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m
scsi: cciss: correct check map error.
scsi: qla2xxx: fix spelling mistake: "seperator" -> "separator"
scsi: aacraid: Fixed expander hotplug for SMART family
scsi: mpt3sas: switch to pci_alloc_irq_vectors
scsi: qedf: fixup compilation warning about atomic_t usage
scsi: remove scsi_execute_req_flags
scsi: merge __scsi_execute into scsi_execute
scsi: simplify scsi_execute_req_flags
scsi: make the sense header argument to scsi_test_unit_ready mandatory
scsi: sd: improve TUR handling in sd_check_events
scsi: always zero sshdr in scsi_normalize_sense
scsi: scsi_dh_emc: return success in clariion_std_inquiry()
scsi: fix memory leak of sdpk on when gd fails to allocate
scsi: sd: make sd_devt_release() static
scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.
...
Commit 669f044170 ("scsi: srp_transport: Move queuecommand() wait code
to SCSI core") can make scsi_internal_device_block() sleep. However,
the mpt3sas driver can call this function from an interrupt
handler. Hence add a second argument to scsi_internal_device_block()
that restores the old behavior of this function for the mpt3sas handler.
The call chain that triggered an "IRQ handler enabled interrupts"
complaint is as follows:
_base_interrupt()
-> _base_async_event()
-> mpt3sas_scsih_event_callback()
-> _scsih_check_topo_delete_events()
-> _scsih_block_io_to_children_attached_directly()
-> _scsih_block_io_device()
-> _scsih_internal_device_block()
-> scsi_internal_device_block()
Reported-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Cc: <stable@vger.kernel.org> # v4.10+
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
And switch all callers to use scsi_execute instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
All but one caller want the decoded sense header, so offer the existing
__scsi_execute helper as the public scsi_execute API to simply the
callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a sshdr argument to __scsi_execute so that we can decode the sense
data directly into the sense header instead of needing a copy of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It's a tiny structure that can be allocated on the stack, don't
complicate the code by making it optional.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The device handler needs to check if a given queue belongs to a scsi
device; only then does it make sense to attach a device handler.
[mkp: dropped flags]
Cc: <stable@vger.kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Without this drivers that don't clear the state themselves can see off
effects. For example Hyper-V VMs using the storvsc driver will often
hang during boot due to uncleared Test Unit Ready failures.
Fixes: e9c787e6 ("scsi: allocate scsi_cmnd structures as part of struct request")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJYqeb8AAoJEPfTWPspceCmB3UP/3UtcPrzEm8w2cxB9MaWhZN3
J+jiwlO4vaqhm2HVzQtoJqfaqRlud/iDx5cIXE2S7FnIM54ZKs3CANbKu8X+b1zm
eJije3zMI8A8qyftigbz6a/Y2kWE4ZqFEc9WU5CWawfTl3ImCVUi8+F5X0wOLU/h
r50zAQOEyURH4G5usNl9q0olF6FonJ82AcYm1iJ0QP2wYWZRJauC0rRn8IT93tyK
bZPHnGKdkd7km8yi3zr2GNWOfuZZuA0HWAaF4qfrHPZQ883gITFAUIlFb1f+2TNl
DkQzRrBB2wPWPnlbfb9KejMkvL94hflzsLb5rHt835DyVXFRyjxsgyAI8A+LPGSz
vqZ3rsbWj6H4F9z2CkZ+T+AP/ZSWDNjwc0RXPm9HYdR5CDeTxIUVvnFQ44YNsmTv
Xd5BKrUJ2oKegAxQG6zcuFx23p8JzhT70l+mNrMdtyeKnDD9FRdDvhKG9AHeTipn
o/DnGivhS3UMQoQ7D68KOO+kuhLDeo7my5XGsnjzMO/iHqg++7IP2HyYYs/Ba4qZ
cYaCtSDQW71Zt0vsqa6dvPuXBveu4h8Qh8R7uAGjSGS9IAFFb4Cab2tiUdISE6PE
YnMWzY+G6pT8imlLVOL5/QFuo2Q4pUsaL0AHpXMCN9TZnQtbqXa8eqwnKnQ0m2KN
7ut0IYYEPaYUX5xFn1K6
=z7AL
-----END PGP SIGNATURE-----
Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
- blk-mq scheduling framework from me and Omar, with a port of the
deadline scheduler for this framework. A port of BFQ from Paolo is in
the works, and should be ready for 4.12.
- Various fixups and improvements to the above scheduling framework
from Omar, Paolo, Bart, me, others.
- Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
This allows us to export more information that helps debug hangs or
performance issues, without cluttering or abusing the sysfs API.
- Fixes for the sbitmap code, the scalable bitmap code that was
migrated from blk-mq, from Omar.
- Removal of the BLOCK_PC support in struct request, and refactoring of
carrying SCSI payloads in the block layer. This cleans up the code
nicely, and enables us to kill the SCSI specific parts of struct
request, shrinking it down nicely. From Christoph mainly, with help
from Hannes.
- Support for ranged discard requests and discard merging, also from
Christoph.
- Support for OPAL in the block layer, and for NVMe as well. Mainly
from Scott Bauer, with fixes/updates from various others folks.
- Error code fixup for gdrom from Christophe.
- cciss pci irq allocation cleanup from Christoph.
- Making the cdrom device operations read only, from Kees Cook.
- Fixes for duplicate bdi registrations and bdi/queue life time
problems from Jan and Dan.
- Set of fixes and updates for lightnvm, from Matias and Javier.
- A few fixes for nbd from Josef, using idr to name devices and a
workqueue deadlock fix on receive. Also marks Josef as the current
maintainer of nbd.
- Fix from Josef, overwriting queue settings when the number of
hardware queues is updated for a blk-mq device.
- NVMe fix from Keith, ensuring that we don't repeatedly mark and IO
aborted, if we didn't end up aborting it.
- SG gap merging fix from Ming Lei for block.
- Loop fix also from Ming, fixing a race and crash between setting loop
status and IO.
- Two block race fixes from Tahsin, fixing request list iteration and
fixing a race between device registration and udev device add
notifiations.
- Double free fix from cgroup writeback, from Tejun.
- Another double free fix in blkcg, from Hou Tao.
- Partition overflow fix for EFI from Alden Tondettar.
* tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits)
nvme: Check for Security send/recv support before issuing commands.
block/sed-opal: allocate struct opal_dev dynamically
block/sed-opal: tone down not supported warnings
block: don't defer flushes on blk-mq + scheduling
blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
blk-mq: don't special case flush inserts for blk-mq-sched
blk-mq-sched: don't add flushes to the head of requeue queue
blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not
block: do not allow updates through sysfs until registration completes
lightnvm: set default lun range when no luns are specified
lightnvm: fix off-by-one error on target initialization
Maintainers: Modify SED list from nvme to block
Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN
uapi: sed-opal fix IOW for activate lsp to use correct struct
cdrom: Make device operations read-only
elevator: fix loading wrong elevator type for blk-mq devices
cciss: switch to pci_irq_alloc_vectors
block/loop: fix race between I/O and set_status
blk-mq-sched: don't hold queue_lock when calling exit_icq
block: set make_request_fn manually in blk_mq_update_nr_hw_queues
...
Don't crash the machine just because of an empty transfer. Use WARN_ON()
combined with returning an error.
Found by Dmitry Vyukov and syzkaller.
[ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root
cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON()
might still be a cause of excessive log spamming.
NOTE! If this warning ever triggers, we may end up leaking resources,
since this doesn't bother to try to clean the command up. So this
WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is
much worse.
People really need to stop using BUG_ON() for "this shouldn't ever
happen". It makes pretty much any bug worse. - Linus ]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of keeping two levels of indirection for requests types, fold it
all into the operations. The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.
Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, althought it has to add two for each so that we
can communicate the data in/out nature of the request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
This can be used to check for fs vs non-fs requests and basically
removes all knowledge of BLOCK_PC specific from the block layer,
as well as preparing for removing the cmd_type field in struct request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data. To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Rely on the new block layer functionality to allocate additional driver
specific data behind struct request instead of implementing it in SCSI
itѕelf.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Instead do an internal export of __scsi_init_queue for the transport
classes that export BSG nodes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently blk-mq always allocates the sense buffer using normal GFP_KERNEL
allocation. Refactor the cmd pool code to split the cmd and sense allocation
and share the code to allocate the sense buffers as well as the sense buffer
slab caches between the legacy and blk-mq path.
Note that this switches to lazy allocation of the sense slab caches - the
slab caches (not the actual allocations) won't be destroy until the scsi
module is unloaded instead of keeping track of hosts using them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull block fixes from Jens Axboe:
- the virtio_blk stack DMA corruption fix from Christoph, fixing and
issue with VMAP stacks.
- O_DIRECT blkbits calculation fix from Chandan.
- discard regression fix from Christoph.
- queue init error handling fixes for nbd and virtio_blk, from Omar and
Jeff.
- two small nvme fixes, from Christoph and Guilherme.
- rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
to more closely follow what we do in other places in the block layer.
This interface is new for this series, so let's get the naming right
before releasing a kernel with this feature. From Damien.
* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't try to discard from __blkdev_issue_zeroout
sd: remove __data_len hack for WRITE SAME
nvme: use blk_rq_payload_bytes
scsi: use blk_rq_payload_bytes
block: add blk_rq_payload_bytes
block: Rename blk_queue_zone_size and bdev_zone_size
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
nvme-rdma: fix nvme_rdma_queue_is_ready
virtio_blk: fix panic in initialization error path
nbd: blk_mq_init_queue returns an error code on failure, not NULL
virtio_blk: avoid DMA to stack for the sense buffer
do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
Without that we'll pass a wrong payload size in cmd->sdb, which
can lead to hangs with drivers that need the total transfer size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Chris Valean <v-chvale@microsoft.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Fixes: f9d03f96 ("block: improve handling of the magic discard payload")
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Ensure that if scsi-mq is enabled that scsi_internal_device_block()
waits until ongoing shost->hostt->queuecommand() calls have finished.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This update includes the usual round of major driver updates (ncr5380,
lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas). There's also
an assortment of minor fixes, mostly in error legs or other not very
user visible stuff. The major change is the pci_alloc_irq_vectors
replacement for the old pci_msix_.. calls; this effectively makes IRQ
mapping generic for the drivers and allows blk_mq to use the
information.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJYUJOvAAoJEAVr7HOZEZN42B0P/1lj1W2N7y0LOAbR2MasyQvT
fMD/SSip/v+R/zJiTv+5M/IDQT6ez62JnQGWyX3HZTob9VEfoqagbUuHH6y+fmib
tHyqiYc9FC/NZSRX/0ib+rpnSVhC/YRSVV7RrAqilbpOKAzeU25FlN/vbz+Nv/XL
ReVBl+2nGjJtHyWqUN45Zuf74c0zXOWPPUD0hRaNclK5CfZv5wDYupzHzTNSQTkj
nWvwPYT0OgSMNe7mR+IDFyOe3UQ/OYyeJB0yBNqO63IiaUabT8/hgrWR9qJAvWh8
LiH+iSQ69+sDUnvWvFjuls/GzcFuuTljwJbm+FyTsmNHONPVY8JRCLNq7CNDJ6Vx
HwpNuJdTSJpne4lAVBGPwgjs+GhlMvUP/xYVLWAXdaBxU9XGePrwqQDcFu1Rbx3P
yfMiVaY1+e45OEjLRCbDAwFnMPevC3kyymIvSsTySJxhTbYrOsyrrWt5kwWsvE3r
SKANsub+xUnpCkyg57nXRQStJSCiSfGIDsydKmMX+pf1SR4k6gCUQZlcchUX0uOa
dcY6re0c7EJIQQiT7qeGP5TRBblxARocCA/Igx6b5U5HmuQ48tDFlMCps7/TE84V
JBsBnmkXcEi/ALShL/Tui+3YKA1DfOtEnwHtXx/9Ecx/nxP2Sjr9LJwCKiONv8NY
RgLpGfccrix34lQumOk5
=sPXh
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This update includes the usual round of major driver updates (ncr5380,
lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas).
There's also an assortment of minor fixes, mostly in error legs or
other not very user visible stuff. The major change is the
pci_alloc_irq_vectors replacement for the old pci_msix_.. calls; this
effectively makes IRQ mapping generic for the drivers and allows
blk_mq to use the information"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (256 commits)
scsi: qla4xxx: switch to pci_alloc_irq_vectors
scsi: hisi_sas: support deferred probe for v2 hw
scsi: megaraid_sas: switch to pci_alloc_irq_vectors
scsi: scsi_devinfo: remove synchronous ALUA for NETAPP devices
scsi: be2iscsi: set errno on error path
scsi: be2iscsi: set errno on error path
scsi: hpsa: fallback to use legacy REPORT PHYS command
scsi: scsi_dh_alua: Fix RCU annotations
scsi: hpsa: use %phN for short hex dumps
scsi: hisi_sas: fix free'ing in probe and remove
scsi: isci: switch to pci_alloc_irq_vectors
scsi: ipr: Fix runaway IRQs when falling back from MSI to LSI
scsi: dpt_i2o: double free on error path
scsi: cxlflash: Migrate scsi command pointer to AFU command
scsi: cxlflash: Migrate IOARRIN specific routines to function pointers
scsi: cxlflash: Cleanup queuecommand()
scsi: cxlflash: Cleanup send_tmf()
scsi: cxlflash: Remove AFU command lock
scsi: cxlflash: Wait for active AFU commands to timeout upon tear down
scsi: cxlflash: Remove private command pool
...
Instead of allocating a single unused biovec for discard requests, send
them down without any payload. Instead we allow the driver to add a
"special" payload using a biovec embedded into struct request (unioned
over other fields never used while in the driver), and overloading
the number of segments for this case.
This has a couple of advantages:
- we don't have to allocate the bio_vec
- the amount of special casing for discard requests in the block
layer is significantly reduced
- using this same scheme for other request types is trivial,
which will be important for implementing the new WRITE_ZEROES
op on devices where it actually requires a payload (e.g. SCSI)
- we can get rid of playing games with the request length, as
we'll never touch it and completions will work just fine
- it will allow us to support ranged discard operations in the
future by merging non-contiguous discard bios into a single
request
- last but not least it removes a lot of code
This patch is the common base for my WIP series for ranges discards and to
remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
so it would be good to get it in quickly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Additionally, rename srp_wait_for_queuecommand() into
scsi_wait_for_queuecommand() and add a comment about the queuecommand()
call from scsi_send_eh_cmnd().
Note: this patch changes scsi_internal_device_block from a function that
did not sleep into a function that may sleep. This is fine for all
callers of this function:
* scsi_internal_device_block() is called from the mpt3sas device while
that driver holds the ioc->dm_cmds.mutex. This means that the mpt3sas
driver calls this function from thread context.
* scsi_target_block() is called by __iscsi_block_session() from
kernel thread context and with IRQs enabled.
* The SRP transport code also calls scsi_target_block() from kernel
thread context while sleeping is allowed.
* The snic driver also calls scsi_target_block() from a context from
which sleeping is allowed. The scsi_target_block() call namely occurs
immediately after a scsi_flush_work() call.
[mkp: s/shost/sdev/]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <jejb@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having
specific values. No functional change.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Just hand through the blk-mq map_queues method in the host template.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>