Commit Graph

467 Commits

Author SHA1 Message Date
Martin K. Petersen e6bd931284 scsi: sd: Separate zeroout and discard command choices
Now that zeroout and discards are distinct operations we need to
separate the policy of choosing the appropriate command. Create a
zeroing_mode which can be one of:

write:			Zeroout assist not present, use regular WRITE
writesame:		Allow WRITE SAME(10/16) with a zeroed payload
writesame_16_unmap:	Allow WRITE SAME(16) with UNMAP
writesame_10_unmap:	Allow WRITE SAME(10) with UNMAP

The last two are conditional on the device being thin provisioned with
LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively.

Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And
if none of the _unmap variants are supported, regular WRITE SAME will be
used if the device supports it.

The zeroout_mode is exported in sysfs and the detected mode for a given
device can be overridden using the string constants above.

With this change in place we can now issue WRITE SAME(16) with UNMAP set
for block zeroing applications that require hard guarantees and
logical_block_size granularity. And at the same time use the UNMAP
command with the device's preferred granulary and alignment for discard
operations.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 48920ff2a5 block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig e4b878371b sd: implement unmapping Write Zeroes
Try to use a write same with unmap bit variant if the device supports it
and the caller allows for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 02d261034f sd: implement REQ_OP_WRITE_ZEROES
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 81d926e8b5 sd: split sd_setup_discard_cmnd
Split sd_setup_discard_cmnd into one function per provisioning type.  While
this creates some very slight duplication of boilerplate code it keeps the
code modular for additions of new provisioning types, and for reusing the
write same functions for the upcoming scsi implementation of the Write Zeroes
operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Martin K. Petersen 7c856152cb scsi: sd: Fix capacity calculation with 32-bit sector_t
We previously made sure that the reported disk capacity was less than
0xffffffff blocks when the kernel was not compiled with large sector_t
support (CONFIG_LBDAF). However, this check assumed that the capacity
was reported in units of 512 bytes.

Add a sanity check function to ensure that we only enable disks if the
entire reported capacity can be expressed in terms of sector_t.

Cc: <stable@vger.kernel.org>
Reported-by: Steve Magnani <steve.magnani@digidescorp.com>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:07:16 -04:00
Fam Zheng 6780414519 scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we
end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size
may get error.

[mkp: tweaked to avoid setting rw_max twice and added typecast]

Cc: <stable@vger.kernel.org> # v4.4+
Fixes: ca369d51b3 ("block/sd: Fix device-imposed transfer length limits")
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:05:15 -04:00
Hannes Reinecke e8f8d50e07 scsi: sd: Return SUCCESS in sd_eh_action() after device offline
If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke 7a38dc0bfb scsi: scsi_error: count medium access timeout only once per EH run
The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands")
Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Linus Torvalds 95422dec6b SCSI fixes on 20170315
This is a rather large set of fixes.  The bulk are for lpfc correcting
 a lot of issues in the new NVME driver code which just went in in the
 merge window.  The others are: fix a hang in the vmware paravirt
 driver caused by incorrect handling of the new MSI vector allocation.
 A long standing bug in storvsc, which recent block changes turned from
 being a harmless annoyance into a hang and yet more fallout (in
 mpt3sas) from the changes to device blocking.  The remainder are small
 fixes and updates.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYyXW+AAoJEAVr7HOZEZN4yzIP/R2qDzwoXw3dl5xHs2AQuxWU
 P7mbUqtN2meU0klIKNCaYq6WO797WvGblSldrPwB08M7QOg8vy7boNiCYmwdEs6W
 Ftihj99OwEp0598MldNw1hBtMqjJ1WSu/kLK/I3pjLjltyPbQunuvaEwSNvxD1r7
 8BRmngYLFPb3KQDSO74ILBdZ5DqsriIXDHlngKSKfrpjO5IWXHrmzhZTmU2qpE40
 H4C4+Kn3/iZKGRiMm4rsikRDEGty9aVo7f4CIrwWslLkaJWf/AS4Kf3rd5qqclRj
 crjP/axqXheuwggKaHNhnxX5oSGt61ZCH9lGMmSNV/q6pUtNsrhO2uZuBIP0vtDB
 y7Pfw0Z3MGnbsOmBhZLRnUYC9PgSZPysy3TuS6/BgGx2ZFPnFxXjA1T73YaUil9q
 zcCOV2E7WHjIFGrpF2jNmoGsh1360TUOjg8YXdLi8pC2sEqSU331nvB5W9xAw3EG
 ewusZFetumxV8pykSyuDknU5gAMVYhHv+I+eNP8SB0eUZTLeKlcZpWIt3QYSYRdJ
 KbCc4CSi28J+DghB86eRZC64QqFiYMP546zUPpD1Enh/HXtVrTYzRN9/Qwo2FJRZ
 s5E3aC6tPceSuOmqn4aL9+Il7NXDj1Y/M9Qwe6Dzjp7q8hLCb7J1TWY3G5ImBWI7
 TfdyBrvZerqpxga+6j5g
 =0mha
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a rather large set of fixes. The bulk are for lpfc correcting
  a lot of issues in the new NVME driver code which just went in in the
  merge window.

  The others are:

   - fix a hang in the vmware paravirt driver caused by incorrect
     handling of the new MSI vector allocation

   - long standing bug in storvsc, which recent block changes turned
     from being a harmless annoyance into a hang

   - yet more fallout (in mpt3sas) from the changes to device blocking

  The remainder are small fixes and updates"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (34 commits)
  scsi: lpfc: Add shutdown method for kexec
  scsi: storvsc: Workaround for virtual DVD SCSI version
  scsi: lpfc: revise version number to 11.2.0.10
  scsi: lpfc: code cleanups in NVME initiator discovery
  scsi: lpfc: code cleanups in NVME initiator base
  scsi: lpfc: correct rdp diag portnames
  scsi: lpfc: remove dead sli3 nvme code
  scsi: lpfc: correct double print
  scsi: lpfc: Rename LPFC_MAX_EQ_DELAY to LPFC_MAX_EQ_DELAY_EQID_CNT
  scsi: lpfc: Rework lpfc Kconfig for NVME options
  scsi: lpfc: add transport eh_timed_out reference
  scsi: lpfc: Fix eh_deadline setting for sli3 adapters.
  scsi: lpfc: add NVME exchange aborts
  scsi: lpfc: Fix nvme allocation bug on failed nvme_fc_register_localport
  scsi: lpfc: Fix IO submission if WQ is full
  scsi: lpfc: Fix NVME CMD IU byte swapped word 1 problem
  scsi: lpfc: Fix RCTL value on NVME LS request and response
  scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
  scsi: lpfc: fix missing spin_unlock on sql_list_lock
  scsi: lpfc: don't dereference dma_buf->iocbq before null check
  ...
2017-03-15 10:44:19 -07:00
Jan Kara c01228db4b Revert "scsi, block: fix duplicate bdi name registration crashes"
This reverts commit 0dba1314d4. It causes
leaking of device numbers for SCSI when SCSI registers multiple gendisks
for one request_queue in succession. It can be easily reproduced using
Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
Furthermore the protection provided by this commit is not needed anymore
as the problem it was fixing got also fixed by commit 165a5e22fa
"block: Move bdi_unregister() to del_gendisk()".

[1]: http://marc.info/?l=linux-block&m=148554717109098&w=2

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-08 10:55:17 -07:00
Damien Le Moal c46f09175d scsi: sd: Check for unaligned partial completion
Commit <f2e767bb5d6e> ("mpt3sas: Force request partial completion
alignment") was not considering the case of commands not operating on
logical block size units (e.g. REQ_OP_ZONE_REPORT and its 64B aligned
partial replies). In this case, forcing alignment of resid to the device
logical block size can break the command result, e.g. in the case of
REQ_OP_ZONE_REPORT, the exact number of zone reported by the device.

Move the partial completion alignement check of mpt3sas to a generic
implementation in sd_done(). The check is added within the default
section of the initial req_op() switch case so that the report and reset
zone commands are ignored. In addition, as sd_done() is not called for
passthrough requests, resid corrections are not done as intended by the
initial mpt3sas patch.

Fixes: f2e767bb5d ("mpt3sas: Force request partial completion alignment")
Cc: <stable@vger.kernel.org> # v4.10
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-01 21:46:11 -05:00
Christoph Hellwig fcbfffe2c5 scsi: remove scsi_execute_req_flags
And switch all callers to use scsi_execute instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-23 16:57:36 -05:00
Christoph Hellwig 6fa2b8f9e3 scsi: sd: improve TUR handling in sd_check_events
Remove bogus evaluations of retval and sshdr when the device is offline,
and fix a possible NULL pointer dereference by allocating the 8 byte
sized sense header on stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22 19:34:17 -05:00
Colin Ian King f170396c14 scsi: fix memory leak of sdpk on when gd fails to allocate
On an allocation failure of gd, the current exit path is via
out_free_devt which leaves sdpk still allocated and hence it gets
leaked. Fix this by correcting the order of resource free'ing with a
change in the error exit path labels.

Detected by CoverityScan, CID#1399519 ("Resource Leak")

Fixes: 0dba1314d4 ("scsi, block: fix duplicate bdi name registration crashes")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22 19:20:32 -05:00
Wei Yongjun 8bfcd1bf86 scsi: sd: make sd_devt_release() static
Fixes the following sparse warning:

drivers/scsi/sd.c:3087:6: warning:
 symbol 'sd_devt_release' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22 19:13:54 -05:00
Linus Torvalds cdc194705d SCSI misc on 20170220
This update includes the usual round of major driver updates (ncr5380,
 ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
 megaraid_sas, ).  There's also an assortment of minor fixes and the
 major update of switching a bunch of drivers to pci_alloc_irq_vectors
 from Christoph.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYq5adAAoJEAVr7HOZEZN4bjUP/Atk7CSZVnC75pcYmncbEGCx
 ysOlEHK4uW2HhiAYk3PlYMk+pKrMHet2zsbbM9PHJfopdOHZ7Sq1+UZZVeqE1Zun
 8pe0NhON+fZx7XAnevdEvnSSULQZ+AGfjZO72iUwkJiN3ozYaFtCITOyn49l4GpR
 ra9emskBh7CQOFW2voGn1AKeDijPYGx3+TO4AUrWjVMiByR06gb1bmImx+ljiUrs
 jzRJPfrt90ORcTdpMateyN2EXxudcASMhX03SJ6fRI84hPAhMCROMbTv8RnzOTE4
 DPbnvbYUowlHt43iUhJHSwGdkRRaRBnkzQENBp1fNrNzZgF6vB7+kShxbonrYB2p
 gC4ewaJr0BNj+HsUnvTpe3WseiPOcfsnBsKilPLKBlm2dCKEXqFox/dj/T1uexxg
 HoyFrl3u8fyEqVHrzRS4M9t/njWh0NFmXxb0wBdj+lkVFTRErGSKQ8SfOqshuSGs
 P8NN88jy8vC7uqgzKBJ+UH3ehzn3qfBxasFHIC/e2awY9FqKjHGTxKMmSVpjXVxy
 wCvE2FQ3k/qEj2XSM6f7/NGytlSOlju5q1rFtHPW2M+TFSh0LJWCnmVjR/Zle9em
 pBWmtIgCv8W5b41zL2H94nLWAZbfdrrNU/XnX88l47LKnmorte/PGhpxu36NEsMS
 VCgreQmFMdMRY+WzDWl1
 =cBQx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
  megaraid_sas, ...).

  There's also an assortment of minor fixes and the major update of
  switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
  scsi: megaraid_sas: handle dma_addr_t right on 32-bit
  scsi: megaraid_sas: array overflow in megasas_dump_frame()
  scsi: snic: switch to pci_irq_alloc_vectors
  scsi: megaraid_sas: driver version upgrade
  scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
  scsi: megaraid_sas: Indentation and smatch warning fixes
  scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
  scsi: megaraid_sas: Increase internal command pool
  scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
  scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
  scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
  scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
  scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
  scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
  scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
  scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
  scsi: megaraid_sas: update can_queue only if the new value is less
  scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
  scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
  scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
  ...
2017-02-21 11:51:42 -08:00
Jens Axboe 818551e2b2 Merge branch 'for-4.11/next' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 14:08:19 -07:00
Dan Williams 0dba1314d4 scsi, block: fix duplicate bdi name registration crashes
Warnings of the following form occur because scsi reuses a devt number
while the block layer still has it referenced as the name of the bdi
[1]:

 WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192'
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_fmt+0x5f/0x80
  ? kernfs_path_from_node+0x4f/0x60
  sysfs_warn_dup+0x62/0x80
  sysfs_create_dir_ns+0x77/0x90
  kobject_add_internal+0xb2/0x350
  kobject_add+0x75/0xd0
  device_add+0x15a/0x650
  device_create_groups_vargs+0xe0/0xf0
  device_create_vargs+0x1c/0x20
  bdi_register+0x90/0x240
  ? lockdep_init_map+0x57/0x200
  bdi_register_owner+0x36/0x60
  device_add_disk+0x1bb/0x4e0
  ? __pm_runtime_use_autosuspend+0x5c/0x70
  sd_probe_async+0x10d/0x1c0
  async_run_entry_fn+0x39/0x170

This is a brute-force fix to pass the devt release information from
sd_probe() to the locations where we register the bdi,
device_add_disk(), and unregister the bdi, blk_cleanup_queue().

Thanks to Omar for the quick reproducer script [2]. This patch survives
where an unmodified kernel fails in a few seconds.

[1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4
[2]: http://marc.info/?l=linux-block&m=148554717109098&w=2

Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Reported-by: Omar Sandoval <osandov@osandov.com>
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:23:19 -07:00
Christoph Hellwig 68b568c797 ѕd: remove pointless REQ_TYPE_FS check
->done can only be called for fs requests, so no need to check again here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 14:00:04 -07:00
Christoph Hellwig 82ed4db499 block: split scsi_request out of struct request
And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data.  To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27 15:08:35 -07:00
Bart Van Assche 08965c2eba Revert "sd: remove __data_len hack for WRITE SAME"
This patch reverts commit f80de881d8 and avoids that sending a
WRITE SAME command to the iSCSI initiator triggers the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
TARGET_CORE[iSCSI]: Expected Transfer Length: 260096 does not match SCSI CDB Length: 512 for SAM Opcode: 0x41
IP: iscsi_tcp_segment_done+0x20b/0x310 [libiscsi_tcp]

Oops: 0000 [#1] SMP
Modules linked in: target_core_user uio target_core_iblock target_core_file iscsi_target_mod target_core_mod netconsole configfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper virtio_console virtio_rng virtio_balloon serio_raw i2c_piix4 acpi_cpufreq button iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext4 jbd2 mbcache virtio_blk virtio_net psmouse floppy drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm virtio_pci
CPU: 2 PID: 5 Comm: kworker/u8:0 Not tainted 4.10.0-rc5-debug+ #3
Workqueue: iscsi_q_0 iscsi_xmitworker [libiscsi]
RIP: 0010:iscsi_tcp_segment_done+0x20b/0x310 [libiscsi_tcp]
Call Trace:
 iscsi_sw_tcp_xmit_segment+0x84/0x120 [iscsi_tcp]
 iscsi_sw_tcp_pdu_xmit+0x51/0x180 [iscsi_tcp]
 iscsi_tcp_task_xmit+0xb3/0x290 [libiscsi_tcp]
 iscsi_xmit_task+0x4e/0xc0 [libiscsi]
 iscsi_xmitworker+0x243/0x330 [libiscsi]
 process_one_work+0x1d8/0x4b0
 worker_thread+0x49/0x4a0
 kthread+0x102/0x140

Fixes: f80de881d8 ("sd: remove __data_len hack for WRITE SAME")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jens Axboe <axboe@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-26 10:01:20 -07:00
John Pittman f2a3313d65 scsi: sd: Cleaned up comment references to @sdp argument explanation.
In sd.c there are two comment references to 'struct scsi_device *sdp' as
an argument.  One of the references has a typo and the other should be a
reference to 'struct device *dev' instead.

Fixed by correcting the typo in the first and changing the explanation
in the second.

Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-20 16:59:03 -05:00
Linus Torvalds f09ff1de63 SCSI fixes on 20170119
This is a set of 12 fixes including the mpt3sas one that was causing
 hangs on ATA passthrough.  The others are a couple of zoned block
 device fixes, a SAS device detection bug which lead to SATA drives not
 being matched to bays, two qla2xxx MSI fixes, a qla2xxx req for rsp
 confusion caused by cut and paste, and a few other minor fixes.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYgY/0AAoJEAVr7HOZEZN4avAP/1WxoiNBoQV16CsAyvSlbf0A
 ZQ7cR0O/cWjI7ccoLUKHxbTJ57GzMTelfmrdRsVIbNeH74NGSP4+IxzshItSwKG0
 /hz0MkCqAxRs/pxliZW8DM12cEbKDIm/t4XWjDst1KBdqakluigmAIniWitKnLE3
 IeA0am4/KDf/n5tQm7CI+7GOT7g0QuRqAlfAM1MFXZGAzjw2RZcSQOVkPr6vW2I+
 bvDUzmWDHOZ/3g5CRrgA6pGC/1FBVbYPF5iOQicCUuMjpniR4IVvcwek5ULUkCYl
 5yZnx7dg/FEXcFawp3fTQ8SbxM6Dhkqkb5/Xe/E+wjjAOwF1I3lDwdV39nQk3aBK
 TwIFgeZY8CS3Z+7e9rtlRFUL2kFWwKy5uhFM/6qmY9cpXFYnPN5JsrChq/7Tgb9t
 XW3/4CoS2ijcNuvxGymvyJG2FpZsLaJfLfXw4kkEUtn0W3Dzbub2PrMfPUUn7hQO
 rmk3ZEuoSANvsZ9n647Gpvo7iQBbC7blNf8krSLJvoopN6SUUmLCsePXEHxorBxz
 r7xAkHFqapOjBn28u4JAn5po7aVa3/fQwBuUF6UvfuVJKbvnobl4Qx0O/Fqadd7s
 bix9OfCG1GMyEnUBLEWPpBg0jS2JZbLupeXVssiATks93KV2SefiQlEhYwA5hOS5
 6+PykxkIEjEmU7asjQJ0
 =8y3/
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of 12 fixes including the mpt3sas one that was causing
  hangs on ATA passthrough.

  The others are a couple of zoned block device fixes, a SAS device
  detection bug which lead to SATA drives not being matched to bays, two
  qla2xxx MSI fixes, a qla2xxx req for rsp confusion caused by cut and
  paste, and a few other minor fixes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpt3sas: fix hang on ata passthrough commands
  scsi: lpfc: Set elsiocb contexts to NULL after freeing it
  scsi: sd: Ignore zoned field for host-managed devices
  scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
  scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request()
  scsi: ses: Fix SAS device detection in enclosure
  scsi: libfc: Fix variable name in fc_set_wwpn
  scsi: lpfc: avoid double free of resource identifiers
  scsi: qla2xxx: remove irq_affinity_notifier
  scsi: qla2xxx: fix MSI-X vector affinity
  scsi: qla2xxx: Fix apparent cut-n-paste error.
  scsi: qla2xxx: Get mutex lock before checking optrom_state
2017-01-20 11:47:18 -08:00
James Bottomley 9208b75e04 Merge remote-tracking branch 'mkp-scsi/fixes' into fixes 2017-01-17 17:32:54 -05:00
Damien Le Moal 68af412c77 scsi: sd: Ignore zoned field for host-managed devices
There is no good match of the zoned field of the block device
characteristics page for host-managed devices. For these devices, the
zoning model is derived directly from the device type. So ignore the
zoned field for these drives.

[mkp: typo]

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-17 14:06:22 -05:00
Damien Le Moal 26f2819772 scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
Zoned block devices force the use of READ/WRITE(16) commands by setting
sdkp->use_16_for_rw and clearing sdkp->use_10_for_rw. This result in
DPOFUA always being disabled for these drives as the assumed use of
the deprecated READ/WRITE(6) commands only looks at sdkp->use_10_for_rw.
Strenghten the test by also checking that sdkp->use_16_for_rw is false.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-17 14:05:02 -05:00
Christoph Hellwig f80de881d8 sd: remove __data_len hack for WRITE SAME
Now that we have the blk_rq_payload_bytes helper available to determine
the actual I/O size we don't need to mess around with __data_len for
WRITE SAME.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-13 15:17:04 -07:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds a829a8445f SCSI misc on 20161213
This update includes the usual round of major driver updates (ncr5380,
 lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas).  There's also
 an assortment of minor fixes, mostly in error legs or other not very
 user visible stuff.  The major change is the pci_alloc_irq_vectors
 replacement for the old pci_msix_.. calls; this effectively makes IRQ
 mapping generic for the drivers and allows blk_mq to use the
 information.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYUJOvAAoJEAVr7HOZEZN42B0P/1lj1W2N7y0LOAbR2MasyQvT
 fMD/SSip/v+R/zJiTv+5M/IDQT6ez62JnQGWyX3HZTob9VEfoqagbUuHH6y+fmib
 tHyqiYc9FC/NZSRX/0ib+rpnSVhC/YRSVV7RrAqilbpOKAzeU25FlN/vbz+Nv/XL
 ReVBl+2nGjJtHyWqUN45Zuf74c0zXOWPPUD0hRaNclK5CfZv5wDYupzHzTNSQTkj
 nWvwPYT0OgSMNe7mR+IDFyOe3UQ/OYyeJB0yBNqO63IiaUabT8/hgrWR9qJAvWh8
 LiH+iSQ69+sDUnvWvFjuls/GzcFuuTljwJbm+FyTsmNHONPVY8JRCLNq7CNDJ6Vx
 HwpNuJdTSJpne4lAVBGPwgjs+GhlMvUP/xYVLWAXdaBxU9XGePrwqQDcFu1Rbx3P
 yfMiVaY1+e45OEjLRCbDAwFnMPevC3kyymIvSsTySJxhTbYrOsyrrWt5kwWsvE3r
 SKANsub+xUnpCkyg57nXRQStJSCiSfGIDsydKmMX+pf1SR4k6gCUQZlcchUX0uOa
 dcY6re0c7EJIQQiT7qeGP5TRBblxARocCA/Igx6b5U5HmuQ48tDFlMCps7/TE84V
 JBsBnmkXcEi/ALShL/Tui+3YKA1DfOtEnwHtXx/9Ecx/nxP2Sjr9LJwCKiONv8NY
 RgLpGfccrix34lQumOk5
 =sPXh
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  lpfc, hisi_sas, megaraid_sas, ufs, ibmvscsis, mpt3sas).

  There's also an assortment of minor fixes, mostly in error legs or
  other not very user visible stuff. The major change is the
  pci_alloc_irq_vectors replacement for the old pci_msix_.. calls; this
  effectively makes IRQ mapping generic for the drivers and allows
  blk_mq to use the information"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (256 commits)
  scsi: qla4xxx: switch to pci_alloc_irq_vectors
  scsi: hisi_sas: support deferred probe for v2 hw
  scsi: megaraid_sas: switch to pci_alloc_irq_vectors
  scsi: scsi_devinfo: remove synchronous ALUA for NETAPP devices
  scsi: be2iscsi: set errno on error path
  scsi: be2iscsi: set errno on error path
  scsi: hpsa: fallback to use legacy REPORT PHYS command
  scsi: scsi_dh_alua: Fix RCU annotations
  scsi: hpsa: use %phN for short hex dumps
  scsi: hisi_sas: fix free'ing in probe and remove
  scsi: isci: switch to pci_alloc_irq_vectors
  scsi: ipr: Fix runaway IRQs when falling back from MSI to LSI
  scsi: dpt_i2o: double free on error path
  scsi: cxlflash: Migrate scsi command pointer to AFU command
  scsi: cxlflash: Migrate IOARRIN specific routines to function pointers
  scsi: cxlflash: Cleanup queuecommand()
  scsi: cxlflash: Cleanup send_tmf()
  scsi: cxlflash: Remove AFU command lock
  scsi: cxlflash: Wait for active AFU commands to timeout upon tear down
  scsi: cxlflash: Remove private command pool
  ...
2016-12-14 10:49:33 -08:00
Christoph Hellwig f9d03f96b9 block: improve handling of the magic discard payload
Instead of allocating a single unused biovec for discard requests, send
them down without any payload.  Instead we allow the driver to add a
"special" payload using a biovec embedded into struct request (unioned
over other fields never used while in the driver), and overloading
the number of segments for this case.

This has a couple of advantages:

 - we don't have to allocate the bio_vec
 - the amount of special casing for discard requests in the block
   layer is significantly reduced
 - using this same scheme for other request types is trivial,
   which will be important for implementing the new WRITE_ZEROES
   op on devices where it actually requires a payload (e.g. SCSI)
 - we can get rid of playing games with the request length, as
   we'll never touch it and completions will work just fine
 - it will allow us to support ranged discard operations in the
   future by merging non-contiguous discard bios into a single
   request
 - last but not least it removes a lot of code

This patch is the common base for my WIP series for ranges discards and to
remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
so it would be good to get it in quickly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-09 08:30:51 -07:00
Andy Shevchenko df441cc030 scsi: replace custom approach to hexdump small buffers
In kernel we have defined specifier (%*ph[C]) to dump small buffers in a
hex format. Replace custom approach by a generic one.

Cc: Jon Mason <jonmason@broadcom.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-08 17:29:57 -05:00
Christoph Hellwig ef295ecf09 block: better op and flags encoding
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28 08:48:16 -06:00
Christoph Hellwig e806402130 block: split out request-only flags into a new namespace
A lot of the REQ_* flags are only used on struct requests, and only of
use to the block layer and a few drivers that dig into struct request
internals.

This patch adds a new req_flags_t rq_flags field to struct request for
them, and thus dramatically shrinks the number of common requests.  It
also removes the unfortunate situation where we have to fit the fields
from the same enum into 32 bits for struct bio and 64 bits for
struct request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28 08:45:17 -06:00
Hannes Reinecke 89d9475610 sd: Implement support for ZBC devices
Implement ZBC support functions to setup zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the exception of an eventual
   last smaller runt zone.
2) For host-managed disks, reads are unrestricted (reads are not
   failed due to zone or write pointer alignement constraints).
Zoned disks that do not satisfy these 2 conditions are setup with
a capacity of 0 to prevent their use.

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).

The capacity message output was moved out of sd_read_capacity into
a new function sd_print_capacity to include this eventual capacity
change by sd_zbc_read_zones. This new function also includes a call
to sd_zbc_print_zones to display the number of zones and zone size
of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed zone cache support
         * Removed mapping of discard to reset write pointer command
         * Modified sd_zbc_read_zones to include checks that the
           device satisfies the kernel constraints
         * Implemeted REPORT ZONES setup and post-processing based
           on code from Shaun Tancheff <shaun.tancheff@seagate.com>
         * Removed confusing use of 512B sector units in functions
           interface]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-18 19:49:11 -06:00
Christoph Hellwig 8475c81185 scsi: sd: Move DIF protection types to t10-pi.h
These should go together with the rest of the T10 protection information
defintions.

[mkp: s/T10_DIF/T10_PI/]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-15 09:51:14 -04:00
Linus Torvalds f7e6816994 - initially based on Jens' 'for-4.8/core' (given all the flag churn) and
later merged with 'for-4.8/core' to pickup the QUEUE_FLAG_DAX commits
   that DM depends on to provide its DAX support
 
 - clean up the bio-based vs request-based DM core code by moving the
   request-based DM core code out to dm-rq.[hc]
 
 - reinstate bio-based support in the DM multipath target (done with the
   idea that fast storage like NVMe over Fabrics could benefit) -- while
   preserving support for request_fn and blk-mq request-based DM mpath
 
 - SCSI and DM multipath persistent reservation fixes that were
   coordinated with Martin Petersen.
 
 - the DM raid target saw the most extensive change this cycle; it now
   provides reshape and takeover support (by layering ontop of the
   corresponding MD capabilities)
 
 - DAX support for DM core and the linear, stripe and error targets
 
 - A DM thin-provisioning block discard vs allocation race fix that
   addresses potential for corruption
 
 - A stable fix for DM verity-fec's block calculation during decode
 
 - A few cleanups and fixes to DM core and various targets
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXkRZmAAoJEMUj8QotnQNat2wH/i4LpkoGI5tI6UhyKWxRkzJp
 vKaJ0zuZ2Ez73DucJujNuvaiyHq1IjHD5pfr8JQO3E8ygDkRC2KjF2O8EXp0Has6
 U1uLahQej72MAs0ZJTpvfE+JiY6qyIl4K+xxuPmYm2f2S5TWTIgOetYjJQmcMlQo
 Y8zFfcDYn4Dv5rMdvDT4+1ePETxq74wcBwTxyW3OAbHE1f0JjsUGdMKzXB1iTWcM
 VjLjWI//ETfFdIlDO0w2Qbd90aLUjmTR2k67RGnbPj5kNUNikv/X6iiY32KERR/0
 vMiiJ7JS+a44P7FJqCMoAVM/oBYFiSNpS4LYevOgHb0G0ikF8kaSeqBPC6sMYvg=
 =uYt9
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - initially based on Jens' 'for-4.8/core' (given all the flag churn)
   and later merged with 'for-4.8/core' to pickup the QUEUE_FLAG_DAX
   commits that DM depends on to provide its DAX support

 - clean up the bio-based vs request-based DM core code by moving the
   request-based DM core code out to dm-rq.[hc]

 - reinstate bio-based support in the DM multipath target (done with the
   idea that fast storage like NVMe over Fabrics could benefit) -- while
   preserving support for request_fn and blk-mq request-based DM mpath

 - SCSI and DM multipath persistent reservation fixes that were
   coordinated with Martin Petersen.

 - the DM raid target saw the most extensive change this cycle; it now
   provides reshape and takeover support (by layering ontop of the
   corresponding MD capabilities)

 - DAX support for DM core and the linear, stripe and error targets

 - a DM thin-provisioning block discard vs allocation race fix that
   addresses potential for corruption

 - a stable fix for DM verity-fec's block calculation during decode

 - a few cleanups and fixes to DM core and various targets

* tag 'dm-4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (73 commits)
  dm: allow bio-based table to be upgraded to bio-based with DAX support
  dm snap: add fake origin_direct_access
  dm stripe: add DAX support
  dm error: add DAX support
  dm linear: add DAX support
  dm: add infrastructure for DAX support
  dm thin: fix a race condition between discarding and provisioning a block
  dm btree: fix a bug in dm_btree_find_next_single()
  dm raid: fix random optimal_io_size for raid0
  dm raid: address checkpatch.pl complaints
  dm: call PR reserve/unreserve on each underlying device
  sd: don't use the ALL_TG_PT bit for reservations
  dm: fix second blk_delay_queue() parameter to be in msec units not jiffies
  dm raid: change logical functions to actually return bool
  dm raid: use rdev_for_each in status
  dm raid: use rs->raid_disks to avoid memory leaks on free
  dm raid: support delta_disks for raid1, fix table output
  dm raid: enhance reshape check and factor out reshape setup
  dm raid: allow resize during recovery
  dm raid: fix rs_is_recovering() to allow for lvextend
  ...
2016-07-26 17:12:11 -07:00
Linus Torvalds 3fc9d69093 Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "This branch also contains core changes.  I've come to the conclusion
  that from 4.9 and forward, I'll be doing just a single branch.  We
  often have dependencies between core and drivers, and it's hard to
  always split them up appropriately without pulling core into drivers
  when that happens.

  That said, this contains:

   - separate secure erase type for the core block layer, from
     Christoph.

   - set of discard fixes, from Christoph.

   - bio shrinking fixes from Christoph, as a followup up to the
     op/flags change in the core branch.

   - map and append request fixes from Christoph.

   - NVMeF (NVMe over Fabrics) code from Christoph.  This is pretty
     exciting!

   - nvme-loop fixes from Arnd.

   - removal of ->driverfs_dev from Dan, after providing a
     device_add_disk() helper.

   - bcache fixes from Bhaktipriya and Yijing.

   - cdrom subchannel read fix from Vchannaiah.

   - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

   - set of drbd updates and fixes from Fabian, Lars, and Philipp.

   - mg_disk error path fix from Bart.

   - user notification for failed device add for loop, from Minfei.

   - NVMe in general:
        + NVMe delay quirk from Guilherme.
        + SR-IOV support and command retry limits from Keith.
        + fix for memory-less NUMA node from Masayoshi.
        + use UINT_MAX for discard sectors, from Minfei.
        + cancel IO fixes from Ming.
        + don't allocate unused major, from Neil.
        + error code fixup from Dan.
        + use constants for PSDT/FUSE from James.
        + variable init fix from Jay.
        + fabrics fixes from Ming, Sagi, and Wei.
        + various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
  nvme/pci: Provide SR-IOV support
  nvme: initialize variable before logical OR'ing it
  block: unexport various bio mapping helpers
  scsi/osd: open code blk_make_request
  target: stop using blk_make_request
  block: simplify and export blk_rq_append_bio
  block: ensure bios return from blk_get_request are properly initialized
  virtio_blk: use blk_rq_map_kern
  memstick: don't allow REQ_TYPE_BLOCK_PC requests
  block: shrink bio size again
  block: simplify and cleanup bvec pool handling
  block: get rid of bio_rw and READA
  block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  NVMe: don't allocate unused nvme_major
  nvme: avoid crashes when node 0 is memoryless node.
  nvme: Limit command retries
  loop: Make user notify for adding loop device failed
  nvme-loop: fix nvme-loop Kconfig dependencies
  nvmet: fix return value check in nvmet_subsys_alloc()
  ...
2016-07-26 15:37:51 -07:00
Linus Torvalds d05d7f4079 Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
2016-07-26 15:03:07 -07:00
Christoph Hellwig 01f90dd9e0 sd: don't use the ALL_TG_PT bit for reservations
These only work if the we use the same initiator ID for all path,
which might not be true if we use different protocols, or even just
different HBAs.

Instead dm-mpath will grow support to register all path manually
later in this series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-18 15:37:35 -04:00
Dan Williams 0d52c756a6 block: convert to device_add_disk()
For block drivers that specify a parent device, convert them to use
device_add_disk().

This conversion was done with the following semantic patch:

    @@
    struct gendisk *disk;
    expression E;
    @@

    - disk->driverfs_dev = E;
    ...
    - add_disk(disk);
    + device_add_disk(E, disk);

    @@
    struct gendisk *disk;
    expression E1, E2;
    @@

    - disk->driverfs_dev = E1;
    ...
    E2 = disk;
    ...
    - add_disk(E2);
    + device_add_disk(E1, E2);

...plus some manual fixups for a few missed conversions.

Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-27 12:26:08 -07:00
Mike Christie 3a5e02ced1 block, drivers: add REQ_OP_FLUSH operation
This adds a REQ_OP_FLUSH operation that is sent to request_fn
based drivers by the block layer's flush code, instead of
sending requests with the request->cmd_flags REQ_FLUSH bit set.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie 4e1b2d52a8 block, fs, drivers: remove REQ_OP compat defs and related code
This patch drops the compat definition of req_op where it matches
the rq_flag_bits definitions, and drops the related old and compat
code that allowed users to set either the op or flags for the operation.

We also then store the operation in the bi_rw/cmd_flags field similar
to how we used to store the bio ioprio where it sat in the upper bits
of the field.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie c2df40dfb8 drivers: use req op accessor
The req operation REQ_OP is separated from the rq_flag_bits
definition. This converts the block layer drivers to
use req_op to get the op from the request struct.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Martin K. Petersen 6b7e9cde49 sd: Fix rw_max for devices that report an optimal xfer size
For historic reasons, io_opt is in bytes and max_sectors in block layer
sectors. This interface inconsistency is error prone and should be
fixed. But for 4.4--4.7 let's make the unit difference explicit via a
wrapper function.

Fixes: d0eb20a863 ("sd: Optimal I/O size is in bytes, not sectors")
Cc: stable@vger.kernel.org # 4.4+
Reported-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-06-01 22:07:47 -04:00
Hannes Reinecke eb72d0bb84 sd: get disk reference in sd_check_events()
sd_check_events() is called asynchronously, and might race
with device removal. So always take a disk reference when
processing the event to avoid the device being removed while
the event is processed.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-05-22 14:53:06 -04:00
Jens Axboe eb310e23db sd: switch to using blk_queue_write_cache()
Switch to the newer interface, instead of using blk_queue_flush()
directly.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-12 16:00:39 -06:00
Ming Lin 37e58237a1 block: add offset in blk_add_request_payload()
We could kmalloc() the payload, so need the offset in page.

Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-12 13:13:23 -06:00
Linus Torvalds fb41b4be00 SCSI fixes on 20160408
This is a set of 8 fixes.  Two are trivial gcc-6 updates (brace
 additions and unused variable removal).  There's a couple of cxlflash
 regressions, a correction for sd being overly chatty on revalidation
 (causing excess log increases).  A VPD issue which could crash USB
 devices because they seem very intolerant to VPD inquiries, an ALUA
 deadlock fix and a mpt3sas buffer overrun fix.
 
 Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJXCD7VAAoJEDeqqVYsXL0MFKEH/ixQH8FSz7FYdunmkp4Q2sjT
 7gda6mXOeJN75zBSRDlV0U6wl0jK2B0iHh7ycRpCD72+XslMOOnji3I6Tmt/GU2C
 kkibxN/Iw95cduAOcd04/XqkBMbvjBDeIii/s3xixju3tIR6b4WTGcAHK6zmnWVE
 zDvIVKswhcGesBWBtNw0BvvG0RLujIst3tnLT81MmqYMNlwydHXkhm1OAB4w7Xl1
 2m7gnLhqPCw0HPBWQ9w6j/eGqOc5+YS6mAj/mAUc6qLTbqA1TSGqmD4NfqsqR+MU
 F8bIgESbYBZ3kj//zWBdHkGp6iVsxUhXsE1F62EHD4DOZEtFzkeuMxRmMu5xqmc=
 =v4SO
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of eight fixes.

  Two are trivial gcc-6 updates (brace additions and unused variable
  removal).  There's a couple of cxlflash regressions, a correction for
  sd being overly chatty on revalidation (causing excess log increases).
  A VPD issue which could crash USB devices because they seem very
  intolerant to VPD inquiries, an ALUA deadlock fix and a mpt3sas buffer
  overrun fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: Do not attach VPD to devices that don't support it
  sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes
  scsi_dh_alua: Fix a recently introduced deadlock
  scsi: Declare local symbols static
  cxlflash: Move to exponential back-off when cmd_room is not available
  cxlflash: Fix regression issue with re-ordering patch
  mpt3sas: Don't overreach ioc->reply_post[] during initialization
  aacraid: add missing curly braces
2016-04-09 12:00:42 -07:00
Hannes Reinecke 5ddfe0858e scsi: Do not attach VPD to devices that don't support it
The patch "scsi: rescan VPD attributes" introduced a regression in which
devices that don't support VPD were being scanned for VPD attributes
anyway.  This could cause issues for some devices and should be avoided
so the check for scsi_level has been moved out of scsi_add_lun and into
scsi_attach_vpd so that all callers will not scan VPD for devices that
don't support it.

[mkp: Merge fix]

Fixes: 09e2b0b146 ("scsi: rescan VPD attributes")
Cc: <stable@vger.kernel.org> #v4.5+
Suggested-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-05 06:56:40 -04:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Martin K. Petersen f08bb1e0db sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes
During revalidate we check whether device capacity has changed before we
decide whether to output disk information or not.

The check for old capacity failed to take into account that we scaled
sdkp->capacity based on the reported logical block size. And therefore
the capacity test would always fail for devices with sectors bigger than
512 bytes and we would print several copies of the same discovery
information.

Avoid scaling sdkp->capacity and instead adjust the value on the fly
when setting the block device capacity and generating fake C/H/S
geometry.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Reported-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-31 20:59:08 -04:00
Martin K. Petersen f4327a95dd sd: Fix discard granularity when LBPRZ=1
Commit 397737223c ("sd: Make discard granularity match logical block
size when LBPRZ=1") accidentally set the granularity to one byte instead
of one logical block on devices that provide deterministic zeroes after
UNMAP.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: 397737223c
Cc: <stable@vger.kernel.org> #v4.4+
2016-03-14 15:50:06 -04:00
Martin K. Petersen 6540a65da9 sd: Fix discard granularity when LBPRZ=1
Commit 397737223c ("sd: Make discard granularity match logical block
size when LBPRZ=1") accidentally set the granularity to one byte instead
of one logical block on devices that provide deterministic zeroes after
UNMAP.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: 397737223c
Cc: <stable@vger.kernel.org> #v4.4+
2016-03-08 20:58:20 -05:00
James Bottomley 12ffbbe94d Merge remote-tracking branch 'mkp-scsi/4.5/scsi-fixes' into fixes 2016-02-04 21:37:52 -08:00
Martin K. Petersen 0fb5b1fb30 block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled
When a storage device rejects a WRITE SAME command we will disable write
same functionality for the device and return -EREMOTEIO to the block
layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
failing the path.

Yiwen Jiang discovered a small race where WRITE SAME requests issued
simultaneously would cause -EIO to be returned. This happened because
any requests being prepared after WRITE SAME had been disabled for the
device caused us to return BLKPREP_KILL. The latter caused the block
layer to return -EIO upon completion.

To overcome this we introduce BLKPREP_INVALID which indicates that this
is an invalid request for the device. blk_peek_request() is modified to
return -EREMOTEIO in that case.

Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-02-04 22:42:58 -05:00
James Bottomley 6344a5cd3e Merge remote-tracking branch 'mkp-scsi/4.5/scsi-fixes' into fixes 2016-01-26 17:44:42 -08:00
Alan Stern 13b4389143 SCSI: fix crashes in sd and sr runtime PM
Runtime suspend during driver probe and removal can cause problems.
The driver's runtime_suspend or runtime_resume callbacks may invoked
before the driver has finished binding to the device or after the
driver has unbound from the device.

This problem shows up with the sd and sr drivers, and can cause disk
or CD/DVD drives to become unusable as a result.  The fix is simple.
The drivers store a pointer to the scsi_disk or scsi_cd structure as
their private device data when probing is finished, so we simply have
to be sure to clear the private data during removal and test it during
runtime suspend/resume.

This fixes <https://bugs.debian.org/801925>.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Paul Menzel <paul.menzel@giantmonkey.de>
Reported-by: Erich Schubert <erich@debian.org>
Reported-by: Alexandre Rossi <alexandre.rossi@gmail.com>
Tested-by: Paul Menzel <paul.menzel@giantmonkey.de>
Tested-by: Erich Schubert <erich@debian.org>
CC: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2016-01-26 17:24:16 -08:00
Martin K. Petersen d0eb20a863 sd: Optimal I/O size is in bytes, not sectors
Commit ca369d51b3 ("block/sd: Fix device-imposed transfer length
limits") accidentally switched optimal I/O size reporting from bytes to
block layer sectors.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: ca369d51b3
Cc: stable@vger.kernel.org # 4.4+
Reviewed-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
2016-01-20 19:51:34 -05:00
Martin K. Petersen 9c1d9c207b sd: Reject optimal transfer length smaller than page size
Eryu Guan reported that loading scsi_debug would fail. This turned out
to be caused by scsi_debug reporting an optimal I/O size of 32KB which
is smaller than the 64KB page size on the PowerPC system in question.

Add a check to ensure that we only use the device-reported OPTIMAL
TRANSFER LENGTH if it is bigger than or equal to the page cache size.

Reported-by: Eryu Guan <guaneryu@gmail.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-12-21 21:37:18 -05:00
James Bottomley be9e2f775f Merge branch 'mkp-fixes' into fixes 2015-12-03 09:32:33 -08:00
Martin K. Petersen ca369d51b3 block/sd: Fix device-imposed transfer length limits
Commit 4f258a4634 ("sd: Fix maximum I/O size for BLOCK_PC requests")
had the unfortunate side-effect of removing an implicit clamp to
BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
code. This caused problems for some SMR drives.

Debugging this issue revealed a few problems with the existing
infrastructure since the block layer didn't know how to deal with
device-imposed limits, only limits set by the I/O controller.

 - Introduce a new queue limit, max_dev_sectors, which is used by the
   ULD to signal the maximum sectors for a REQ_TYPE_FS request.

 - Ensure that max_dev_sectors is correctly stacked and taken into
   account when overriding max_sectors through sysfs.

 - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
   values for later processing.

 - In sd_revalidate() set the queue's max_dev_sectors based on the
   MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
   is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
   field size.

 - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
   VPD--if reported and sane--to signal the preferred device transfer
   size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.

 - blk_limits_max_hw_sectors() is no longer used and can be removed.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: sweeneygj@gmx.com
Tested-by: Arzeets <anatol.pomozov@gmail.com>
Tested-by: David Eisner <david.eisner@oriel.oxon.org>
Tested-by: Mario Kicherer <dev@kicherer.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-25 21:38:58 -05:00
Martin K. Petersen 397737223c sd: Make discard granularity match logical block size when LBPRZ=1
A device may report an OPTIMAL UNMAP GRANULARITY and UNMAP GRANULARITY
ALIGNMENT in the Block Limits VPD. These parameters describe the
device's internal provisioning allocation units. By default the block
layer will round and align any discard requests based on these limits.

If a device reports LBPRZ=1 to guarantee zeroes after discard, however,
it is imperative that the block layer does not leave out any parts of
the requested block range. Otherwise the device can not do the required
zeroing of any partial allocation units and this can lead to data
corruption.

Since the dm thinp personality relies on the block layer's current
behavior and is unable to deal with partial discard blocks we work
around the problem by setting the granularity to match the logical block
size when LBPRZ is enabled.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-25 21:38:34 -05:00
Linus Torvalds d83763f4a6 SCSI misc on 20151113
Sorry for the delay in this patch which was mostly caused by getting the
 merger of the mpt2/mpt3sas driver, which was seen as an essential item of
 maintenance work to do before the drivers diverge too much.  Unfortunately,
 this caused a compile failure (detected by linux-next), which then had to be
 fixed up and incubated.  In addition to the mpt2/3sas rework, there are
 updates from pm80xx, lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc
 and ufs plus an assortment of changes including some year 2038 issues, a fix
 for a remove before detach issue in some drivers and a couple of other minor
 issues.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJWRnpiAAoJEDeqqVYsXL0MJd0IAIkNP1q/6ksMw/lam2UlbdxN
 zFsFOIhGM3xLnBFehbCx7JW3cmybA7WC5jhMjaa1OeJmYHLpwbTzvVwrLs2l/0y1
 /S+G0wwxROZfIKj/1EO3oKbSPCu9N3lStxOnmNXt5PSUEzAXrTqSNTtnMf2ZGh7j
 bQxTWEJe+66GckgGw4ozTXJHWXqM/Zs/FsYjn2h/WzFhFv8utr7zRSzHMVjcqpLG
 TK/Lt03hiTGqiignKrV9W6JzdGTWf2LGIsj/njgR0dxv59cNH8PrHIjZyknSBxgn
 lblinsCjxDHWnP4BSfTi9MQG1lEiZiWO3Y6TKkKJTgxZ9M0Eitspc+cLOiJ1mpg=
 =HvQf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull final round of SCSI updates from James Bottomley:
 "Sorry for the delay in this patch which was mostly caused by getting
  the merger of the mpt2/mpt3sas driver, which was seen as an essential
  item of maintenance work to do before the drivers diverge too much.
  Unfortunately, this caused a compile failure (detected by linux-next),
  which then had to be fixed up and incubated.

  In addition to the mpt2/3sas rework, there are updates from pm80xx,
  lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc and ufs plus
  an assortment of changes including some year 2038 issues, a fix for a
  remove before detach issue in some drivers and a couple of other minor
  issues"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits)
  mpt3sas: fix inline markers on non inline function declarations
  sd: Clear PS bit before Mode Select.
  ibmvscsi: set max_lun to 32
  ibmvscsi: display default value for max_id, max_lun and max_channel.
  mptfusion: don't allow negative bytes in kbuf_alloc_2_sgl()
  scsi: pmcraid: replace struct timeval with ktime_get_real_seconds()
  mvumi: 64bit value for seconds_since1970
  be2iscsi: Fix bogus WARN_ON length check
  scsi_scan: don't dump trace when scsi_prep_async_scan() is called twice
  mpt3sas: Bump mpt3sas driver version to 09.102.00.00
  mpt3sas: Single driver module which supports both SAS 2.0 & SAS 3.0 HBAs
  mpt2sas, mpt3sas: Update the driver versions
  mpt3sas: setpci reset kernel oops fix
  mpt3sas: Added OEM Gen2 PnP ID branding names
  mpt3sas: Refcount fw_events and fix unsafe list usage
  mpt3sas: Refcount sas_device objects and fix unsafe list usage
  mpt3sas: sysfs attribute to report Backup Rail Monitor Status
  mpt3sas: Ported WarpDrive product SSS6200 support
  mpt3sas: fix for driver fails EEH, recovery from injected pci bus error
  mpt3sas: Manage MSI-X vectors according to HBA device type
  ...
2015-11-13 20:35:54 -08:00
Gabriel Krisman Bertazi 2c5d16d6a9 sd: Clear PS bit before Mode Select.
According to SPC-4, in a Mode Select, the PS bit in Mode Pages is
reserved and must be set to 0 by the driver.  In the sd implementation,
function cache_type_store does a Mode Sense, which might set the PS bit
on the read buffer, followed by a Mode Select, which receives the same
buffer, without explicitly clearing the PS bit.  So, in cases where
target supports saving the Mode Page to a non-volatile location, we end
up doing a Mode Select with the PS bit set, which could cause an illegal
request error if the target is checking this.

This was observed on a new firmware change, which was subsequently
reverted, but this changes sd.c to be more compliant with SPC-4.

This patch clears the PS bit in the buffer returned by Mode Select,
right before it is used in the Mode Select command.

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-11 21:10:26 -05:00
Linus Torvalds ccf21b69a8 Merge branch 'for-4.4/reservations' of git://git.kernel.dk/linux-block
Pull block reservation support from Jens Axboe:
 "This adds support for persistent reservations, both at the core level,
  as well as for sd and NVMe"

[ Background from the docs: "Persistent Reservations allow restricting
  access to block devices to specific initiators in a shared storage
  setup.  All implementations are expected to ensure the reservations
  survive a power loss and cover all connections in a multi path
  environment" ]

* 'for-4.4/reservations' of git://git.kernel.dk/linux-block:
  NVMe: Precedence error in nvme_pr_clear()
  nvme: add missing endianess annotations in nvme_pr_command
  NVMe: Add persistent reservation ops
  sd: implement the Persistent Reservation API
  block: add an API for Persistent Reservations
  block: cleanup blkdev_ioctl
2015-11-04 21:01:27 -08:00
Christoph Hellwig 924d55b063 sd: implement the Persistent Reservation API
This is a mostly trivial mapping to the PERSISTENT RESERVE IN/OUT
commands.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-21 14:46:58 -06:00
Dan Williams 9609b9942b md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown
Now that the integrity profile is statically allocated there is no work
to do when shutting down an integrity enabled block device.

Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Acked-by: NeilBrown <neilb@suse.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-21 14:43:37 -06:00
Linus Torvalds 1081230b74 Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "This first core part of the block IO changes contains:

   - Cleanup of the bio IO error signaling from Christoph.  We used to
     rely on the uptodate bit and passing around of an error, now we
     store the error in the bio itself.

   - Improvement of the above from myself, by shrinking the bio size
     down again to fit in two cachelines on x86-64.

   - Revert of the max_hw_sectors cap removal from a revision again,
     from Jeff Moyer.  This caused performance regressions in various
     tests.  Reinstate the limit, bump it to a more reasonable size
     instead.

   - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
     Most devices have huge trim limits, which can cause nasty latencies
     when deleting files.  Enable the admin to configure the size down.
     We will look into having a more sane default instead of UINT_MAX
     sectors.

   - Improvement of the SGP gaps logic from Keith Busch.

   - Enable the block core to handle arbitrarily sized bios, which
     enables a nice simplification of bio_add_page() (which is an IO hot
     path).  From Kent.

   - Improvements to the partition io stats accounting, making it
     faster.  From Ming Lei.

   - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
     file in blk-mq, as well as a fix for a blk-mq timeout race
     condition.

   - Ming Lin has been carrying Kents above mentioned patches forward
     for a while, and testing them.  Ming also did a few fixes around
     that.

   - Sasha Levin found and fixed a use-after-free problem introduced by
     the bio->bi_error changes from Christoph.

   - Small blk cgroup cleanup from Viresh Kumar"

* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
  blk: Fix bio_io_vec index when checking bvec gaps
  block: Replace SG_GAPS with new queue limits mask
  block: bump BLK_DEF_MAX_SECTORS to 2560
  Revert "block: remove artifical max_hw_sectors cap"
  blk-mq: fix race between timeout and freeing request
  blk-mq: fix buffer overflow when reading sysfs file of 'pending'
  Documentation: update notes in biovecs about arbitrarily sized bios
  block: remove bio_get_nr_vecs()
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: kill merge_bvec_fn() completely
  md/raid5: get rid of bio_fits_rdev()
  md/raid5: split bio for chunk_aligned_read
  block: remove split code in blkdev_issue_{discard,write_same}
  btrfs: remove bio splitting and merge_bvec_fn() calls
  bcache: remove driver private bio splitting code
  block: simplify bio_add_page()
  block: make generic_make_request handle arbitrarily sized bios
  blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
  block: don't access bio->bi_error after bio_put()
  block: shrink struct bio down to 2 cache lines again
  ...
2015-09-02 13:10:25 -07:00
Martin K. Petersen 4f258a4634 sd: Fix maximum I/O size for BLOCK_PC requests
Commit bcdb247c6b ("sd: Limit transfer length") clamped the maximum
size of an I/O request to the MAXIMUM TRANSFER LENGTH field in the BLOCK
LIMITS VPD. This had the unfortunate effect of also limiting the maximum
size of non-filesystem requests sent to the device through sg/bsg.

Avoid using blk_queue_max_hw_sectors() and set the max_sectors queue
limit directly.

Also update the comment in blk_limits_max_hw_sectors() to clarify that
max_hw_sectors defines the limit for the I/O controller only.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Brian King <brking@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-08-12 11:54:37 -07:00
Jens Axboe 2bb4cd5cc4 block: have drivers use blk_queue_max_discard_sectors()
Some drivers use it now, others just set the limits field manually.
But in preparation for splitting this into a hard and soft limit,
ensure that they all call the proper function for setting the hw
limit for discards.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-17 08:41:53 -06:00
Dan Carpenter dee0586e15 sd: fix an error return in probe()
If device_add() fails then it should return the error code but instead
the current code returns success.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-05-25 08:46:24 -07:00
Mark Hounschell 74856fbf44 sd: Disable support for 256 byte/sector disks
256 bytes per sector support has been broken since 2.6.X,
and no-one stepped up to fix this.
So disable support for it.

Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-05-18 11:26:37 -07:00
Martin K. Petersen e727c42bd5 sd: Unregister integrity profile
The new integrity code did not correctly unregister the profile for SD
disks. Call blk_integrity_unregister() when we release a disk.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-04-16 10:36:14 -07:00
James Bottomley b9f28d8635 sd, mmc, virtio_blk, string_helpers: fix block size units
The current string_get_size() overflows when the device size goes over
2^64 bytes because the string helper routine computes the suffix from
the size in bytes.  However, the entirety of Linux thinks in terms of
blocks, not bytes, so this will artificially induce an overflow on very
large devices.  Fix this by making the function string_get_size() take
blocks and the block size instead of bytes.  This should allow us to
keep working until the current SCSI standard overflows.

Also fix virtio_blk and mmc (both of which were also artificially
multiplying by the block size to pass a byte side to string_get_size()).

The mathematics of this is pretty simple:  we're taking a product of
size in blocks (S) and block size (B) and trying to re-express this in
exponential form: S*B = R*N^E (where N, the exponent is either 1000 or
1024) and R < N.  Mathematically, S = RS*N^ES and B=RB*N^EB, so if RS*RB
< N it's easy to see that S*B = RS*RB*N^(ES+EB).  However, if RS*BS > N,
we can see that this can be re-expressed as RS*BS = R*N (where R =
RS*BS/N < N) so the whole exponent becomes R*N^(ES+EB+1)

[jejb: fix incorrect 32 bit do_div spotted by kbuild test robot <fengguang.wu@intel.com>]
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-04-10 16:27:48 -07:00
Christoph Hellwig 3d9a1f530e sd: don't grab a device references from driver methods
The device model already takes care of races between ->remove and
->shutdown vs its other methods, and we now take care about locking
them out for ->rescan as well.

This is a partial revert of commit 39b7f1 ("[SCSI] sd: Fix refcounting").

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2015-03-19 06:40:53 -07:00
Linus Torvalds 540a7c5061 SCSI misc on 20150209
This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
 megaraid_sas, ses) plus an assortment of minor updates.  There's also an
 update to ufs which adds new phy drivers and finally a new logging
 infrastructure for SCSI.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJU2Ty5AAoJEDeqqVYsXL0M9rAH/1xNpAxXuxQq+dW5Z+uOaX60
 5RRIu7/xA1HEfzkT5FTHrolmogDjVqawu4PZS66iHDeo05RBVUlbTA8qCK+MlRcN
 U6s0cLEw59eH3EaCfOGuYp/MnbhuV0eNxe0btmqJIQwuW3+gwZKGJdOq6LS2YasJ
 k/DyIBVmkJAVsN56vm9q2vbtcZp+Bg+ngqBS+SC4TF7vV1WCtFmS6yaUf62PYW3D
 +Irx37qHZntDR5wdw3dsuKDi5U8bl6myPjaVLnVJqg/WIF9RlCkjk5xpWT99AmVO
 NmtYQxLLBlAQ5K+sIlBUwxZe+8q1l+Aj4TTmJHAfFtyfp25s7JR9I6/QtOyC5Kw=
 =odol
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
  megaraid_sas, ses) plus an assortment of minor updates.

  There's also an update to ufs which adds new phy drivers and finally a
  new logging infrastructure for SCSI"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits)
  scsi_logging: return void for dev_printk() functions
  scsi: print single-character strings with seq_putc
  scsi: merge consecutive seq_puts calls
  scsi: replace seq_printf with seq_puts
  aha152x: replace seq_printf with seq_puts
  advansys: replace seq_printf with seq_puts
  scsi: remove SPRINTF macro
  sg: remove an unused variable
  hpsa: Use local workqueues instead of system workqueues
  hpsa: add in P840ar controller model name
  hpsa: add in gen9 controller model names
  hpsa: detect and report failures changing controller transport modes
  hpsa: shorten the wait for the CISS doorbell mode change ack
  hpsa: refactor duplicated scan completion code into a new routine
  hpsa: move SG descriptor set-up out of hpsa_scatter_gather()
  hpsa: do not use function pointers in fast path command submission
  hpsa: print CDBs instead of kernel virtual addresses for uncommon errors
  hpsa: do not use a void pointer for scsi_cmd field of struct CommandList
  hpsa: return failed from device reset/abort handlers
  hpsa: check for ctlr lockup after command allocation in main io path
  ...
2015-02-11 10:28:45 -08:00
Brian King 3a9794d329 sd: Fix max transfer length for 4k disks
The following patch fixes an issue observed with 4k sector disks
where the max_hw_sectors attribute was getting set too large in
sd_revalidate_disk. Since sdkp->max_xfer_blocks is in units
of SCSI logical blocks and queue_max_hw_sectors is in units of
512 byte blocks, on a 4k sector disk, every time we went through
sd_revalidate_disk, we were taking the current value of
queue_max_hw_sectors and increasing it by a factor of 8. Fix
this by only shifting sdkp->max_xfer_blocks.

Cc: stable@vger.kernel.org
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-02-02 13:46:29 +01:00
Hannes Reinecke 2104551969 scsi: use per-cpu buffer for formatting sense
Convert sense buffer logging to use the per-cpu buffer to avoid line
breakup.

Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-01-09 15:44:30 +01:00
Martin K. Petersen e461338b6c sd: tweak discard heuristics to work around QEMU SCSI issue
7985090aa0 changed the discard heuristics to give preference to the
WRITE SAME commands that (unlike UNMAP) guarantee deterministic results.

Ming Lei discovered that QEMU SCSI's WRITE SAME implementation
internally relied on limits that were only communicated for the UNMAP
case. And therefore discard commands backed by WRITE SAME would fail.

Tweak the heuristics so we still pick UNMAP in the LBPRZ=0 case and only
prefer the WRITE SAME variants if the device has the LBPRZ flag set.

Reported-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-12-30 13:30:38 +01:00
Hannes Reinecke eb846d9f14 scsi: rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16
SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16).
So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be
consistent with SPC and to allow for better distinction.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-24 20:01:40 +01:00
Christoph Hellwig 3af6b35261 scsi: remove scsi_driver owner field
The driver core driver structure has grown an owner field and now
requires it to be set for all modular drivers.  Set it up for
all scsi_driver instances and get rid of the now superflous
scsi_driver owner field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Shane M Seymour <shane.seymour@hp.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 20:01:28 +01:00
Christoph Hellwig 3c356bde19 scsi: stop passing a gfp_mask argument down the command setup path
There is no reason for ULDs to pass in a flag on how to allocate the S/G
lists.  While we don't need GFP_ATOMIC for the blk-mq case because we
don't hold locks, that decision can be made way down the chain without
having to pass a pointless gfp_mask argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 19:56:55 +01:00
Martin K. Petersen 7985090aa0 sd: disable discard_zeroes_data for UNMAP
The T10 SBC UNMAP command does not provide any hard guarantees that
blocks will return zeroes on a subsequent READ. This is due to the fact
that the device server is free to silently ignore all or parts of the
request.

The only way to ensure that a block consistently returns zeroes after
being unmapped is to use WRITE SAME with the UNMAP bit set. Should the
device be unable to unmap one or more blocks described by the command it
is required to manually write zeroes to them.

Until now we have preferred UNMAP over the WRITE SAME variants to
accommodate thinly provisioned devices that predated the final SBC-3
spec. This patch changes the heuristic so that we favor WRITE SAME(16)
or (10) over UNMAP if these commands are marked as supported in the
Logical Block Provisioning VPD page.

The patch also disables discard_zeroes_data for devices operating in
UNMAP mode.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:19:14 +01:00
Christoph Hellwig 21a9d4c9d6 sd: fix up ->compat_ioctl
No need to verify the passthrough ioctls, the real handler will
take care of that.  Also make sure not to block for resets on
O_NONBLOCK fds.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:16:11 +01:00
Christoph Hellwig 906d15fbd2 scsi: split scsi_nonblockable_ioctl
The calling conventions for this function are bad as it could return
-ENODEV both for a device not currently online and a not recognized ioctl.

Add a new scsi_ioctl_block_when_processing_errors function that wraps
scsi_block_when_processing_errors with the a special case for the
SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself
in scsi_ioctl.  All callers of scsi_ioctl now must call the above helper
to check for the EH state, so that the ioctl handler itself doesn't
have to.

Reported-by: Robert Elliott <Elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:16:11 +01:00
Hannes Reinecke ef61329db7 scsi: remove scsi_show_result()
Open-code scsi_print_result in sd.c, and cleanup logging to
not print duplicate informations.
Also remove the call to scsi_show_result() in ufshcd.c
to be consistent with other callers of scsi_execute().

With that we can remove scsi_show_result in constants.c

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:16:05 +01:00
Hannes Reinecke d811b848eb scsi: use sdev as argument for sense code printing
We should be using the standard dev_printk() variants for
sense code printing.

[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:58 +01:00
Hannes Reinecke ad3819c09b sd: remove scsi_print_sense() in sd_done()
sd_done() was calling scsi_print_sense() for a sense code
of 'NO_SENSE'.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-12 11:15:56 +01:00
Linus Torvalds e75437fb93 Merge branch 'for-3.18/drivers' of git://git.kernel.dk/linux-block
Pull block layer driver update from Jens Axboe:
 "This is the block driver pull request for 3.18.  Not a lot in there
  this round, and nothing earth shattering.

   - A round of drbd fixes from the linbit team, and an improvement in
     asender performance.

   - Removal of deprecated (and unused) IRQF_DISABLED flag in rsxx and
     hd from Michael Opdenacker.

   - Disable entropy collection from flash devices by default, from Mike
     Snitzer.

   - A small collection of xen blkfront/back fixes from Roger Pau Monné
     and Vitaly Kuznetsov"

* 'for-3.18/drivers' of git://git.kernel.dk/linux-block:
  block: disable entropy contributions for nonrot devices
  xen, blkfront: factor out flush-related checks from do_blkif_request()
  xen-blkback: fix leak on grant map error path
  xen/blkback: unmap all persistent grants when frontend gets disconnected
  rsxx: Remove deprecated IRQF_DISABLED
  block: hd: remove deprecated IRQF_DISABLED
  drbd: use RB_DECLARE_CALLBACKS() to define augment callbacks
  drbd: compute the end before rb_insert_augmented()
  drbd: Add missing newline in resync progress display in /proc/drbd
  drbd: reduce lock contention in drbd_worker
  drbd: Improve asender performance
  drbd: Get rid of the WORK_PENDING macro
  drbd: Get rid of the __no_warn and __cond_lock macros
  drbd: Avoid inconsistent locking warning
  drbd: Remove superfluous newline from "resync_extents" debugfs entry.
  drbd: Use consistent names for all the bi_end_io callbacks
  drbd: Use better variable names
2014-10-18 12:12:45 -07:00
Linus Torvalds d3dc366bba Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block
Pull core block layer changes from Jens Axboe:
 "This is the core block IO pull request for 3.18.  Apart from the new
  and improved flush machinery for blk-mq, this is all mostly bug fixes
  and cleanups.

   - blk-mq timeout updates and fixes from Christoph.

   - Removal of REQ_END, also from Christoph.  We pass it through the
     ->queue_rq() hook for blk-mq instead, freeing up one of the request
     bits.  The space was overly tight on 32-bit, so Martin also killed
     REQ_KERNEL since it's no longer used.

   - blk integrity updates and fixes from Martin and Gu Zheng.

   - Update to the flush machinery for blk-mq from Ming Lei.  Now we
     have a per hardware context flush request, which both cleans up the
     code should scale better for flush intensive workloads on blk-mq.

   - Improve the error printing, from Rob Elliott.

   - Backing device improvements and cleanups from Tejun.

   - Fixup of a misplaced rq_complete() tracepoint from Hannes.

   - Make blk_get_request() return error pointers, fixing up issues
     where we NULL deref when a device goes bad or missing.  From Joe
     Lawrence.

   - Prep work for drastically reducing the memory consumption of dm
     devices from Junichi Nomura.  This allows creating clone bio sets
     without preallocating a lot of memory.

   - Fix a blk-mq hang on certain combinations of queue depths and
     hardware queues from me.

   - Limit memory consumption for blk-mq devices for crash dump
     scenarios and drivers that use crazy high depths (certain SCSI
     shared tag setups).  We now just use a single queue and limited
     depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
  block: Remove REQ_KERNEL
  blk-mq: allocate cpumask on the home node
  bio-integrity: remove the needless fail handle of bip_slab creating
  block: include func name in __get_request prints
  block: make blk_update_request print prefix match ratelimited prefix
  blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
  block: fix alignment_offset math that assumes io_min is a power-of-2
  blk-mq: Make bt_clear_tag() easier to read
  blk-mq: fix potential hang if rolling wakeup depth is too high
  block: add bioset_create_nobvec()
  block: use bio_clone_fast() in blk_rq_prep_clone()
  block: misplaced rq_complete tracepoint
  sd: Honor block layer integrity handling flags
  block: Replace strnicmp with strncasecmp
  block: Add T10 Protection Information functions
  block: Don't merge requests if integrity flags differ
  block: Integrity checksum flag
  block: Relocate bio integrity flags
  block: Add a disk flag to block integrity profile
  block: Add prefix to block integrity profile flags
  ...
2014-10-18 11:53:51 -07:00
Mike Snitzer b277da0a8a block: disable entropy contributions for nonrot devices
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.

Historically, all block devices have automatically made entropy
contributions.  But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
    - On SSD disks, the completion times aren't as random as they
      are for rotational drives. So it's questionable whether they
      should contribute to the random pool in the first place.
    - Calling add_disk_randomness() has a lot of overhead.

There are more reliable sources for randomness than non-rotational block
devices.  From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-04 10:55:32 -06:00
Martin K. Petersen c611529e7c sd: Honor block layer integrity handling flags
A set of flags introduced in the block layer enable better control over
how protection information is handled. These flags are useful for both
error injection and data recovery purposes. Checking can be enabled and
disabled for controller and disk, and the guard tag format is now a
per-I/O property.

Update sd_protect_op to communicate the relevant information to the
low-level device driver via a set of flags in scsi_cmnd.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-30 15:17:35 -06:00
Subhash Jadavani 6fe8c1dbef scsi: balance out autopm get/put calls in scsi_sysfs_add_sdev()
SCSI Well-known logical units generally don't have any scsi driver
associated with it which means no one will call scsi_autopm_put_device()
on these wlun scsi devices and this would result in keeping the
corresponding scsi device always active (hence LLD can't be suspended as
well). Same exact problem can be seen for other scsi device representing
normal logical unit whose driver is yet to be loaded. This patch fixes
the above problem with this approach:

- make the scsi_autopm_put_device call at the end of scsi_sysfs_add_sdev
  to make it balance out the get earlier in the function.
- let drivers do paired get/put calls in their probe methods.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-15 16:02:05 -07:00
Sujit Reddy Thumma 2eefd57b97 sd: Avoid sending medium write commands if device is write protected
The SYNCHRONIZE_CACHE command is a medium write command and hence can
fail when the device is write protected. Avoid sending such commands by
making sure that write-cache-enable is disabled even though the device
claim to support it.

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-15 16:01:57 -07:00
K. Y. Srinivasan 26b9fd8b34 sd: fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout
Commit ID: 7e660100d8 added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-25 17:16:42 -04:00
Martin K. Petersen c1d40a527e scsi: add a blacklist flag which enables VPD page inquiries
Despite supporting modern SCSI features some storage devices continue to
claim conformance to an older version of the SPC spec. This is done for
compatibility with legacy operating systems.

Linux by default will not attempt to read VPD pages on devices that
claim SPC-2 or older. Introduce a blacklist flag that can be used to
trigger VPD page inquiries on devices that are known to support them.

Reported-by: KY Srinivasan <kys@microsoft.com>
Tested-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-25 17:16:41 -04:00
Christoph Hellwig fd2eb9034e scsi: move the writeable field from struct scsi_device to struct scsi_cd
We currently set the field in common code based on the device type,
but then only use it in the cdrom driver which also overrides the
value previously set in the generic code.

Just leave this entirely to the CDROM driver to make everyones life
simpler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2014-07-25 17:16:41 -04:00
Christoph Hellwig 87949eee7e sd: split sd_init_command
Factor out a function to initialize regular read/write commands and leave
sd_init_command as a simple dispatcher to the different prepare routines.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:29 +02:00
Christoph Hellwig e4200f8ee3 sd: retry discard commands
Currently cmd->allowed is initialized from rq->retries for discard
commands, but retries is always 0 for non-BLOCK_PC requests.  Set it
to the standard number of retries instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:28 +02:00
Christoph Hellwig a25ee54851 sd: retry write same commands
Currently cmd->allowed is initialized from rq->retries for write same
commands, but retries is always 0 for non-BLOCK_PC requests.  Set it
to the standard number of retries instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:27 +02:00
Christoph Hellwig 6a7b43985d sd: don't use scsi_setup_blk_pc_cmnd for discard requests
Simplify handling of discard requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:26 +02:00
Christoph Hellwig 59b1134c5a sd: don't use scsi_setup_blk_pc_cmnd for write same requests
Simplify handling of write same requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
2014-07-17 22:12:47 +02:00
Christoph Hellwig a118c6c1d9 sd: don't use scsi_setup_blk_pc_cmnd for flush requests
Simplify handling of flush requests by setting up the command directly
instead of initializing request fields and then calling
scsi_setup_blk_pc_cmnd to propagate the information into the command.

Also rename scsi_setup_flush_cmnd to sd_setup_flush_cmnd for consistency.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:12:33 +02:00
Christoph Hellwig 5158a899d8 scsi: set sc_data_direction in common code
The data direction fiel in the SCSI command is derived only from the block
request structure.  Move setting it up into common code instead of
duplicating it in the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:11:41 +02:00
Christoph Hellwig 3868cf8ea7 scsi: restructure command initialization for TYPE_FS requests
We should call the device handler prep_fn for all TYPE_FS requests,
not just simple read/write calls that are handled by the disk driver.

Restructure the common I/O code to call the prep_fn handler and zero
out the CDB, and just leave the call to scsi_init_io to the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:11:27 +02:00
Martin K. Petersen bcdb247c6b sd: Limit transfer length
Until now the per-command transfer length has exclusively been gated by
the max_sectors parameter in the scsi_host template. Given that the size
of this parameter has been bumped to an unsigned int we have to be
careful not to exceed the target device's capabilities.

If the if the device specifies a Maximum Transfer Length in the Block
Limits VPD we'll use that value. Otherwise we'll use 0xffffffff for
devices that have use_16_for_rw set and 0xffff for the rest. We then
combine the chosen disk limit with max_sectors in the host template. The
smaller of the two will be used to set the max_hw_sectors queue limit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:33 +02:00
Clément Calmels 8d964478b2 sd: bad return code of init_sd
In init_sd function, if kmem_cache_create or mempool_create_slab_pools
calls fail, the error will not be correclty reported because
class_register previously set the value of err to 0.

Signed-off-by: Clément Calmels <clement.calmels@free.fr>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:32 +02:00
Vaughan Cao cb2fb68d06 sd: notify block layer when using temporary change to cache_type
This is a fix for commit 39c60a0948

  "sd: fix array cache flushing bug causing performance problems"

We must notify the block layer via q->flush_flags after a temporary change
of the cache_type to write through.  Without this, a SYNCHRONIZE CACHE
command will still be generated.  This patch factors out a helper that
can be called from sd_revalidate_disk and cache_type_store.

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:31 +02:00
Akinobu Mita e430cbc8bb sd: use READ_16 or WRITE_16 when transfer length is greater than 0xffff
This change makes the scsi disk driver handle the requests whose
transfer length is greater than 0xffff with READ_16 or WRITE_16.

However, this is a preparation for extending the data type of
max_sectors in struct Scsi_Host and scsi_host_template.  So, it is
impossible to happen this condition for now, because SCSI low-level
drivers can not specify max_sectors greater than 0xffff due to the
data type limitation.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:30 +02:00
Alan Stern b14bf2d0c0 usb-storage/SCSI: Add broken_fua blacklist flag
Some buggy JMicron USB-ATA bridges don't know how to translate the FUA
bit in READs or WRITEs.  This patch adds an entry in unusual_devs.h
and a blacklist flag to tell the sd driver not to use FUA.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Büsch <m@bues.ch>
Tested-by: Michael Büsch <m@bues.ch>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 22:47:18 -07:00
Linus Torvalds 1c54fc1efe SCSI for-linus on 20140609
This patch consists of the usual driver updates (qla2xxx, qla4xxx, lpfc,
 be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to maintained status
 of a long neglected driver for older hardware.  In addition there are a lot of
 minor fixes and cleanups and some more updates to make scsi mq ready.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJTlcwiAAoJEDeqqVYsXL0M5qsIALzVPLd4yxA16zCiaPQUeIV5
 mfYmwISFlN+qW3AcUeSH4D13YgegCjEBfqaDMWvIkgouxLy/7jpxtChutq3MCzUE
 cDT1B9+ZrzoqBISRNHEh/gx5F1MOF2VPuqG2pe0J90wyRCNzJscB6PbtWMAo86CA
 2eu7wq3K9FXxCC1qY0PzwBLXHqUcgk5GWiK9CM/k4W0NiTVeNmwPeh5i91IQnBHx
 E2l7NAXgNLyCf5tyeswvZ4pW0T3hlaswNmBB4qC8oJm4U6UqMN+tk4ML63Pz7uPe
 4mlHG0uI8Vbdi13iv1EDUZ9Vo8iqVrzP2UAhakgP9poKSGqE4d/MD0EKNGQB2Es=
 =cBY8
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This patch consists of the usual driver updates (qla2xxx, qla4xxx,
  lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to
  maintained status of a long neglected driver for older hardware.  In
  addition there are a lot of minor fixes and cleanups and some more
  updates to make scsi mq ready"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (130 commits)
  include/scsi/osd_protocol.h: remove unnecessary __constant
  mvsas: Recognise device/subsystem 9485/9485 as 88SE9485
  Revert "be2iscsi: Fix processing cqe for cxn whose endpoint is freed"
  mptfusion: fix msgContext in mptctl_hp_hostinfo
  acornscsi: remove linked command support
  scsi/NCR5380: dprintk macro
  fusion: Remove use of DEF_SCSI_QCMD
  fusion: Add free msg frames to the head, not tail of list
  mpt2sas: Add free smids to the head, not tail of list
  mpt2sas: Remove use of DEF_SCSI_QCMD
  mpt2sas: Remove uses of serial_number
  mpt3sas: Remove use of DEF_SCSI_QCMD
  mpt3sas: Remove uses of serial_number
  qla2xxx: Use kmemdup instead of kmalloc + memcpy
  qla4xxx: Use kmemdup instead of kmalloc + memcpy
  qla2xxx: fix incorrect debug printk
  be2iscsi: Bump the driver version
  be2iscsi: Fix processing cqe for cxn whose endpoint is freed
  be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
  be2iscsi: Fix memory corruption in MBX path
  ...
2014-06-09 18:54:06 -07:00
David Jeffery 2a863ba8f6 sd: medium access timeout counter fails to reset
There is an error with the medium access timeout feature of the sd driver. The
sdkp->medium_access_timed_out value is reset to zero in sd_done() in the wrong
place.  Currently it is reset to zero only when a command returns sense data.
This can result in cases where the medium access check falsely triggers from
timed out commands which are hours or days apart.

For example, an I/O command times out and is aborted.  It then retries and
succeeds.  But with no sense data generated and returned, the
medium_access_timed_out value is not reset.  If no sd command returns sense
data, then the next command to time out (however far in time from the first
failure) will trigger the medium access timeout and put the device offline.

The resetting of sdkp->medium_access_timed_out should occur before the check
for sense data.

To reproduce using scsi_debug, use SCSI_DEBUG_OPT_TIMEOUT or
SCSI_DEBUG_OPT_MAC_TIMEOUT to force an I/O command to timeout.  Then, remove
the opt value so the I/O will succeed on retry.  Perform more I/O as desired.
Finally, repeat the process to make a new I/O command time out.  Without the
patch, the device will be marked offline even though many I/O commands have
succeeded between the 2 instances of timed out commands.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by:  Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-05-19 12:54:21 +02:00
Christoph Hellwig a1b73fc194 scsi: reintroduce scsi_driver.init_command
Instead of letting the ULD play games with the prep_fn move back to
the model of a central prep_fn with a callback to the ULD.  This
already cleans up and shortens the code by itself, and will be required
to properly support blk-mq in the SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-05-19 12:35:09 +02:00
Jens Axboe dc4a93078b sd/skd: stuff discard page in request->completion_data
Store the pointer to the page there, so we can always safely
reference it from end_io context where ->bio may have been
cleared.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-16 21:37:30 -06:00
Jens Axboe b4f42e2831 block: remove struct request buffer member
This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:03:02 -06:00
Linus Torvalds b7e70ca9c7 Merge branch 'async-scsi-resume' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci
Pull async SCSI resume support from Dan Williams:
 "Allow disks and other devices to resume in parallel.

  This provides a tangible speed up for a non-esoteric use case (laptop
  resume):

    https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach"

* 'async-scsi-resume' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci:
  scsi: async sd resume
2014-04-11 17:23:52 -07:00
Dan Williams 3c31b52f96 scsi: async sd resume
async_schedule() sd resume work to allow disks and other devices to
resume in parallel.

This moves the entirety of scsi_device resume to an async context to
ensure that scsi_device_resume() remains ordered with respect to the
completion of the start/stop command.  For the duration of the resume,
new command submissions (that do not originate from the scsi-core) will
be deferred (BLKPREP_DEFER).

It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
of these operations.  Like scsi_sd_probe_domain it is flushed at
sd_remove() time to ensure async ops do not continue past the
end-of-life of the sdev.  The implementation explicitly refrains from
reusing scsi_sd_probe_domain directly for this purpose as it is flushed
at the end of dpm_resume(), potentially defeating some of the benefit.
Given sdevs are quiesced it is permissible for these resume operations
to bleed past the async_synchronize_full() calls made by the driver
core.

We defer the resolution of which pm callback to call until
scsi_dev_type_{suspend|resume} time and guarantee that the callback
parameter is never NULL.  With this in place the type of resume
operation is encoded in the async function identifier.

There is a concern that async resume could trigger PSU overload.  In the
enterprise, storage enclosures enforce staggered spin-up regardless of
what the kernel does making async scanning safe by default.  Outside of
that context a user can disable asynchronous scanning via a kernel
command line or CONFIG_SCSI_SCAN_ASYNC.  Honor that setting when
deciding whether to do resume asynchronously.

Inspired by Todd's analysis and initial proposal [2]:
https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach

Cc: Len Brown <len.brown@intel.com>
Cc: Phillip Susi <psusi@ubuntu.com>
[alan: bug fix and clean up suggestion]
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[djbw: kick all resume work to the async queue]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-04-10 15:30:35 -07:00
Martin K. Petersen b2bff6ceb6 [SCSI] sd: Quiesce mode sense error messages
Messages about discovered disk properties are only printed once unless
they are found to have changed. Errors encountered during mode sense,
however, are printed every time we revalidate.

Quiesce mode sense errors so they are only printed during the first
scan.

[jejb: checkpatch fixes]
Bugzilla: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=733565
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-27 08:26:33 -07:00
Alan Stern 7aae51347b [SCSI] sd: don't fail if the device doesn't recognize SYNCHRONIZE CACHE
Evidently some wacky USB-ATA bridges don't recognize the SYNCHRONIZE
CACHE command, as shown in this email thread:

	http://marc.info/?t=138978356200002&r=1&w=2

The fact that we can't tell them to drain their caches shouldn't
prevent the system from going into suspend.  Therefore sd_sync_cache()
shouldn't return an error if the device replies with an Invalid
Command ASC.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Sven Neumann <s.neumann@raumfeld.com>
Tested-by: Daniel Mack <zonque@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:22 -07:00
Linus Torvalds f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Jens Axboe b28bc9b38c Linux 3.13-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSwLfoAAoJEHm+PkMAQRiGi6QH/1U1B7lmHChDTw3jj1lfm9gA
 189Si4QJlnxFWCKHvKEL+pcaVuACU+aMGI8+KyMYK4/JfuWVjjj5fr/SvyHH2/8m
 LdSK8aHMhJ46uBS4WJ/l6v46qQa5e2vn8RKSBAyKm/h4vpt+hd6zJdoFrFai4th7
 k/TAwOAEHI5uzexUChwLlUBRTvbq4U8QUvDu+DeifC8cT63CGaaJ4qVzjOZrx1an
 eP6UXZrKDASZs7RU950i7xnFVDQu4PsjlZi25udsbeiKcZJgPqGgXz5ULf8ZH8RQ
 YCi1JOnTJRGGjyIOyLj7pyB01h7XiSM2+eMQ0S7g54F2s7gCJ58c2UwQX45vRWU=
 =/4/R
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc6' into for-3.14/core

Needed to bring blk-mq uptodate, since changes have been going in
since for-3.14/core was established.

Fixup merge issues related to the immutable biovec changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-flush.c
	fs/btrfs/check-integrity.c
	fs/btrfs/extent_io.c
	fs/btrfs/scrub.c
	fs/logfs/dev_bdev.c
2013-12-31 09:51:02 -07:00
Geert Uytterhoeven ef80d1e18b [SCSI] sd: Do not call do_div() with a 64-bit divisor
do_div() is meant for divisions of 64-bit number by 32-bit numbers.
Passing 64-bit divisor types caused issues in the past on 32-bit platforms,
cfr. commit ea077b1b96 ("m68k: Truncate base
in do_div()").

As scsi_device.sector_size is unsigned (int), factor should be unsigned
int, too.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-12-19 07:39:03 -08:00
James Bottomley 2451079bc2 [SCSI] Fix erratic device offline during EH
Commit 18a4d0a22e
(Handle disk devices which can not process medium access commands)
was introduced to offline any device which cannot process medium
access commands.
However, commit 3eef6257de
(Reduce error recovery time by reducing use of TURs) reduced
the number of TURs by sending it only on the first failing
command, which might or might not be a medium access command.
So in combination this results in an erratic device offlining
during EH; if the command where the TUR was sent upon happens
to be a medium access command the device will be set offline,
if not everything proceeds as normal.

This patch moves the check to the final test, eliminating
this problem.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-12-19 07:39:02 -08:00
Martin K. Petersen 54b2b50c20 [SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers
Some host adapters do not pass commands through to the target disk
directly. Instead they provide an emulated target which may or may not
accurately report its capabilities. In some cases the physical device
characteristics are reported even when the host adapter is processing
commands on the device's behalf. This can lead to adapter firmware hangs
or excessive I/O errors.

This patch disables WRITE SAME for devices connected to host adapters
that provide an emulated target. Driver writers can disable WRITE SAME
by setting the no_write_same flag in the host adapter template.

[jejb: fix up rejections due to eh_deadline patch]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-11-29 08:48:39 +04:00
Kent Overstreet a4ad39b1d1 block: Convert bio_iovec() to bvec_iter
For immutable biovecs, we'll be introducing a new bio_iovec() that uses
our new bvec iterator to construct a biovec, taking into account
bvec_iter->bi_bvec_done - this patch updates existing users for the new
usage.

Some of the existing users really do need a pointer into the bvec array
- those uses are all going to be removed, but we'll need the
functionality from immutable to remove them - so for now rename the
existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
patches.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
2013-11-23 22:33:49 -08:00
Linus Torvalds 0d522ee749 SCSI for-linus on 20131110
This patch set is driver updates for qla4xxx, scsi_debug, pm80xx, fcoe/libfc,
 eas2r, lpfc, be2iscsi and megaraid_sas plus some assorted bug fixes and
 cleanups.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSfw30AAoJEDeqqVYsXL0MZPEIAK6GBHFw8JsU3NQ4SbM5hzdM
 ywPryTn7DO9jyj0J04i6TNtbS6om9E8tjLyr3SnmTQNiTDXGv44rIEfJyHR9ko2n
 E2hRu4xaGEWK4dkuktQuOqj2fuXRyeXr2maYIXjkmFI0hesLqozYKgLAeWTHvabE
 2HICwG/lfCzesqVl69Y3V8n1vZvtJqAls6liwY09i9eSDRe39DynRn7bjLXzkPkc
 ynjJYl22CIZ7nb+PgzqQ+xEUIdXqGG890CvGaqg7+x3ZUmmOtfECaDUkCjVeiiE6
 sy72V6E4ET/YMrkhRmIUKyZxGbl/tMxYPuGaBhq2fSNRx8x1R+Ajfh9UM2AZTh4=
 =hCHG
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "This patch set is driver updates for qla4xxx, scsi_debug, pm80xx,
  fcoe/libfc, eas2r, lpfc, be2iscsi and megaraid_sas plus some assorted
  bug fixes and cleanups"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (106 commits)
  [SCSI] scsi_error: Escalate to LUN reset if abort fails
  [SCSI] Add 'eh_deadline' to limit SCSI EH runtime
  [SCSI] remove check for 'resetting'
  [SCSI] dc395: Move 'last_reset' into internal host structure
  [SCSI] tmscsim: Move 'last_reset' into host structure
  [SCSI] advansys: Remove 'last_reset' references
  [SCSI] dpt_i2o: return SCSI_MLQUEUE_HOST_BUSY when in reset
  [SCSI] dpt_i2o: Remove DPTI_STATE_IOCTL
  [SCSI] megaraid_sas: Fix synchronization problem between sysPD IO path and AEN path
  [SCSI] lpfc: Fix typo on NULL assignment
  [SCSI] scsi_dh_alua: ALUA handler attach should succeed while TPG is transitioning
  [SCSI] scsi_dh_alua: ALUA check sense should retry device internal reset unit attention
  [SCSI] esas2r: Cleanup snprinf formatting of firmware version
  [SCSI] esas2r: Remove superfluous mask of pcie_cap_reg
  [SCSI] esas2r: Fixes for big-endian platforms
  [SCSI] esas2r: Directly call kernel functions for atomic bit operations
  [SCSI] lpfc 8.3.43: Update lpfc version to driver version 8.3.43
  [SCSI] lpfc 8.3.43: Fixed not processing task management IOCB response status
  [SCSI] lpfc 8.3.43: Fixed spinlock hang.
  [SCSI] lpfc 8.3.43: Fixed invalid Total_Data_Placed value received for els and ct command responses
  ...
2013-11-14 12:25:38 +09:00
Jens Axboe e37459b8e2 Merge branch 'blk-mq/core' into for-3.13/core
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-timeout.c
2013-11-08 09:08:12 -07:00
Jens Axboe 5953316dbf block: make rq->cmd_flags be 64-bit
We have officially run out of flags in a 32-bit space. Extend it
to 64-bit even on 32-bit archs.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-10-25 11:55:59 +01:00
James Bottomley 7e660100d8 [SCSI] Derive the FLUSH_TIMEOUT from the basic I/O timeout
Rather than having a separate constant for specifying the timeout on FLUSH
operations, use the basic I/O timeout value that is already configurable
on a per target basis to derive the FLUSH timeout. Looking at the current
definitions of these timeout values, the FLUSH operation is supposed to have
a value that is twice the normal timeout value. This patch preserves this
relationship while leveraging the flexibility of specifying the I/O timeout.

Based on a prior patch by KY Srinivasan <kys@microsoft.com>

Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:58:16 +01:00
Oliver Neukum 95897910a5 [SCSI] sd: Add error handling during flushing caches
It makes no sense to flush the cache of a device without medium.
Errors during suspend must be handled according to their causes.
Errors due to missing media or unplugged devices must be ignored.
Errors due to devices being offlined must also be ignored.
The error returns must be modified so that the generic layer
understands them.

[jejb: fix up whitespace and other formatting problems]
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:58:13 +01:00
Bernd Schubert af73623f5f [SCSI] sd: Reduce buffer size for vpd request
Somehow older areca firmware versions have issues with
scsi_get_vpd_page() and a large buffer, the firmware
seems to crash and the scsi error-handler will start endless
recovery retries.
Limiting the buf-size to 64-bytes fixes this issue with older
firmware versions (<1.49 for my controller).

Fixes a regression with areca controllers and older firmware versions
introduced by commit: 66c28f9712

Reported-by: Nix <nix@esperi.org.uk>
Tested-by: Nix <nix@esperi.org.uk>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Cc: stable@vger.kernel.org # delay inclusion for 2 months for testing
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:58:12 +01:00
Aaron Lu 10c580e423 [SCSI] sd: call blk_pm_runtime_init before add_disk
Sujit has found a race condition that would make q->nr_pending
unbalanced, it occurs as Sujit explained:

"
sd_probe_async() ->
	add_disk() ->
		disk_add_event() ->
			schedule(disk_events_workfn)
	sd_revalidate_disk()
	blk_pm_runtime_init()
return;

Let's say the disk_events_workfn() calls sd_check_events() which tries
to send test_unit_ready() and because of sd_revalidate_disk() trying to
send another commands the test_unit_ready() might be re-queued as the
tagged command queuing is disabled.

So the race condition is -

Thread 1 			  |		Thread 2
sd_revalidate_disk()		  |	sd_check_events()
...nr_pending = 0 as q->dev = NULL|	scsi_queue_insert()
blk_runtime_pm_init()		  | 	blk_pm_requeue_request() ->
				  |	nr_pending = -1 since
				  |	q->dev != NULL
"

The problem is, the test_unit_ready request doesn't get counted the
first time it is queued, so the later decrement of q->nr_pending in
blk_pm_requeue_request makes it unbalanced.

Fix this by calling blk_pm_runtime_init before add_disk so that all
requests initiated there will all be counted.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-23 14:09:18 +01:00
Alan Stern 984f1733fc [SCSI] sd: Fix potential out-of-bounds access
This patch fixes an out-of-bounds error in sd_read_cache_type(), found
by Google's AddressSanitizer tool.  When the loop ends, we know that
"offset" lies beyond the end of the data in the buffer, so no Caching
mode page was found.  In theory it may be present, but the buffer size
is limited to 512 bytes.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-11 09:44:00 -07:00
Greg Kroah-Hartman e1ea2351fb [SCSI] sd: convert class code to use dev_groups
The dev_attrs field of struct class is going away soon, dev_groups
should be used instead.  This converts the scsi disk class code to use
the correct field.

It required some functions to be moved around to place the show and
store functions next to each other, the old order seemed to make no
sense at all.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-21 10:10:48 -07:00
Ewan D. Milne 085b513f97 [SCSI] sd: fix crash when UA received on DIF enabled device
sd_prep_fn will allocate a larger CDB for the command via mempool_alloc
for devices using DIF type 2 protection.  This CDB was being freed
in sd_done, which results in a kernel crash if the command is retried
due to a UNIT ATTENTION.  This change moves the code to free the larger
CDB into sd_unprep_fn instead, which is invoked after the request is
complete.

It is no longer necessary to call scsi_print_command separately for
this case as the ->cmnd will no longer be NULL in the normal code path.

Also removed conditional test for DIF type 2 when freeing the larger
CDB because the protection_type could have been changed via sysfs while
the command was executing.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-07-23 07:41:53 -07:00
Linus Torvalds 84cbd7222b SCSI misc on 20130702
The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
 megaraid_sas, bfa, ipr) and a few bug fixes.  Also of note is that the
 Buslogic driver has been rewritten to a better coding style and 64 bit support
 added.  We also removed the libsas limitation on 16 bytes for the command size
 (currently no drivers make use of this).
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0ugCAAoJEDeqqVYsXL0MX2sH+gOkWuy5p3igz+VEim8TNaOA
 VV5EIxG1v7Q0ZiXCp/wcF6eqhgQkWvkrKSxWkaN0yzq8LEWfQeY7VmFDbGgFeVUZ
 XMlX5ay8+FLCIK9M76oxwhV7VAXYbeUUZafh+xX6StWCdKrl0eJbicOGoUk/pjsi
 ZjCBpK5BM0SW+s2gMSDQhO2eMsgMp9QrJMiCJHUF1wWPN8Yez6va1tg4b9iW39BZ
 dd3sJq+PuN6yDbYAJIjEpiGF9gDaaYxSE6bTKJuY+oy08+VsP/RRWjorTENs9Aev
 rQXZIC3nwsv26QRSX7RDSj+UE+kFV6FcPMWMU3HN2UG6ttprtOxT8tslVJf7LcA=
 =BxtF
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
  megaraid_sas, bfa, ipr) and a few bug fixes.  Also of note is that the
  Buslogic driver has been rewritten to a better coding style and 64 bit
  support added.  We also removed the libsas limitation on 16 bytes for
  the command size (currently no drivers make use of this)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (101 commits)
  [SCSI] megaraid: minor cut and paste error fixed.
  [SCSI] ufshcd-pltfrm: remove unnecessary dma_set_coherent_mask() call
  [SCSI] ufs: fix register address in UIC error interrupt handling
  [SCSI] ufshcd-pltfrm: add missing empty slot in ufs_of_match[]
  [SCSI] ufs: use devres functions for ufshcd
  [SCSI] ufs: Fix the response UPIU length setting
  [SCSI] ufs: rework link start-up process
  [SCSI] ufs: remove version check before IS reg clear
  [SCSI] ufs: amend interrupt configuration
  [SCSI] ufs: wrap the i/o access operations
  [SCSI] storvsc: Update the storage protocol to win8 level
  [SCSI] storvsc: Increase the value of scsi timeout for storvsc devices
  [SCSI] MAINTAINERS: Add myself as the maintainer for BusLogic SCSI driver
  [SCSI] BusLogic: Port driver to 64-bit.
  [SCSI] BusLogic: Fix style issues
  [SCSI] libiscsi: Added new boot entries in the session sysfs
  [SCSI] aacraid: Fix for arrays are going offline in the system. System hangs
  [SCSI] ipr: IOA Status Code(IOASC) update
  [SCSI] sd: Update WRITE SAME heuristics
  [SCSI] fnic: potential dead lock in fnic_is_abts_pending()
  ...
2013-07-04 12:30:30 -07:00
Kees Cook 02aa2a3763 drivers: avoid format string in dev_set_name
Calling dev_set_name with a single paramter causes it to be handled as a
format string.  Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Martin K. Petersen 66c28f9712 [SCSI] sd: Update WRITE SAME heuristics
SATA drives located behind a SAS controller would incorrectly receive
WRITE SAME commands. Tweak the heuristics so that:

 - If REPORT SUPPORTED OPERATION CODES is provided we will use that to
   choose between WRITE SAME(16), WRITE SAME(10) and disabled. This also
   fixes an issue with the old code which would issue WRITE SAME(10)
   despite the command not being whitelisted in REPORT SUPPORTED
   OPERATION CODES.

 - If REPORT SUPPORTED OPERATION CODES is not provided we will fall back
   to WRITE SAME(10) unless the device has an ATA Information VPD page.
   The assumption is that a SATL which is smart enough to implement
   WRITE SAME would also provide REPORT SUPPORTED OPERATION CODES.

To facilitate the new heuristics scsi_report_opcode() has been modified
to so we can distinguish between "operation not supported" and "RSOC not
supported".

Reported-by: H. Peter Anvin <hpa@zytor.com>
Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-26 17:56:18 -07:00
Ben Hutchings 2ee3e26c67 [SCSI] sd: Fix parsing of 'temporary ' cache mode prefix
Commit 39c60a0948 '[SCSI] sd: fix array cache flushing bug causing
performance problems' added temp as a pointer to "temporary " and used
sizeof(temp) - 1 as its length.  But sizeof(temp) is the size of the
pointer, not the size of the string constant.  Change temp to a static
array so that sizeof() does what was intended.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-26 08:44:33 -07:00
Hannes Reinecke 0761df9c4b [SCSI] sd: avoid deadlocks when running under multipath
When multipathed systems run into an all-paths-down scenario
all devices might be dropped, too. This causes 'del_gendisk'
to be called, which will unregister the kobj_map->probe()
function for all disk device numbers.
When the device comes back the default ->probe() function
is run which will call __request_module(), which will
deadlock.
As 'del_gendisk' typically does _not_ trigger a module unload
the default ->probe() function is pointless anyway.
This patch implements a dummy ->probe() function, which will
just return NULL if the disk is not registered.
This will avoid the deadlock. Plus it'll speed up device
scanning.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-04 11:16:23 -07:00
James Bottomley 297b8a0734 Merge branch 'postmerge' into for-linus
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10 07:54:01 -07:00
James Bottomley 832e77bc11 Merge branch 'misc' into for-linus
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10 07:53:40 -07:00
Al Viro db2a144bed block_device_operations->release() should return void
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-07 02:16:21 -04:00
Lin Ming 6df339a51e [SCSI] sd: change to auto suspend mode
Uses block layer runtime pm helper functions in
scsi_runtime_suspend/resume for devices that take advantage of it.

Remove scsi_autopm_* from sd open/release path and check_events path.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-06 12:48:31 -07:00
Lin Ming 9b21493c45 [SCSI] sd: use REQ_PM in sd's runtime suspend operation
With the introduction of REQ_PM, modify sd's runtime suspend operation
functions to use that flag so that the operations to put the device into
runtime suspended state(i.e. sync cache and stop device) will not affect
its runtime PM status.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-06 12:48:17 -07:00
James Bottomley 39c60a0948 [SCSI] sd: fix array cache flushing bug causing performance problems
Some arrays synchronize their full non volatile cache when the sd driver sends
a SYNCHRONIZE CACHE command.  Unfortunately, they can have Terrabytes of this
and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
writeback cache.  This leads to massive slowdowns on journalled filesystems.

The fix is to allow userspace to turn off the writeback cache setting as a
temporary measure (i.e. without doing the MODE SELECT to write it back to the
device), so even though the device reported it has a writeback cache, the
user, knowing that the cache is non volatile and all they care about is
filesystem correctness, can turn that bit off in the kernel and avoid the
performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.

The way you do this is add a 'temporary' prefix when performing the usual
cache setting operations, so

echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type

Reported-by: Ric Wheeler <rwheeler@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-02 16:15:54 -07:00
Aaron Lu 691e3d3175 [SCSI] sd: update sd to use the new pm callbacks
Update sd driver to use the callbacks defined in dev_pm_ops.

sd_freeze is NULL, the bus level callback has taken care of quiescing
the device so there should be nothing needs to be done here.
Consequently, sd_thaw is not needed here either.

suspend, poweroff and runtime suspend share the same routine sd_suspend,
which will sync flush and then stop the drive, this is the same as before.

resume, restore and runtime resume share the same routine sd_resume,
which will start the drive by putting it into active power state, this
is also the same as before.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-30 09:28:34 +00:00
Aaron Lu a01475637c [SCSI] sd: put to stopped power state when runtime suspend
When device is runtime suspended, put it to stopped power state to save
some power.

This will also make the behaviour consistent with what the scsi_pm.c
thinks about sd as the comment says:
sd treats runtime suspend, system suspend and system hibernate identical.
With this patch, it is now identical.
And sd_shutdown will also do nothing when it finds the device has been
runtime suspended, if we do not spin down the disk in runtime suspend
by putting it into stopped power state, the disk will be shut down
incorrectly.
And the the same problem can be solved for runtime power off after
runtime suspended case by this change.

With the current runtime scheme for disk, it will only be runtime
suspended when no process opens the disk, so this shouldn't happen a
lot, which makes it acceptable to spin down the disk when runtime
suspended. If some day a more aggressive runtime scheme is used, like
the 'request based runtime pm for disk' that Alan Stern and Lin Ming
has been working, we can introduce some policy to control this. But for
now, make it simple and correct by spinning down the disk.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-30 09:22:09 +00:00
Jason J. Herne 53ad570be6 [SCSI] sd: Use SCSI read/write(16) with > 32-bit LBA drives
Force large capacity (> 0xFFFFFFFF blocks) drives to use READ/WRITE(16) instead
of READ/WRITE(10). Some(most/all?) USB enclosures do not like READ(10) commands
when a large capacity drive is installed. This issue was reported and discussed
here: http://marc.info/?l=linux-usb&m=135247705222324

Signed-off-by: Jason J. Herne <hernejj@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-27 09:00:38 +04:00