sd_prep_fn will allocate a larger CDB for the command via mempool_alloc
for devices using DIF type 2 protection. This CDB was being freed
in sd_done, which results in a kernel crash if the command is retried
due to a UNIT ATTENTION. This change moves the code to free the larger
CDB into sd_unprep_fn instead, which is invoked after the request is
complete.
It is no longer necessary to call scsi_print_command separately for
this case as the ->cmnd will no longer be NULL in the normal code path.
Also removed conditional test for DIF type 2 when freeing the larger
CDB because the protection_type could have been changed via sysfs while
the command was executing.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
megaraid_sas, bfa, ipr) and a few bug fixes. Also of note is that the
Buslogic driver has been rewritten to a better coding style and 64 bit support
added. We also removed the libsas limitation on 16 bytes for the command size
(currently no drivers make use of this).
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJR0ugCAAoJEDeqqVYsXL0MX2sH+gOkWuy5p3igz+VEim8TNaOA
VV5EIxG1v7Q0ZiXCp/wcF6eqhgQkWvkrKSxWkaN0yzq8LEWfQeY7VmFDbGgFeVUZ
XMlX5ay8+FLCIK9M76oxwhV7VAXYbeUUZafh+xX6StWCdKrl0eJbicOGoUk/pjsi
ZjCBpK5BM0SW+s2gMSDQhO2eMsgMp9QrJMiCJHUF1wWPN8Yez6va1tg4b9iW39BZ
dd3sJq+PuN6yDbYAJIjEpiGF9gDaaYxSE6bTKJuY+oy08+VsP/RRWjorTENs9Aev
rQXZIC3nwsv26QRSX7RDSj+UE+kFV6FcPMWMU3HN2UG6ttprtOxT8tslVJf7LcA=
=BxtF
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
megaraid_sas, bfa, ipr) and a few bug fixes. Also of note is that the
Buslogic driver has been rewritten to a better coding style and 64 bit
support added. We also removed the libsas limitation on 16 bytes for
the command size (currently no drivers make use of this)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (101 commits)
[SCSI] megaraid: minor cut and paste error fixed.
[SCSI] ufshcd-pltfrm: remove unnecessary dma_set_coherent_mask() call
[SCSI] ufs: fix register address in UIC error interrupt handling
[SCSI] ufshcd-pltfrm: add missing empty slot in ufs_of_match[]
[SCSI] ufs: use devres functions for ufshcd
[SCSI] ufs: Fix the response UPIU length setting
[SCSI] ufs: rework link start-up process
[SCSI] ufs: remove version check before IS reg clear
[SCSI] ufs: amend interrupt configuration
[SCSI] ufs: wrap the i/o access operations
[SCSI] storvsc: Update the storage protocol to win8 level
[SCSI] storvsc: Increase the value of scsi timeout for storvsc devices
[SCSI] MAINTAINERS: Add myself as the maintainer for BusLogic SCSI driver
[SCSI] BusLogic: Port driver to 64-bit.
[SCSI] BusLogic: Fix style issues
[SCSI] libiscsi: Added new boot entries in the session sysfs
[SCSI] aacraid: Fix for arrays are going offline in the system. System hangs
[SCSI] ipr: IOA Status Code(IOASC) update
[SCSI] sd: Update WRITE SAME heuristics
[SCSI] fnic: potential dead lock in fnic_is_abts_pending()
...
Calling dev_set_name with a single paramter causes it to be handled as a
format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SATA drives located behind a SAS controller would incorrectly receive
WRITE SAME commands. Tweak the heuristics so that:
- If REPORT SUPPORTED OPERATION CODES is provided we will use that to
choose between WRITE SAME(16), WRITE SAME(10) and disabled. This also
fixes an issue with the old code which would issue WRITE SAME(10)
despite the command not being whitelisted in REPORT SUPPORTED
OPERATION CODES.
- If REPORT SUPPORTED OPERATION CODES is not provided we will fall back
to WRITE SAME(10) unless the device has an ATA Information VPD page.
The assumption is that a SATL which is smart enough to implement
WRITE SAME would also provide REPORT SUPPORTED OPERATION CODES.
To facilitate the new heuristics scsi_report_opcode() has been modified
to so we can distinguish between "operation not supported" and "RSOC not
supported".
Reported-by: H. Peter Anvin <hpa@zytor.com>
Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Commit 39c60a0948 '[SCSI] sd: fix array cache flushing bug causing
performance problems' added temp as a pointer to "temporary " and used
sizeof(temp) - 1 as its length. But sizeof(temp) is the size of the
pointer, not the size of the string constant. Change temp to a static
array so that sizeof() does what was intended.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When multipathed systems run into an all-paths-down scenario
all devices might be dropped, too. This causes 'del_gendisk'
to be called, which will unregister the kobj_map->probe()
function for all disk device numbers.
When the device comes back the default ->probe() function
is run which will call __request_module(), which will
deadlock.
As 'del_gendisk' typically does _not_ trigger a module unload
the default ->probe() function is pointless anyway.
This patch implements a dummy ->probe() function, which will
just return NULL if the disk is not registered.
This will avoid the deadlock. Plus it'll speed up device
scanning.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Uses block layer runtime pm helper functions in
scsi_runtime_suspend/resume for devices that take advantage of it.
Remove scsi_autopm_* from sd open/release path and check_events path.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
With the introduction of REQ_PM, modify sd's runtime suspend operation
functions to use that flag so that the operations to put the device into
runtime suspended state(i.e. sync cache and stop device) will not affect
its runtime PM status.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Some arrays synchronize their full non volatile cache when the sd driver sends
a SYNCHRONIZE CACHE command. Unfortunately, they can have Terrabytes of this
and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
writeback cache. This leads to massive slowdowns on journalled filesystems.
The fix is to allow userspace to turn off the writeback cache setting as a
temporary measure (i.e. without doing the MODE SELECT to write it back to the
device), so even though the device reported it has a writeback cache, the
user, knowing that the cache is non volatile and all they care about is
filesystem correctness, can turn that bit off in the kernel and avoid the
performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.
The way you do this is add a 'temporary' prefix when performing the usual
cache setting operations, so
echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type
Reported-by: Ric Wheeler <rwheeler@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Update sd driver to use the callbacks defined in dev_pm_ops.
sd_freeze is NULL, the bus level callback has taken care of quiescing
the device so there should be nothing needs to be done here.
Consequently, sd_thaw is not needed here either.
suspend, poweroff and runtime suspend share the same routine sd_suspend,
which will sync flush and then stop the drive, this is the same as before.
resume, restore and runtime resume share the same routine sd_resume,
which will start the drive by putting it into active power state, this
is also the same as before.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When device is runtime suspended, put it to stopped power state to save
some power.
This will also make the behaviour consistent with what the scsi_pm.c
thinks about sd as the comment says:
sd treats runtime suspend, system suspend and system hibernate identical.
With this patch, it is now identical.
And sd_shutdown will also do nothing when it finds the device has been
runtime suspended, if we do not spin down the disk in runtime suspend
by putting it into stopped power state, the disk will be shut down
incorrectly.
And the the same problem can be solved for runtime power off after
runtime suspended case by this change.
With the current runtime scheme for disk, it will only be runtime
suspended when no process opens the disk, so this shouldn't happen a
lot, which makes it acceptable to spin down the disk when runtime
suspended. If some day a more aggressive runtime scheme is used, like
the 'request based runtime pm for disk' that Alan Stern and Lin Ming
has been working, we can introduce some policy to control this. But for
now, make it simple and correct by spinning down the disk.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Force large capacity (> 0xFFFFFFFF blocks) drives to use READ/WRITE(16) instead
of READ/WRITE(10). Some(most/all?) USB enclosures do not like READ(10) commands
when a large capacity drive is installed. This issue was reported and discussed
here: http://marc.info/?l=linux-usb&m=135247705222324
Signed-off-by: Jason J. Herne <hernejj@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
scsi_register_driver will register a prep_fn() function, which
in turn migh need to use the sd_cdp_pool for DIF.
Which hasn't been initialised at this point, leading to
a crash. So reshuffle the init_sd() and exit_sd() paths
to have the driver registered last.
Signed-off-by: Joel D. Diaz <joeldiaz@us.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.
- We set the default maximum to 0xFFFF because there are several
devices out there that only support two-byte block counts even with
WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
LIMITS VPD.
- max_write_same_blocks can be overriden per-device basis in sysfs.
- The UNMAP discovery heuristics remain unchanged but the discard
limits are tweaked to match the "real" WRITE SAME commands.
- In the error handling logic we now distinguish between WRITE SAME
with and without UNMAP set.
The discovery process heuristics are:
- If the device reports a SCSI level of SPC-3 or greater we'll issue
READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
supported. If that's the case we will use it.
- If the device supports the block limits VPD and reports a MAXIMUM
WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).
- Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
0xFFFFFFFF or the block count exceeds 0xFFFF.
- no_write_same is set for ATA, FireWire and USB.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Support requests with more than one bio payload for discards. The total
number of bytes to be discarded is stored in req->__data_len and used in
sd_done() to complete the I/O.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We set the capacity to zero when we discovered a device formatted with
an unknown DIF protection type. However, the read_capacity code would
override the capacity and cause the device to be enabled regardless.
Make sd_read_protection_type() return an error if the protection type is
unknown. Also prevent duplicate printk lines when the device is being
revalidated.
Reported-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We have encountered a few devices that misbehaved when operating in T10
PI mode. Allow T10 PI protection type to be overridden from userland.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
It does not make sense to translate ref tags with unexpected values.
Instead we simply ignore them and let the upper layers catch the
problem. Ref tags that contain the expected value are still remapped.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Make use of USB quirk method to identify such HDD while reading
the cache status in sd_probe(). If cache quirk is present for
the HDD, lets assume that cache is enabled and make WCE bit
equal to 1.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Initialize atomic_t scsi_host_next_hn and ioerr_cntas per the guidelines
defined in Documentation/atomic_ops.txt
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Several bug reports have been received recently for USB mass-storage
devices that don't handle READ CAPACITY(16) commands properly. They
report bogus sizes, in some cases becoming unusable as a result.
The bugs were triggered by commit
09b6b51b0b (SCSI & usb-storage: add
flags for VPD pages and REPORT LUNS), which caused usb-storage to stop
overriding the SCSI level reported by devices. By default, the sd
driver will try READ CAPACITY(16) first for any device whose level is
above SCSI_SPC_2.
It seems likely that any device large enough to require the use of
READ CAPACITY(16) (i.e., 2 TB or more) would be able to handle READ
CAPACITY(10) commands properly. Indeed, I don't know of any devices
that don't handle READ CAPACITY(10) properly.
Therefore this patch (as1559) adds a new flag telling the sd driver
to try READ CAPACITY(10) before READ CAPACITY(16), and sets this flag
for every USB mass-storage device. If a device really is larger than
2 TB, sd will fall back to READ CAPACITY(16) just as it used to.
This fixes Bugzilla #43391.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Hans de Goede <hdegoede@redhat.com>
CC: "James E.J. Bottomley" <JBottomley@parallels.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPdrIWAAoJEDeqqVYsXL0Mny4IAMTzXGOXCykpWhdIe2R8w0Ys
eIoTJhBKoQWnLTV8cOODtwmtZcoQLeXkZmizZiAJvX6O1tOgueg+W4AFa9grxXGI
O0d1bSb2ardzU7VZrZSY60Hd4bylMwn4Xv/0dRrQMwTJO0LEeGWsJPV2+2BuXwMB
lGCNB67oUBXgMOI1jUZQRwx/mBzQ3e/gINjnpZTNKHia7YkX/yVTFISq7htgfDN7
1wRGxymbHtVap3NbtUO96BUUndAiF5vom+4WNvaQUyPrCc6aoGWjv+J9DQXY/zgv
AYjujAluK396D6YncGFAWBzYOg9WFbq54v0PRUanjcTTAu5ILs2BxqWdhmnvl14=
=IH8T
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
Pull SCSI updates from James Bottomley:
"This is primarily another round of driver updates (lpfc, bfa, fcoe,
ipr) plus a new ufshcd driver. There shouldn't be anything
controversial in here (The final deletion of scsi proc_ops which
caused some build breakage has been held over until the next merge
window to give us more time to stabilise it).
I'm afraid, with me moving continents at exactly the wrong time,
anything submitted after the merge window opened has been held over to
the next merge window."
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (63 commits)
[SCSI] ipr: Driver version 2.5.3
[SCSI] ipr: Increase alignment boundary of command blocks
[SCSI] ipr: Increase max concurrent oustanding commands
[SCSI] ipr: Remove unnecessary memory barriers
[SCSI] ipr: Remove unnecessary interrupt clearing on new adapters
[SCSI] ipr: Fix target id allocation re-use problem
[SCSI] atp870u, mpt2sas, qla4xxx use pci_dev->revision
[SCSI] fcoe: Drop the rtnl_mutex before calling fcoe_ctlr_link_up
[SCSI] bfa: Update the driver version to 3.0.23.0
[SCSI] bfa: BSG and User interface fixes.
[SCSI] bfa: Fix to avoid vport delete hang on request queue full scenario.
[SCSI] bfa: Move service parameter programming logic into firmware.
[SCSI] bfa: Revised Fabric Assigned Address(FAA) feature implementation.
[SCSI] bfa: Flash controller IOC pll init fixes.
[SCSI] bfa: Serialize the IOC hw semaphore unlock logic.
[SCSI] bfa: Modify ISR to process pending completions
[SCSI] bfa: Add fc host issue lip support
[SCSI] mpt2sas: remove extraneous sas_log_info messages
[SCSI] libfc: fcoe_transport_create fails in single-CPU environment
[SCSI] fcoe: reduce contention for fcoe_rx_list lock [v2]
...
Adapt comment and printk string after renaming sd_init_command to sd_prep_fn
Adapt comment and printk string after renaming sd_attach to sd_probe
Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP
CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC
2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44
tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK
+AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a
uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE=
=1TxB
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
SCSI updates from James Bottomley:
"The update includes the usual assortment of driver updates (lpfc,
qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
amount of infrastructure work in the SAS library and transport class
as well as an iSCSI update. There's also a new SCSI based virtio
driver."
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
[SCSI] qla4xxx: Update driver version to 5.02.00-k15
[SCSI] qla4xxx: trivial cleanup
[SCSI] qla4xxx: Fix sparse warning
[SCSI] qla4xxx: Add support for multiple session per host.
[SCSI] qla4xxx: Export CHAP index as sysfs attribute
[SCSI] scsi_transport: Export CHAP index as sysfs attribute
[SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
[SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
[SCSI] pm8001: fix endian issue with code optimization.
[SCSI] pm8001: Fix possible racing condition.
[SCSI] pm8001: Fix bogus interrupt state flag issue.
[SCSI] ipr: update PCI ID definitions for new adapters
[SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
[SCSI] isci: improvements in driver unloading routine
[SCSI] isci: improve phy event warnings
[SCSI] isci: debug, provide state-enum-to-string conversions
[SCSI] scsi_transport_sas: 'enable' phys on reset
[SCSI] libsas: don't recover end devices attached to disabled phys
[SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
[SCSI] libsas: set attached device type and target protocols for local phys
...
The sd_check_event() will be called periodly even when the device is in the
suspended status to check media event. The scsi_test_unit_ready() in the
sd_check_event() will issue scsi cmd request. Issuing scsi request when the
device is in the suspeneded status will cause problem. For example, when a usb
flash disk in the suspended status, scsi_test_unit_ready() issues a scsi
request. The request will be returned as failed because the usb device is not
active. The patch adds scsi_autopm_get_device() and scsi_autopm_put_device()
around scsi_test_unit_ready() in the sd_check_event() to resolve such problem.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We have experienced several devices which fail in a fashion we do not
currently handle gracefully in SCSI. After a failure these devices will
respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
but any command accessing the storage medium will time out.
The following patch adds an callback that can be used by upper level
drivers to inspect the results of an error handling command. This in
turn has been used to implement additional checking in the SCSI disk
driver.
If a medium access command fails twice but TEST UNIT READY succeeds both
times in the subsequent error handling we will offline the device. The
maximum number of failed commands required to take a device offline can
be tweaked in sysfs.
Also add a new error flag to scsi_debug which allows this scenario to be
easily reproduced.
[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The provisioning_mode parameter in sysfs did not get updated in the
SD_LBP_DISABLE case. Make sure the provisioning mode is always set
correctly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patch (as1507) adds a skip_vpd_pages flag to struct scsi_device
and a no_report_luns flag to struct scsi_target. The first is used to
control whether sd will look at VPD pages for information on block
provisioning, limits, and characteristics. The second prevents
scsi_report_lun_scan() from issuing a REPORT LUNS command.
The patch also modifies usb-storage to set the new flag bits for all
USB devices and targets, and to stop adjusting the scsi_level value.
Historically we have seen that USB mass-storage devices often don't
support VPD pages or REPORT LUNS properly. Until now we have avoided
these things by setting the scsi_level to SCSI_2 for all USB devices.
But this has the side effect of storing the LUN bits into the second
byte of each CDB, and now we have a report of a device which doesn't
like that. The best solution is to stop abusing scsi_level and
instead have separate flags for VPD pages and REPORT LUNS.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Perry Wagle <wagle@mac.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device. This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.
This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice. Still, I'm treating it specially to avoid spamming the logs.
In principle, this restriction should include programs running with
CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities. However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls. Their actions will still be logged.
This patch does not affect the non-libata IDE driver. That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a wrapper around scsi_cmd_ioctl that takes a block device.
The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sd_shutdown is called during reboot/poweroff.
It may fail if parent device, for example, ata port, was runtime suspended.
Fix it by checking runtime PM status of sd.
Exit immediately if sd was runtime suspended already.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
There is no reason to limit the SCSI disk namespace to sdXXX.
Add new error messages to sd_probe() in the unlikely event that either
ida_get_new() or sd_format_disk_name() fail.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
sd_ioctl() still use printk() for log output.
It should use sd_printk() instead of printk(), as well as other sd_*.
All SCSI messages should output via s*_printk() instead of printk().
Signed-off-by: Nao Nishijima <nao.nishijima.xt@hitachi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Some kernel transport drivers unconditionally disable
retrieval of the Caching mode page. One such for example is
the BBB/CBI transport over USB. Such a restraint is too
harsh as some devices do support the Caching mode
page. Unconditionally enabling the retrieval of this mode
page over those transports at their transport code level may
result in some devices failing and becoming unusable.
This patch implements a method of retrieving the Caching
mode page without unconditionally enabling it in the
transports which unconditionally disable it. The idea is to
ask for all supported pages, page code 0x3F, and then search
for the Caching mode page in the mode parameter data
returned. The sd driver already asks for all the mode pages
supported by the attached device by setting the page code to
0x3F in order to find out if the media is write protected by
reading the WP bit in the Device Specific Parameter
field. It then attempts to retrieve only the Caching mode
page by setting the page code to 8 and actually attempting
to retrieve it if and only if the transport allows it.
The method implemented here is that if the transport doesn't
allow retrieval of the Caching mode page and the device is
not RBC, then we ask for all pages supported by setting the
page code to 0x3F (similarly to how the WP bit is retrieved
above), and then we search for the Caching mode page in the
mode parameter data returned.
With this patch, devices over SATA, report this (no change):
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Smart devices report their Caching mode page. This is a
change where we'd previously see the kernel making
assumption about the device's cache being write-through:
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] 610472646 4096-byte logical blocks: (2.50 TB/2.27 TiB)
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Mode Sense: 47 00 10 08
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
And "dumb" devices over BBB, are correctly shown not to
support reporting the Caching mode page:
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] 15663104 512-byte logical blocks: (8.01 GB/7.46 GiB)
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Write Protect is off
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] No Caching mode page present
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Assuming drive cache: write through
Version 2 adds this:
Some devices don't support page code 0x3F, and others require a
fixed transfer length of 192 bytes. This single commit includes a
patch by Alan Stern which fixes this.
Reported-and-tested-by: Richard Senior <richard@r-senior.demon.co.uk>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
The block layer discard alignment is reported in bytes, not in units of
the logical block size.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
This reverts commit 24d720b726.
Previously we thought there was little possibility that devices would
crash with this, but some have been found.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Ensure that we kill discard requests after logical block provisioning
has been disabled in sysfs.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SBC3r26 contains many changes to the Logical Block Provisioning
interfaces (formerly known as Thin Provisioning ditto). This patch
implements support for both the old and new schemes using the same
heuristic as before (whether the LBP VPD page is present).
The new code also allows the provisioning mode (i.e. choice of command)
to be overridden on a per-device basis via sysfs. Two additional modes
are supported in this version:
- WRITE SAME(10) with the UNMAP bit set
- WRITE SAME(10) without the UNMAP bit set. This allows us to support
devices that predate the TP/LBP enhancements in SBC3 and which work
by way zero-detection
Switching between modes has been consolidated in a helper function that
also updates the block layer topology according to the limitations of
the chosen command.
I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
if WRITE SAME(16) fails, etc. but found several devices that got
cranky. So for now we'll disable discard if one of the commands
fail. The user still has the option of selecting a different mode in
sysfs.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
SDEV_MEDIA_CHANGE event was first added by commit a341cd0f (SCSI: add
asynchronous event notification API) for SATA AN support and then
extended to cover generic media change events by commit 285e9670
([SCSI] sr,sd: send media state change modification events).
This event was mapped to block device in userland with all properties
stripped to simulate CHANGE event on the block device, which, in turn,
was used to trigger further userspace action on media change.
The recent addition of disk event framework kept this event for
backward compatibility but it turns out to be unnecessary and causes
erratic and inefficient behavior. The new disk event generates proper
events on the block devices and the compat events are mapped to block
device with all properties stripped, so the block device ends up
generating multiple duplicate events for single actual event.
This patch removes the compat event generation from both sr and sd as
suggested by Kay Sievers. Both existing and newer versions of udev
and the associated tools will behave better with the removal of these
events as they from the beginning were expecting events on the block
devices.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Replace sd_media_change() with sd_check_events().
* Move media removed logic into set_media_not_present() and
media_not_present() and set sdev->changed iff an existing media is
removed or the device indicates UNIT_ATTENTION.
* Make sd_check_events() sets sdev->changed if previously missing
media becomes present.
* Event is reported only if sdev->changed is set.
This makes media presence event reported if scsi_disk->media_present
actually changed or the device indicated UNIT_ATTENTION. For backward
compatibility, SDEV_EVT_MEDIA_CHANGE is generated each time
sd_check_events() detects media change event.
[jejb: fix boot failure]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...
Our current handling of medium error assumes that data is returned up
to the bad sector. This assumption holds good for all disk devices,
all DIF arrays and most ordinary arrays. However, an LSI array engine
was recently discovered which reports a medium error without returning
any data. This means that when we report good data up to the medium
error, we've reported junk originally in the buffer as good. Worse,
if the read consists of requested data plus a readahead, and the error
occurs in readahead, we'll just strip off the readahead and report
junk up to userspace as good data with no error.
The fix for this is to have the error position computation take into
account the amount of data returned by the driver using the scsi
residual data. Unfortunately, not every driver fills in this data,
but for those who don't, it's set to zero, which means we'll think a
full set of data was transferred and the behaviour will be identical
to the prior behaviour of the code (believe the buffer up to the error
sector). All modern drivers seem to set the residual, so that should
fix up the LSI failure/corruption case.
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This reverts commit c8d2e93735.
We run into merging problems with the SCSI tree, revert this one
so it can be handled by a postmerge tree there.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Some kernel transport drivers unconditionally disable
retrieval of the Caching mode page. One such for example is
the BBB/CBI transport over USB. Such a restraint is too
harsh as some devices do support the Caching mode
page. Unconditionally enabling the retrieval of this mode
page over those transports at their transport code level may
result in some devices failing and becoming unusable.
This patch implements a method of retrieving the Caching
mode page without unconditionally enabling it in the
transports which unconditionally disable it. The idea is to
ask for all supported pages, page code 0x3F, and then search
for the Caching mode page in the mode parameter data
returned. The sd driver already asks for all the mode pages
supported by the attached device by setting the page code to
0x3F in order to find out if the media is write protected by
reading the WP bit in the Device Specific Parameter
field. It then attempts to retrieve only the Caching mode
page by setting the page code to 8 and actually attempting
to retrieve it if and only if the transport allows it.
The method implemented here is that if the transport doesn't
allow retrieval of the Caching mode page and the device is
not RBC, then we ask for all pages supported by setting the
page code to 0x3F (similarly to how the WP bit is retrieved
above), and then we search for the Caching mode page in the
mode parameter data returned.
With this patch, devices over SATA, report this (no change):
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Smart devices report their Caching mode page. This is a
change where we'd previously see the kernel making
assumption about the device's cache being write-through:
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] 610472646 4096-byte logical blocks: (2.50 TB/2.27 TiB)
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Mode Sense: 47 00 10 08
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
And "dumb" devices over BBB, are correctly shown not to
support reporting the Caching mode page:
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] 15663104 512-byte logical blocks: (8.01 GB/7.46 GiB)
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Write Protect is off
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] No Caching mode page present
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Assuming drive cache: write through
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch (as1415) improves the formerly incomprehensible logic in
sd_media_changed() (the current code refers to "changed" as a state,
whereas in fact it is a relation between two states). It also adds a
big comment so that everyone can understand what is really going on.
The patch also improves efficiency by not reporting a media change
when no medium was ever present. If no medium was present the last
time we checked and there's still no medium, it's not necessary to
tell the caller that a change occurred. Doing so merely causes the
caller to attempt to revalidate a non-existent disk, which is a waste
of time.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>