Commit Graph

14736 Commits

Author SHA1 Message Date
Matthew R. Ochs 3065267a80 scsi: cxlflash: Add hardware queues attribute
As staging for supporting multiple hardware queues, add an attribute to show
and set the current number of hardware queues for the host. Support specifying
a hard limit or a CPU affinitized value. This will allow the number of
hardware queues to be tuned by a system administrator.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:42 -04:00
Uma Krishnan bfc0bab172 scsi: cxlflash: Support multiple hardware queues
Introduce multiple hardware queues to improve legacy I/O path performance.
Each hardware queue is comprised of a master context and associated I/O
resources. The hardware queues are initially implemented as a static array
embedded in the AFU. This will be transitioned to a dynamic allocation in a
later series to improve the memory footprint of the driver.

Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:42 -04:00
Matthew R. Ochs e2ef33fa59 scsi: cxlflash: Improve asynchronous interrupt processing
The method used to decode asynchronous interrupts involves unnecessary loops
to match up bits that are set with corresponding entries in the asynchronous
interrupt information table. This algorithm is wasteful and does not scale
well as new status bits are supported.

As an improvement, use the for_each_set_bit() service to iterate over the
asynchronous status bits and refactor the information table such that it can
be indexed by bit position.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs fcc87e74a9 scsi: cxlflash: Fix warnings/errors
As a general cleanup, address all reasonable checkpatch warnings and
errors. These include enforcement of comment styles and including named
identifiers in function prototypes.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs cd41e18daf scsi: cxlflash: Fix power-of-two validations
Validation statements to enforce assumptions about specific defines are not
being evaluated by the compiler due to the fact that they reside in a routine
that is not used. To activate them, call the routine as part of module
initialization. As an additional, related cleanup, remove the now-defunct
CXLFLASH_NUM_CMDS.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 50b787f723 scsi: cxlflash: Remove unnecessary DMA mapping
Devices supported by the cxlflash driver are fully coherent and do not require
a bus address mapping. Avoid unnecessary path length by using the virtual
address and length already present in the scatter-gather entry.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 323e33428e scsi: cxlflash: Fence EEH during probe
An EEH during probe can lead to a crash as the recovery thread races with the
probe thread. To avoid this issue, introduce new states to fence out EEH
recovery until probe has completed. Also ensure the reset wait queue is
flushed during device removal to avoid orphaned threads.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 1cd7fabc82 scsi: cxlflash: Support up to 4 ports
Update the driver to allow for future cards with 4 ports.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 5651807232 scsi: cxlflash: SISlite updates to support 4 ports
Update the SISlite header to support 4 ports as outlined in the SISlite
specification. Address fallout from structure renames and refreshed
organization throughout the driver. Determine the number of ports supported by
a card from the global port selection mask register reset value.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 0aa14887c6 scsi: cxlflash: Hide FC internals behind common access routine
As staging to support FC-related updates to the SISlite specification,
introduce helper routines to obtain references to FC resources that exist
within the global map. This will allow changes to the underlying global map
structure without impacting existing code paths.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 8fa4f1770d scsi: cxlflash: Remove port configuration assumptions
At present, the cxlflash driver only supports hardware with two FC ports. The
code was initially designed with this assumption and is dependent on having
two FC ports - adding more ports will break logic within the driver.

To mitigate this issue, remove the existing port assumptions and transition
the code to support more than two ports. As a side effect, clarify the
interpretation of the DK_CXLFLASH_ALL_PORTS_ACTIVE flag.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 78ae028e82 scsi: cxlflash: Support dynamic number of FC ports
Transition from a static number of FC ports to a value that is derived during
probe. For now, a static value is used but this will later be based on the
type of card being configured.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 3b225cd32a scsi: cxlflash: Update sysfs helper routines to pass config structure
As staging for future function, pass the config pointer instead of the AFU
pointer for port-related sysfs helper routines.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs cba06e6de4 scsi: cxlflash: Implement IRQ polling for RRQ processing
Currently, RRQ processing takes place on hardware interrupt context. This can
be a heavy burden in some environments due to the overhead encountered while
completing RRQ entries. In an effort to improve system performance, use the
IRQ polling API to schedule this processing on softirq context.

This function will be disabled by default until starting values can be
established for the hardware supported by this driver.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs f918b4a8e6 scsi: cxlflash: Serialize RRQ access and support offlevel processing
As further staging to support processing the HRRQ by other means, access to
the HRRQ needs to be serialized by a disabled lock. This will allow safe
access in other non-hardware interrupt contexts. In an effort to minimize the
period where interrupts are disabled, support is added to queue up commands
harvested from the RRQ such that they can be processed with hardware
interrupts enabled. While this doesn't offer any improvement with processing
on a hardware interrupt it will help when IRQ polling is supported and the
command completions can execute on softirq context.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Matthew R. Ochs 76a6ebbeef scsi: cxlflash: Separate RRQ processing from the RRQ interrupt handler
In order to support processing the HRRQ by other means (e.g. polling), the
processing portion of the current RRQ interrupt handler needs to be broken out
into a separate routine. This will allow RRQ processing from places other than
the RRQ hardware interrupt handler.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:55:41 -04:00
Colin Ian King 0d4bd8fea4 scsi: snic: fix spelling mistake: "Cann't" -> "Cannot"
Trivial fix to spelling mistake in SNIC_ERR error message text, one
cannot have "Cann't".

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:24:02 -04:00
Johannes Thumshirn e7661a8e5c scsi: return correct blkprep status code in case scsi_init_io() fails.
When instrumenting the SCSI layer to run into the
!blk_rq_nr_phys_segments(rq) case the following warning emitted from the
block layer:

blk_peek_request: bad return=-22

This happens because since commit fd3fc0b4d7 ("scsi: don't BUG_ON()
empty DMA transfers") we return the wrong error value from
scsi_prep_fn() back to the block layer.

[mkp: silenced checkpatch]

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: fd3fc0b4d7 scsi: don't BUG_ON() empty DMA transfers
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-13 22:10:06 -04:00
Johannes Berg 2d4bc93368 netlink: extended ACK reporting
Add the base infrastructure and UAPI for netlink extended ACK
reporting. All "manual" calls to netlink_ack() pass NULL for now and
thus don't get extended ACK reporting.

Big thanks goes to Pablo Neira Ayuso for not only bringing up the
whole topic at netconf (again) but also coming up with the nlattr
passing trick and various other ideas.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13 13:58:20 -04:00
James Bottomley 0e1bfea999 Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes 2017-04-12 07:29:17 -07:00
Xiang Chen e281f42f41 scsi: hisi_sas: controller reset for multi-bits ECC and AXI fatal errors
For 1 bit ECC errors, those errors can be recovered by hw. But for
multi-bits ECC and AXI errors, there are something wrong with whole
module or system, so try reset the controller to recover those errors
instead of calling panic().

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 22:01:04 -04:00
John Garry d3c4dd4e3d scsi: hisi_sas: fix NULL deference when TMF timeouts
If a TMF timeouts (maybe due to unlikely scenario of an expander being
unplugged when TMF for remote device is active), when we eventually try
to free the slot, we crash as we dereference the slot's task, which has
already been released.

As a fix, add checks in the slot release code for a NULL task.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 22:01:04 -04:00
John Garry 0844a3ff00 scsi: hisi_sas: add v2 hw internal abort timeout workaround
This patch is a workaround for a SoC bug where an internal abort command
may timeout. In v2 hw, the channel should become idle in order to finish
abort process. If the target side has been sending HOLD, host side
channel failed to complete the frame to send, and can not enter the idle
state. Then internal abort command will timeout.

As this issue is only in v2 hw, we deal with it in the hw layer.  Our
workaround solution is: If abort is not finished within a certain period
of time, we will check HOLD status. If HOLD has been sending, we will
send break command.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 22:01:04 -04:00
Xiaofei Tan 819cbf1895 scsi: hisi_sas: workaround SoC about abort timeout bug
This patch adds a workaround solution for a SoC bug which may cause SoC
logic fatal error when disabling a PHY.  Then we find internal abort IO
timeout may occur, and the controller IO breakpoint may be corrupted.

We work around this SoC bug by optimizing the flow of disabling a PHY.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 22:01:04 -04:00
Xiaofei Tan 32ccba52f1 scsi: hisi_sas: workaround a SoC SATA IO processing bug
This patch provides a workaround a SoC bug where SATA IPTTs for
different devices may conflict.

The workaround solution requests the following:
1. SATA device id must be even and not equal to SAS IPTT.
2. SATA device can not share the same IPTT with other SAS or
   SATA device.

Besides we shall consider IPTT value 0 is reserved for another SoC bug
(STP device open link at firstly after SAS controller reset).

To sum up, the solution is: Each SATA device uses independent and
continuous 32 even IPTT from 64 to 4094, then v2 hw can only support 63
SATA devices.  All SAS device(SSP/SMP devices) share odd IPTT value from
1 to 4095.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 22:01:04 -04:00
Xiaofei Tan c7b9d369e4 scsi: hisi_sas: workaround STP link SoC bug
After resetting the controller, the process of scanning SATA disks
attached to an expander may fail occasionally. The issue is that the
controller can't close the STP link created by target if the max link
time is 0.

To workaround this issue, we reject STP link after resetting the
controller, and change the corresponding PHY to accept STP link only
after receiving data.

We do this check in cq interrupt handler. In order not to reduce
efficiency, we use an variable to control whether we should check and
change PHY to accept STP link.

The function phys_reject_stp_links_v2_hw() should be called after
resetting the controller.

The solution of another SoC bug "SATA IO timeout", that also uses the
same register to control STP link, is not effective before the PHY
accepts STP link.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 22:01:03 -04:00
Mauricio Faria de Oliveira 785a470496 scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION
On a dual controller setup with multipath enabled, some MEDIUM ERRORs
caused both paths to be failed, thus I/O got queued/blocked since the
'queue_if_no_path' feature is enabled by default on IPR controllers.

This example disabled 'queue_if_no_path' so the I/O failure is seen at
the sg_dd program.  Notice that after the sg_dd test-case, both paths
are in 'failed' state, and both path/priority groups are in 'enabled'
state (not 'active') -- which would block I/O with 'queue_if_no_path'.

    # sg_dd if=/dev/dm-2 bs=4096 count=1 dio=1 verbose=4 blk_sgio=0
    <...>
    read(unix): count=4096, res=-1
    sg_dd: reading, skip=0 : Input/output error
    <...>

    # dmesg
    [...] sd 2:2:16:0: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [...] sd 2:2:16:0: [sds] Sense Key : Medium Error [current]
    [...] sd 2:2:16:0: [sds] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] sd 2:2:16:0: [sds] CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
    [...] blk_update_request: I/O error, dev sds, sector 0
    [...] device-mapper: multipath: Failing path 65:32.
    <...>
    [...] device-mapper: multipath: Failing path 65:224.

    # multipath -l
    1IBM_IPR-0_59C2AE0000001F80 dm-2 IBM     ,IPR-0   59C2AE00
    size=5.2T features='0' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | `- 2:2:16:0 sds  65:32  failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 1:2:7:0  sdae 65:224 failed undef running

This is not the desired behavior. The dm-multipath explicitly checks
for the MEDIUM ERROR case (and a few others) so not to fail the path
(e.g., I/O to other sectors could potentially happen without problems).
See dm-mpath.c :: do_end_io_bio() -> noretry_error() !->! fail_path().

The problem trace is:

1) ipr_scsi_done()  // SENSE KEY/CHECK CONDITION detected, go to..
2) ipr_erp_start()  // ipr_is_gscsi() and masked_ioasc OK, go to..
3) ipr_gen_sense()  // masked_ioasc is IPR_IOASC_MED_DO_NOT_REALLOC,
                    // so set DID_PASSTHROUGH.

4) scsi_decide_disposition()  // check for DID_PASSTHROUGH and return
                              // early on, faking a DID_OK.. *instead*
                              // of reaching scsi_check_sense().

                              // Had it reached the latter, that would
                              // set host_byte to DID_MEDIUM_ERROR.

5) scsi_finish_command()
6) scsi_io_completion()
7) __scsi_error_from_host_byte()  // That would be converted to -ENODATA
<...>
8) dm_softirq_done()
9) multipath_end_io()
10) do_end_io()
11) noretry_error()  // And that is checked in dm-mpath :: noretry_error()
                     // which would cause fail_path() not to be called.

With this patch applied, the I/O is failed but the paths are not.  This
multipath device continues accepting more I/O requests without blocking.
(and notice the different host byte/driver byte handling per SCSI layer).

    # dmesg
    [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
    [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 40 00
    [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
    [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] blk_update_request: critical medium error, dev sdaf, sector 0
    [...] blk_update_request: critical medium error, dev dm-6, sector 0
    [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
    [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 10 00
    [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
    [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
    [...] blk_update_request: critical medium error, dev sdaf, sector 0
    [...] blk_update_request: critical medium error, dev dm-6, sector 0
    [...] Buffer I/O error on dev dm-6, logical block 0, async page read

    # multipath -l 1IBM_IPR-0_59C2AE0000001F80
    1IBM_IPR-0_59C2AE0000001F80 dm-6 IBM     ,IPR-0   59C2AE00
    size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | `- 2:2:7:0  sdaf 65:240 active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      `- 1:2:7:0  sdh  8:112  active undef running

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 21:40:19 -04:00
Johannes Thumshirn fcabb09e59 scsi: libfc: directly call ELS request handlers
Directly call ELS request handler functions in fc_lport_recv_els_req
instead of saving the pointer to the handler's receive function and then
later dereferencing this pointer.

This makes the code a bit more obvious.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:58:26 -04:00
Hannes Reinecke 97d27b0dd0 scsi: sg: close race condition in sg_remove_sfp_usercontext()
sg_remove_sfp_usercontext() is clearing any sg requests, but needs to
take 'rq_list_lock' when modifying the list.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:55:20 -04:00
Hannes Reinecke 109bade9c6 scsi: sg: use standard lists for sg_requests
'Sg_request' is using a private list implementation; convert it to
standard lists.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:55:20 -04:00
Johannes Thumshirn 28676d869b scsi: sg: check for valid direction before starting the request
Check for a valid direction before starting the request, otherwise we
risk running into an assertion in the scsi midlayer checking for valid
requests.

[mkp: fixed typo]

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: http://www.spinics.net/lists/linux-scsi/msg104400.html
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:55:20 -04:00
Hannes Reinecke 1bc0eb0446 scsi: sg: protect accesses to 'reserved' page array
The 'reserved' page array is used as a short-cut for mapping data,
saving us to allocate pages per request. However, the 'reserved' array
is only capable of holding one request, so this patch introduces a mutex
for protect 'sg_fd' against concurrent accesses.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:55:20 -04:00
Hannes Reinecke 136e57bf43 scsi: sg: remove 'save_scat_len'
Unused.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:55:20 -04:00
Hannes Reinecke 745dfa0d8e scsi: sg: disable SET_FORCE_LOW_DMA
The ioctl SET_FORCE_LOW_DMA has never worked since the initial git
check-in, and the respective setting is nowadays handled correctly. So
disable it entirely.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:55:20 -04:00
Guilherme G. Piccoli 911e572e98 scsi: aacraid: fix PCI error recovery path
During a PCI error recovery, if aac_check_health() is not aware that a
PCI error happened and we have an offline PCI channel, it might trigger
some errors (like NULL pointer dereference) and inhibit the error
recovery process to complete.

This patch makes the health check procedure aware of PCI channel issues,
and in case of error recovery process, the function
aac_adapter_check_health() returns -1 and let the recovery process to
complete successfully. This patch was tested on upstream kernel
v4.11-rc5 in PowerPC ppc64le architecture with adapter 9005:028d
(VID:DID) - the error recovery procedure was able to recover fine.

Fixes: 5c63f7f710 ("aacraid: Added EEH support")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:45:59 -04:00
Colin Ian King 1fdcd2d1da scsi: qla2xxx: remove some redundant pointer assignments
There are several local or function parameter pointers that are being
assigned NULL after a kfree where and these have no effect and hence can
be removed.

Fixes various cppcheck warnings:

"Assignment of function parameter has no effect outside the
function. Did you forget dereferencing it"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-11 20:42:43 -04:00
NeilBrown 717a94b5fc sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
It is not safe for one thread to modify the ->flags
of another thread as there is no locking that can protect
the update.

So tsk_restore_flags(), which takes a task pointer and modifies
the flags, is an invitation to do the wrong thing.

All current users pass "current" as the task, so no developers have
accepted that invitation.  It would be best to ensure it remains
that way.

So rename tsk_restore_flags() to current_restore_flags() and don't
pass in a task_struct pointer.  Always operate on current->flags.

Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-11 09:06:32 +02:00
Linus Torvalds 78d91a75b4 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Here's a pull request for 4.11-rc, fixing a set of issues mostly
  centered around the new scheduling framework. These have been brewing
  for a while, but split up into what we absolutely need in 4.11, and
  what we can defer until 4.12. These are well tested, on both single
  queue and multiqueue setups, and with and without shared tags. They
  fix several hangs that have happened in testing.

  This is obviously larger than I would have preferred at this point in
  time, but I don't think we can shave much off this and still get the
  desired results.

  In detail, this pull request contains:

   - a set of five fixes for NVMe, mostly from Christoph and one from
     Roland.

   - a series from Bart, fixing issues with dm-mq and SCSI shared tags
     and scheduling. Note that one of those patches commit messages may
     read like an optimization, but it is in fact an important fix for
     queue restarts in particular.

   - a series from Omar, most importantly fixing a hang with multiple
     hardware queues when we fail to get a driver tag. Another important
     fix in there is for resizing hardware queues, which nbd does when
     handling multiple sockets for one connection.

   - fixing an imbalance in putting the ctx for hctx request allocations
     from Minchan"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: Restart a single queue if tag sets are shared
  dm rq: Avoid that request processing stalls sporadically
  scsi: Avoid that SCSI queues get stuck
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  blk-mq: remap queues when adding/removing hardware queues
  blk-mq-sched: fix crash in switch error path
  blk-mq-sched: set up scheduler tags when bringing up new queues
  blk-mq-sched: refactor scheduler initialization
  blk-mq: use the right hctx when getting a driver tag fails
  nvmet: fix byte swap in nvmet_parse_io_cmd
  nvmet: fix byte swap in nvmet_execute_write_zeroes
  nvmet: add missing byte swap in nvmet_get_smart_log
  nvme: add missing byte swap in nvme_setup_discard
  nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
  block: do not put mq context in blk_mq_alloc_request_hctx
2017-04-08 11:56:58 -07:00
Martin K. Petersen bcd069bb25 scsi: sd: Remove LBPRZ dependency for discards
Separating discards and zeroout operations allows us to remove the LBPRZ
block zeroing constraints from discards and honor the device preferences
for UNMAP commands.

If supported by the device, we'll also choose UNMAP over one of the
WRITE SAME variants for discards.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Martin K. Petersen e6bd931284 scsi: sd: Separate zeroout and discard command choices
Now that zeroout and discards are distinct operations we need to
separate the policy of choosing the appropriate command. Create a
zeroing_mode which can be one of:

write:			Zeroout assist not present, use regular WRITE
writesame:		Allow WRITE SAME(10/16) with a zeroed payload
writesame_16_unmap:	Allow WRITE SAME(16) with UNMAP
writesame_10_unmap:	Allow WRITE SAME(10) with UNMAP

The last two are conditional on the device being thin provisioned with
LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively.

Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And
if none of the _unmap variants are supported, regular WRITE SAME will be
used if the device supports it.

The zeroout_mode is exported in sysfs and the detected mode for a given
device can be overridden using the string constants above.

With this change in place we can now issue WRITE SAME(16) with UNMAP set
for block zeroing applications that require hard guarantees and
logical_block_size granularity. And at the same time use the UNMAP
command with the device's preferred granulary and alignment for discard
operations.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 48920ff2a5 block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig e4b878371b sd: implement unmapping Write Zeroes
Try to use a write same with unmap bit variant if the device supports it
and the caller allows for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 02d261034f sd: implement REQ_OP_WRITE_ZEROES
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 81d926e8b5 sd: split sd_setup_discard_cmnd
Split sd_setup_discard_cmnd into one function per provisioning type.  While
this creates some very slight duplication of boilerplate code it keeps the
code modular for additions of new provisioning types, and for reusing the
write same functions for the upcoming scsi implementation of the Write Zeroes
operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Martin K. Petersen 7c856152cb scsi: sd: Fix capacity calculation with 32-bit sector_t
We previously made sure that the reported disk capacity was less than
0xffffffff blocks when the kernel was not compiled with large sector_t
support (CONFIG_LBDAF). However, this check assumed that the capacity
was reported in units of 512 bytes.

Add a sanity check function to ensure that we only enable disks if the
entire reported capacity can be expressed in terms of sector_t.

Cc: <stable@vger.kernel.org>
Reported-by: Steve Magnani <steve.magnani@digidescorp.com>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:07:16 -04:00
Sawan Chandak bf6061b17a scsi: qla2xxx: Add fix to read correct register value for ISP82xx.
Add fix to read correct register value for ISP82xx, during check for
register disconnect.ISP82xx has different base register.

Fixes: a465537ad1 ("qla2xxx: Disable the adapter and skip error recovery in case of register disconnect")
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:07:15 -04:00
Chad Dupuis 8eaf7dfcfc scsi: qedf: Fix crash due to unsolicited FIP VLAN response.
We need to initialize qedf->fipvlan_compl in __qedf_probe so that if we
receive an unsolicited FIP VLAN response, the system doesn't crash due
to trying to complete an uninitialized completion.

Also add a check to see if there are any waiters on the completion so we
don't inadvertantly kick start the discovery process due to the
unsolicited frame.

Fixed the crash:

<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffff8105ed71>] __wake_up_common+0x31/0x90
<4>PGD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 7
<4>Modules linked in: autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic fcoe 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat uinput ipmi_devintf microcode power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support dcdbas sg joydev sb_edac edac_core lpc_ich mfd_core shpchp tg3 ptp pps_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif qedi(U) iscsi_boot_sysfs libiscsi scsi_transport_iscsi uio qedf(U) libfcoe libfc scsi_transport_fc scsi_tgt qede(U) qed(U) ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 1485, comm: qedf_11_ll2 Not tainted 2.6.32-642.el6.x86_64 #1 Dell Inc. PowerEdge R730/0599V5
<4>RIP: 0010:[<ffffffff8105ed71>]  [<ffffffff8105ed71>] __wake_up_common+0x31/0x90
<4>RSP: 0018:ffff881068a83d50  EFLAGS: 00010086
<4>RAX: ffffffffffffffe8 RBX: ffff88106bf42de0 RCX: 0000000000000000
<4>RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88106bf42de0
<4>RBP: ffff881068a83d90 R08: 0000000000000000 R09: 00000000fffffffe
<4>R10: 0000000000000000 R11: 000000000000000b R12: 0000000000000286
<4>R13: ffff88106bf42de8 R14: 0000000000000000 R15: 0000000000000000
<4>FS:  0000000000000000(0000) GS:ffff88089c460000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000001a8d000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process qedf_11_ll2 (pid: 1485, threadinfo ffff881068a80000, task ffff881068a70040)
<4>Stack:
<4> ffff88106ef00090 0000000300000001 ffff881068a83d90 ffff88106bf42de0
<4><d> 0000000000000286 ffff88106bf42dd8 ffff88106bf40a50 0000000000000002
<4><d> ffff881068a83dc0 ffffffff810634c7 ffff881000000003 000000000000000b
<4>Call Trace:
<4> [<ffffffff810634c7>] complete+0x47/0x60
<4> [<ffffffffa01d37e7>] qedf_fip_recv+0x1c7/0x450 [qedf]
<4> [<ffffffffa01cb3cb>] qedf_ll2_recv_thread+0x33b/0x510 [qedf]
<4> [<ffffffffa01cb090>] ? qedf_ll2_recv_thread+0x0/0x510 [qedf]
<4> [<ffffffff810a662e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff810a6590>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>Code: 41 56 41 55 41 54 53 48 83 ec 18 0f 1f 44 00 00 89 75 cc 89 55 c8 4c 8d 6f 08 48 8b 57 08 41 89 cf 4d 89 c6 48 8d 42 e8 49 39 d5 <48> 8b 58 18 74 3f 48 83 eb 18 eb 0a 0f 1f 00 48 89 d8 48 8d 5a
<1>RIP  [<ffffffff8105ed71>] __wake_up_common+0x31/0x90
<4> RSP <ffff881068a83d50>
<4>CR2: 0000000000000000

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:07:15 -04:00
Martin K. Petersen a00a786251 scsi: sr: Sanity check returned mode data
Kefeng Wang discovered that old versions of the QEMU CD driver would
return mangled mode data causing us to walk off the end of the buffer in
an attempt to parse it. Sanity check the returned mode sense data.

Cc: <stable@vger.kernel.org>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:07:14 -04:00
Fam Zheng 6780414519 scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we
end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size
may get error.

[mkp: tweaked to avoid setting rw_max twice and added typecast]

Cc: <stable@vger.kernel.org> # v4.4+
Fixes: ca369d51b3 ("block/sd: Fix device-imposed transfer length limits")
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-07 17:05:15 -04:00
Jens Axboe 65f619d253 Merge branch 'for-linus' into for-4.12/block
We've added a considerable amount of fixes for stalls and issues
with the blk-mq scheduling in the 4.11 series since forking
off the for-4.12/block branch. We need to do improvements on
top of that for 4.12, so pull in the previous fixes to make
our lives easier going forward.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:45:20 -06:00
Bart Van Assche 36e3cf2739 scsi: Avoid that SCSI queues get stuck
If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.

commit 52d7f1b5c2 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.

Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.

Fixes: commit 52d7f1b5c2 ("blk-mq: Avoid that requeueing starts stopped queues")
Fixes: commit 7e79dadce2 ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:27:08 -06:00
Nicholas Mc Guire bf5ea6fba7 scsi: qla4xxx: drop redundant init_completion
The redundant init_completion() here seems to be a cut&past error as
struct scsi_qla_host only has 4 completion elements to initialize, thus
the duplicate init_completion(disable_acb_comp) is simply removed.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:14:32 -04:00
Hannes Reinecke a06586325f scsi: make asynchronous aborts mandatory
There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:33 -04:00
Hannes Reinecke 2171b6d08b scsi: make scsi_eh_scmd_add() always succeed
scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:33 -04:00
Hannes Reinecke 8e8c9d01c5 scsi: make eh_eflags persistent
If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:33 -04:00
Christoph Hellwig 909657615d scsi: libsas: allow async aborts
We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke 1bcb93047e scsi: always send command aborts
When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke e8f8d50e07 scsi: sd: Return SUCCESS in sd_eh_action() after device offline
If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Hannes Reinecke 7a38dc0bfb scsi: scsi_error: count medium access timeout only once per EH run
The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands")
Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 13:07:32 -04:00
Christoph Hellwig 104d9c7f94 scsi: csiostor: switch to pci_alloc_irq_vectors
And get automatic MSI-X affinity for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:57:03 -04:00
Mauricio Faria de Oliveira 75106523f3 scsi: ses: don't get power status of SES device slot on probe
The commit 08024885a2 ("ses: Add power_status to SES device slot")
introduced the 'power_status' attribute to enclosure components and
the associated callbacks.

There are 2 callbacks available to get the power status of a device:
1) ses_get_power_status() for 'struct enclosure_component_callbacks'
2) get_component_power_status() for the sysfs device attribute
(these are available for kernel-space and user-space, respectively.)

However, despite both methods being available to get power status
on demand, that commit also introduced a call to get power status
in ses_enclosure_data_process().

This dramatically increased the total probe time for SCSI devices
on larger configurations, because ses_enclosure_data_process() is
called several times during the SCSI devices probe and loops over
the component devices (but that is another problem, another patch).

That results in a tremendous continuous hammering of SCSI Receive
Diagnostics commands to the enclosure-services device, which does
delay the total probe time for the SCSI devices __significantly__:

  Originally, ~34 minutes on a system attached to ~170 disks:

    [ 9214.490703] mpt3sas version 13.100.00.00 loaded
    ...
    [11256.580231] scsi 17:0:177:0: qdepth(16), tagged(1), simple(0),
                   ordered(0), scsi_level(6), cmd_que(1)

  With this patch, it decreased to ~2.5 minutes -- a 13.6x faster

    [ 1002.992533] mpt3sas version 13.100.00.00 loaded
    ...
    [ 1151.978831] scsi 11:0:177:0: qdepth(16), tagged(1), simple(0),
                   ordered(0), scsi_level(6), cmd_que(1)

Back to the commit discussion.. on the ses_get_power_status() call
introduced in ses_enclosure_data_process(): impact of removing it.

That may possibly be in place to initialize the power status value
on device probe.  However, those 2 functions available to retrieve
that value _do_ automatically refresh/update it.  So the potential
benefit would be a direct access of the 'power_status' field which
does not use the callbacks...

But the only reader of 'struct enclosure_component::power_status'
is the get_component_power_status() callback for sysfs attribute,
and it _does_ check for and call the .get_power_status callback,
(which indeed is defined and implemented by that commit), so the
power status value is, again, automatically updated.

So, the remaining potential for a direct/non-callback access to
the power_status attribute would be out-of-tree modules -- well,
for those, if they are for whatever reason interested in values
that are set during device probe and not up-to-date by the time
they need it.. well, that would be curious.

Well, to handle that more properly, set the initial power state
value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
and check for it in that callback which may do an direct access
to the field value _if_ a callback function is not defined.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Fixes: 08024885a2 ("ses: Add power_status to SES device slot")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:48:05 -04:00
Bart Van Assche c02465fa13 scsi: osd_uld: Check scsi_device_get() return value
scsi_device_get() can fail. Hence check its return value.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06 12:43:12 -04:00
David S. Miller 6f14f443d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 08:24:51 -07:00
Al Viro bf7af0cea8 esas2r: don't open-code memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-06 02:11:00 -04:00
Christoph Hellwig 64c7f1d157 block, scsi: move the retries field to struct scsi_request
Instead of bloating the generic struct request with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05 12:05:08 -06:00
Johannes Thumshirn 8690218a4c scsi: sas: remove sas_domain_release_transport
sas_domain_release_transport is unused since at least v3.13, remove it.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04 20:16:38 -04:00
Milan P Gandhi 5a68a1c29f scsi: qla2xxx: Fix typo in driver
Signed-off-by: Milan P Gandhi <mgandhi@redhat.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04 20:13:57 -04:00
Arnd Bergmann 44a5b97712 scsi: advansys: fix uninitialized data access
gcc-7.0.1 now warns about a previously unnoticed access of uninitialized
struct members:

drivers/scsi/advansys.c: In function 'AscMsgOutSDTR':
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+5)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
         ((ushort)s_buffer[i + 1] << 8) | s_buffer[i]);
                          ^
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+7)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+5)' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/scsi/advansys.c:3860:26: error: '*((void *)&sdtr_buf+7)' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The code has existed in this exact form at least since v2.6.12, and the
warning seems correct. This uses named initializers to ensure we
initialize all members of the structure.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04 19:39:39 -04:00
Al Viro bee3f412d6 Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
Linus Torvalds 7d34ddbe47 SCSI fixes on 20170401
Thirteen small fixes: The hopefully final effort to get the lpfc nvme
 kconfig problems sorted, there's one important sg fix (user can induce
 read after end of buffer) and one minor enhancement (adding an extra
 PCI ID to qedi). The rest are a set of minor fixes, which mostly occur
 as user visible in error legs or on specific devices.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJY39E7AAoJEAVr7HOZEZN4WskQALCWMroFglXHofUbwMrKH4gv
 on+6k1k+z0iiUiidpfdPb0gL1D4fSBcDl60fQbSUFFlGiM1/MTpyv10QhqlER1al
 74ZLJ6cavw52WBLJccQBCGeYBIDdvAGPLfcoIHPfa4+6I2yXe1dsAY6T3qDUHsGW
 amEG62qjbpUkrPhyQq+ehTxU4itam2JH17eTis4xVCG0vXuvlp4igecbErzwOZu7
 zhpTvJZezsfiCXmPGyqbyRU1IRU5WglznwiZ7duNtTIFD8vQ9dugs/QH88VL31rh
 25uWiJMn9waC2o4wHuRzHb5VOFQxkhanAc0y+f3I4pxTdX4d5yN7TeNtxUxZM1z7
 CEB4QFVns8YF68WZaodCVqn06uX4REwdIs6n7KTsQT9JGQbnmEFoGLNe1/wCgdGZ
 16gH+0visFCnZQpCDbuFsUcddFglAT1EtvNLbPxKk3sKxAwqZJ+e5Lon7CX9s89f
 rPlvRb68Nw/ctxgXffM2ecRddpvHTeRgy1XBv/STMhGOzJV5k6S3nXPZfyq5kWdH
 Fv9MUu3qu2rVplPKydrOlXkz40a2cl/jS0M8UXueoJwE/JkvoiwquzThLO1BB3W/
 5Dc1NVii67qPlEJ8mAsNYiPnZww7t8IRlHlD+H7/pSo0RE2C4jhNmoZMyEjwlmex
 Fq13DkTbBhIZ0mNCwQ2J
 =umUo
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Thirteen small fixes: The hopefully final effort to get the lpfc nvme
  kconfig problems sorted, there's one important sg fix (user can induce
  read after end of buffer) and one minor enhancement (adding an extra
  PCI ID to qedi). The rest are a set of minor fixes, which mostly occur
  as user visible in error legs or on specific devices"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: remove the duplicated checking for supporting clkscaling
  scsi: lpfc: fix building without debugfs support
  scsi: lpfc: Fix PT2PT PRLI reject
  scsi: hpsa: fix volume offline state
  scsi: libsas: fix ata xfer length
  scsi: scsi_dh_alua: Warn if the first argument of alua_rtpg_queue() is NULL
  scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
  scsi: scsi_dh_alua: Check scsi_device_get() return value
  scsi: sg: check length passed to SG_NEXT_CMD_LEN
  scsi: ufshcd-platform: remove the useless cast in ERR_PTR/IS_ERR
  scsi: qedi: Add PCI device-ID for QL41xxx adapters.
  scsi: aacraid: Fix potential null access
  scsi: qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr
2017-04-01 20:07:31 -07:00
Eric Biggers f363b089be blk-mq: constify struct blk_mq_ops
Constify all instances of blk_mq_ops, as they are never modified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-31 08:28:58 -06:00
Christoph Hellwig 831488669a scsi: be2iscsi: switch to pci_alloc_irq_vectors
And get automatic MSI-X affinity for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-30 11:37:16 -04:00
Szymon Mielczarek dbd34a61c3 Revert "scsi: ufs: add queries retry mechanism"
This reverts commit 61e073590b.

The patch introduced redundant query retries as we already had such
mechanism provided with _retry functions.  Both ufshcd_read_desc and
ufshcd_read_unit_desc_param functions call ufshcd_query_descriptor_retry
wrapper.

Signed-off-by: Szymon Mielczarek <szymonx.mielczarek@intel.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:51:40 -04:00
Don Brace 5765d180fd scsi: hpsa: change driver version
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:50:09 -04:00
Don Brace 7f1974a76d scsi: hpsa: update pci ids
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:49:48 -04:00
Arnd Bergmann 8bb74d3661 scsi: hisi_sas: fix SATA dependency
Removing the 'select SCSI_SAS_LIBSAS' statement in Kconfig resulted in a
link failure in configurations that have hisi_sas built-in but libsas as
a loadable module:

drivers/scsi/built-in.o: In function `hisi_sas_scan_finished':
hisi_sas_main.c:(.text+0x37ce9): undefined reference to `sas_drain_work'
drivers/scsi/built-in.o: In function `hisi_sas_slave_configure':
hisi_sas_main.c:(.text+0x37d17): undefined reference to `sas_slave_configure'
hisi_sas_main.c:(.text+0x37d40): undefined reference to `sas_change_queue_depth'
drivers/scsi/built-in.o: In function `hisi_sas_remove':

All other libsas users have the 'select' statement, so we should do the
same here for consistency. For all I can tell, the patch that added the
sata softreset does not actually introduce a dependency on SCSI_SAS_ATA
but instead adds calls into libata itself, so we can express that with a
more specific dependency.

We cannot have 'select SCSI_SAS_LIBSAS; depends on SCSI_SAS_ATA' as that
would cause a dependency loop.

Fixes: 7c594f0407 ("scsi: hisi_sas: add softreset function for SATA disk")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:44:53 -04:00
Tomohiro Kusumi d985c6eaa8 scsi: ufs: just use sizeof() for snprintf()
Not much reason to use ARRAY_SIZE() when we know it's for a C string.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi 0b9961be10 scsi: ufs: remove deprecated enum for hw interrupt
These flags are no longer needed after 2fbd009b in 2013.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi f6b254513c scsi: ufs: add missing macros for register bits from UFSHCI spec
Add macros for register bits that can be found in JESD223C (v2.1).

Not all registers are defined in ufshci.h (i.e. some are unused whether
macros are defined or undefined), but all the bits for those registers
that are already defined should appear here.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi 9c490d2dd3 scsi: ufs: non functional macro fix
Not having () isn't likely to do any harm in this case, but all the
other macros below do have it. Also add "are" in a comment.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi 4a8eec2bac scsi: ufs: use existing macro CONTROLLER_ENABLE to test register bit
Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Tomohiro Kusumi c9e6010b4e scsi: ufs: make ufshcd_is_{device_present, hba_active}() return bool
ufshcd driver generally uses bool for is_xxx type things instead of int,
so conform to its style.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:42:21 -04:00
Colin Ian King a28b259b43 scsi: hisi_sas: add missing break in switch statement
It appears that a break in the TRANS_TX_OPEN_CNX_ERR_NO_DESTINATION case
got accidentally removed in an earlier commit, as it stands, the
ts->stat and ts->open_rej_reason are being updated twice for this case
which looks incorrect.  Fix this by adding in the missing break
statement.

Detected by CoverityScan, CID#1422110 ("Missing break in switch")

Fixes: 634a9585f4 ("scsi: hisi_sas: process error codes according to their priority")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:38:53 -04:00
James Bottomley 0917ac4f53 Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes 2017-03-29 10:10:30 -04:00
Al Viro db68ce10c4 new helper: uaccess_kernel()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 16:43:25 -04:00
Jitendra Bhivare cbe9fc8594 scsi: be2iscsi: Update driver version
Version 11.4.0.0

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:42 -04:00
Jitendra Bhivare 942b76542e scsi: be2iscsi: Update Copyright
Update Broadcom Copyright markings in all files.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:41 -04:00
Jitendra Bhivare 0ddee50e3f scsi: be2iscsi: Check size before copying ASYNC handle
Data in buffers are gathered into a single buffer before giving to iSCSI
layer. Though less likely to have payload more than 8K in ASYNC PDU, the
data length is provide by FW and check is missing for overrun.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:41 -04:00
Jitendra Bhivare ba6983a745 scsi: be2iscsi: Remove free_list for ASYNC handles
With previous patch adding ASYNC Rx buffers to free_list is not
required.  Remove all free_list related operations.

Add in_use to track if buffer posted is being processed by driver and
purge all buffers received for connection if found so.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:40 -04:00
Jitendra Bhivare 1e2931f134 scsi: be2iscsi: Use num_cons field in Rx CQE
FW runs out of buffer if buffers are not posted back soon.  ASYNC Rx CQE
indicates that FW has consumed 8 RQEs.  Use it to post back buffers
instead of waiting for buffers to be processed and freed by driver.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:39 -04:00
Jitendra Bhivare fecc382469 scsi: be2iscsi: Increase HDQ default queue size
Currently, ASYNC PDU default queue size is set to max connections.  This
leaves only one buffer per connection for any ASYNC PDUs from targets.

Double the size of the default queue.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:39 -04:00
Jitendra Bhivare 90e96313a9 scsi: scsi_transport_iscsi: Use flush_work in iscsi_remove_session
scsi_flush_work flushes workqueue for the Scsi_Host.  In iSCSI offload
enabled host, this would wait for all other sessions under the host.

Use flush_work for the session being removed instead.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:33 -04:00
Jitendra Bhivare d1e1d63b32 scsi: be2iscsi: Replace spin_unlock_bh with spin_lock
spin_unlock_bh back_lock is used in beiscsi_eh_device_reset instead of
spin_lock.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Jitendra Bhivare 49fc5152f5 scsi: be2iscsi: Fix closing of connection
CID needs to be freed even when invalidate or upload connection fails.
Attempt to close connection 3 times before freeing CID.

Set cleanup_type to INVALIDATE instead of force TCP_RST.  This
unnecessarily is terminating connection with reset instead of gracefully
closing it.

Set save_cfg to 0 - session not to be saved on flash.

Add delay and process CQ before uploading connection.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Jitendra Bhivare eb419229be scsi: be2iscsi: Check tag in beiscsi_mccq_compl_wait
scsi host12: BS_1377 : mgmt_invalidate_connection Failed for cid=256
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff81332ebf>] __list_add+0xf/0xc0
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
...
CPU: 9 PID: 1542 Comm: iscsid Tainted: G               ------------ T 3.10.0-514.el7.x86_64 #1
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/12/2016
task: ffff88076f310fb0 ti: ffff88076bba8000 task.ti: ffff88076bba8000
RIP: 0010:[<ffffffff81332ebf>]  [<ffffffff81332ebf>] __list_add+0xf/0xc0
RSP: 0018:ffff88076bbab8e8  EFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff88076bbab990 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880468badf58 RDI: ffff88076bbab990
RBP: ffff88076bbab900 R08: 0000000000000246 R09: 00000000000020de
R10: 0000000000000000 R11: ffff88076bbab5be R12: 0000000000000000
R13: ffff880468badf58 R14: 000000000001adb0 R15: ffff88076f310fb0
FS:  00007f377124a880(0000) GS:ffff88046fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000771318000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88076bbab990 ffff880468badf50 0000000000000001 ffff88076bbab938
ffffffff810b128b 0000000000000246 00000000cf9b7040 ffff880468bac7a0
0000000000000000 ffff880468bac7a0 ffff88076bbab9d0 ffffffffa05a6ea3

Call Trace:
[<ffffffff810b128b>] prepare_to_wait+0x7b/0x90
[<ffffffffa05a6ea3>] beiscsi_mccq_compl_wait+0x153/0x330 [be2iscsi]
[<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
[<ffffffffa05981b1>] beiscsi_ep_disconnect+0x91/0x2d0 [be2iscsi]
[<ffffffffa0202ffa>] iscsi_if_ep_disconnect.isra.14+0x5a/0x70 [scsi_transport_iscsi]
[<ffffffffa02042fb>] iscsi_if_recv_msg+0x113b/0x14a0 [scsi_transport_iscsi]
[<ffffffff811dffd8>] ? __kmalloc_node_track_caller+0x58/0x290
[<ffffffffa02046ee>] iscsi_if_rx+0x8e/0x1f0 [scsi_transport_iscsi]
[<ffffffff815a351d>] netlink_unicast+0xed/0x1b0
[<ffffffff815a38fe>] netlink_sendmsg+0x31e/0x690
[<ffffffff815a03e4>] ? netlink_rcv_wake+0x44/0x60
[<ffffffff815a19e3>] ? netlink_recvmsg+0x1e3/0x450

beiscsi_mccq_compl_wait gets called even when MCC tag allocation failed
for mgmt_invalidate_connection.  mcc_wait is not initialized for tag 0
so causes crash in prepare_to_wait.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Tomohiro Kusumi 031d1e0f2d scsi: ufs: fix wrong/ambiguous fall through comments
These aren't really falling through to anywhere meaningful.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:56:03 -04:00
Dan Carpenter 03b1a06203 scsi: osd_uld: remove an unneeded NULL check
We don't call the remove() function unless probe() succeeds so "oud"
can't be NULL here.  Plus, if it were NULL, we dereference it on the
next line so it would crash anyway.

[mkp: applied by hand]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:55:14 -04:00
Jaehoon Chung a3902ee983 scsi: ufs: remove the duplicated checking for supporting clkscaling
There are same conditions for checking whether supporting clkscaling or
not. When ufshcd is supporting clkscaling, active_reqs should be
decreased by one.

[mkp: addressed comment from Bartlomiej]

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:45:41 -04:00
Greg Kroah-Hartman 57c0eabbd5 Merge 4.11-rc4 into char-misc-next
We want the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-27 09:13:04 +02:00
Finn Thain d0c2c269a3 drivers: Clean up duplicated email address
My email address may be found in the git commit logs and in MAINTAINERS.
Remove duplicate addresses so they won't have to be kept up-to-date.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-24 15:49:56 +01:00
Masanari Iida 0a95160ed3 treewide: Fix typos in printk
This patch fix some spelling typos found in printk.

[jkosina@suse.cz: drop arch/arm64/kernel/hibernate.c that was already
 in place]
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-24 15:24:00 +01:00
David S. Miller 16ae1f2236 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/genet/bcmmii.c
	drivers/net/hyperv/netvsc.c
	kernel/bpf/hashtab.c

Almost entirely overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 16:41:27 -07:00
Arnd Bergmann 6f359f99b8 qedf: fix wrong le16 conversion
gcc points out that we are converting a 16-bit integer into a 32-bit
little-endian type and assigning that to 16-bit little-endian
will end up with a zero:

drivers/scsi/qedf/drv_fcoe_fw_funcs.c: In function 'init_initiator_rw_fcoe_task':
include/uapi/linux/byteorder/big_endian.h:32:26: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
  t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);

The correct solution appears to be to just use a 16-bit byte swap instead.

Fixes: be086e7c53 ("qed*: Utilize Firmware 8.15.3.0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 11:56:08 -07:00
Brian King 16a20b52d1 scsi: ipr: Driver version 2.6.4
Bump driver version

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King ef97d8ae12 scsi: ipr: Fix SATA EH hang
This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap->eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH
is driven completely by ipr and SCSI. The SCSI EH thread ends up calling
ipr's eh_device_reset_handler, which then calls
ata_std_error_handler. This ends up calling ipr_sata_reset, which issues
a reset to the device. This should result in all pending commands
getting failed back and having ata_qc_complete called for them, which
should end up clearing ATA_QCFLAG_FAILED as qc->flags gets zeroed in
ata_qc_free.  This ensures that when we end up in ata_eh_finish, we
don't do anything more with the command.

On adapters that only support a single interrupt and when running with
two MSI-X vectors or less, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending the
response to the SATA reset command.  On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and reset response may be processed on different
HRRQs.

If ipr returns from ipr_sata_reset before the outstanding command was
returned, this sends us down the path of __ata_eh_qc_complete which then
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap->eh_done_q, which then will sit there
forever and we will be wedged.

This patch fixes this up by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA device.

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King f646f325a8 scsi: ipr: Error path locking fixes
This patch closes up some potential race conditions observed in the
error handling paths in ipr while debugging an issue resulting in a hang
with SATA error handling. These patches ensure we are holding the
correct lock when adding and removing commands from the free and pending
queues in some error scenarios.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King 439ae285b9 scsi: ipr: Fix abort path race condition
This fixes a race condition in the error handlomg paths of ipr. While a
command is outstanding to the adapter, it is placed on a pending queue
for the hrrq it is associated with, while holding the HRRQ lock. When a
command is completed, it is removed from the pending queue, under HRRQ
lock, and placed on a local list.  This list is then iterated through
without any locks and each command's done function is invoked, inside of
which, the command gets returned to the free list while grabbing the
HRRQ lock. This fixes two race conditions when commands have been
removed from the pending list but have not yet been added to the free
list. Both of these changes fix race conditions that could result in
returning success from eh_abort_handler and then later calling scsi_done
for the same request.

The first race condition is in ipr_cancel_op. It looks through each
pending queue to see if the command to be aborted is still outstanding
or not. Rather than looking on the pending queue, reverse the logic to
check to look for commands that are NOT on the free queue.  The second
race condition can occur when in ipr_wait_for_ops where we are waiting
for responses for commands we've aborted.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King 960e96480e scsi: ipr: Remove redundant initialization
Removes some code in __ipr_eh_dev_reset which was modifying the ipr_cmd
done function. This should have already been setup at command allocation
time and if its since been changed, it means we are in the ipr_erp*
functions and need to wait for them to complete and don't want to
override that here.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King 66a0d59cdd scsi: ipr: Fix missed EH wakeup
Following a command abort or device reset, ipr's EH handlers wait for
the commands getting aborted to get sent back from the adapter prior to
returning from the EH handler. This fixes up some cases where the
completion handler was not getting called, which would have resulted in
the EH thread waiting until it timed out, greatly extending EH time.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Arnd Bergmann 223b78ea21 scsi: lpfc: fix building without debugfs support
On a randconfig build without CONFIG_SCSI_LPFC_DEBUG_FS, I ran into
multiple compile failures:

drivers/scsi/lpfc/lpfc_debugfs.h: In function 'lpfc_debug_dump_wq':
drivers/scsi/lpfc/lpfc_debugfs.h:405:15: error: 'DUMP_FCP' undeclared (first use in this function); did you mean 'DUMP_VAR'?
drivers/scsi/lpfc/lpfc_debugfs.h:405:15: note: each undeclared identifier is reported only once for each function it appears in
drivers/scsi/lpfc/lpfc_debugfs.h:408:22: error: 'DUMP_NVME' undeclared (first use in this function); did you mean 'DUMP_NONE'?
drivers/scsi/lpfc/lpfc_nvmet.c: In function 'lpfc_nvmet_xmt_ls_rsp_cmp':
drivers/scsi/lpfc/lpfc_nvmet.c:109:2: error: implicit declaration of function 'lpfc_nvmeio_data'; did you mean 'lpfc_mem_free'? [-Werror=implicit-function-declaration]
drivers/scsi/lpfc/lpfc_nvmet.c: In function 'lpfc_nvmet_xmt_fcp_op':
drivers/scsi/lpfc/lpfc_nvmet.c:523:10: error: unused variable 'id' [-Werror=unused-variable]

They are all trivial to fix, so I'm doing it in a combined patch here.

Fixes: 1d9d5a9879 ("scsi: lpfc: refactor debugfs queue dump routines")
Fixes: bd2cdd5e40 ("scsi: lpfc: NVME Initiator: Add debugfs support")
Fixes: 2b65e18202 ("scsi: lpfc: NVME Target: Add debugfs support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:28:43 -04:00
Dick Kennedy a71e3cdcfc scsi: lpfc: Fix PT2PT PRLI reject
lpfc cannot establish connection with targets that send PRLI in P2P
configurations.

If lpfc rejects a PRLI that is sent from a target the target will not
resend and will reject the PRLI send from the initiator.

[mkp: applied by hand]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:19:37 -04:00
Xiaofei Tan 4935933e07 scsi: hisi_sas: add is_sata_phy_v2_hw()
Add helper function is_sata_phy_v2_hw() to judge whether the attached
device is SATA disk for a root PHY.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen 6073b7719a scsi: hisi_sas: use dev_is_sata to identify SATA or SAS disk
When SMP IO is sent, sas_protocol_ata couldn't judge whether the disk is
SATA or SAS disk.  So use dev_is_sata to identify SATA or SAS disk.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry 14d3f397f6 scsi: hisi_sas: check hisi_sas_lu_reset() error message
Unless we actually get some sort of failure in hisi_sas_lu_reset(),
don't print a message.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen ccbfe5a05a scsi: hisi_sas: release SMP slot in lldd_abort_task
When an SMP task timeouts, it will call lldd_abort_task to release the
associated slot, and then will release the sas_task.

Currently in lldd_abort_task, if we fail to internally abort IO, then
the slot of SMP IO is not released, but sas_task will still be later
released, so the slot's sas_task is NULL, which will cause NULL pointer
when hisi_sas_slot_task_free happens later.

To resolve, check the return value of internal abort, and release the
slot if it failed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry 8b05ad6a9d scsi: hisi_sas: add hisi_sas_clear_nexus_ha()
Add function for upper-layer to reset controller when all else fails.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry 4df642db5b scsi: hisi_sas: rename hisi_sas_link_timeout_{enable, disable}_link
For consistency, remove the "hisi_sas_" prefix.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiaofei Tan 981843c6c4 scsi: hisi_sas: handle PHY UP+DOWN simultaneous irq
Handle the situation that PHY UP and DOWN irq happen simultaneously.
There is no mechanism of SoC HW to ensure this situation will never
happen. So, we add this handle just in case.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry f1dc751876 scsi: hisi_sas: some modifications to v2 hw reg init values
This patch includes:
(1) Disable transport layer retry
(2) Support CQ time and count interrupt coal
(3) fix link FIFO full issue

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Zhao Nenglong <zhaonenglong@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen 634a9585f4 scsi: hisi_sas: process error codes according to their priority
There are some rules to decide which error code has the high priority
when errors happen together:

(1) Error phase of CQ decides the error happens on RX or TX;

(2) For TX error, when DMA/TRANS TX error happen simultaneously, the
    priority of DMA TX error is higher than TRANS TX error, so for the
    priority of TX error: DW2 (DMA TX part) > DW0;

(3) For RX error, when TRANS/DMA/SIPC RX error happen simultaneously,
    the priority of TRANS RX error is higher than DMA and SIPC RX error,
    and we should also keep the rules (the priority of DW3 > DW2), so
    for the priority of RX error: DW1 > DW3 > DW2(SIPC RX part);

(4) There are also a priority we should keep in the same error type.

So, modify slot error code to handle this.

In addition to this, some some error codes are modified according to
recommendation from SoC designer.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry 6fcdda8051 scsi: hisi_sas: remove task free'ing for timeouts
When a TMF or internal abort times-out, do not free slot. We expect this
to be done upon later escalated error handling.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry 54c9dd2d26 scsi: hisi_sas: fix some sas_task.task_state_lock locking
Some more locking needs to be added/modified for when
read-modify-writing sas_task.task_state_flags.

Note: since we can attempt to grab this lock in interrupt
      context we should use irq variant of spin_lock.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen 6131243acd scsi: hisi_sas: free slots after hardreset
After hardreset, we clear up IOs of remote disks, so we need to free
those slots in LLDD.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry a305f33775 scsi: hisi_sas: check for SAS_TASK_STATE_ABORTED in slot complete
Check in slot_complete_v2_hw() for whether a task has already been
completed by upper layer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry 055945df4c scsi: hisi_sas: hardreset for SATA disk in LU reset
When issuing an LU reset for a SATA target, issue an internal abort and
a hard reset.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry c35279f2f1 scsi: hisi_sas: modify hisi_sas_abort_task() for SSP
Currently an internal abort is executed regardless of the result of the
TMF. We should also check the result of the internal abort to see if we
should free the slot.

So change the status code STAT_IO_COMPLETE to TMF_RESP_FUNC_SUCC,
meaning the slot has been successfully aborted.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen fc86695144 scsi: hisi_sas: modify error handling for v2 hw
For error codes which need abort-and-retry, simulate IO timeout and let
SCSI+ATA layers process those errors.

Previously for SSP, we should try to abort the IO in the LLDD and then
pass back to upper layer, but sometimes this would also error. So
Instead of adding special error handling for this scenario in the LLDD,
allow the upper layer to handle completely.

No performance hit is seen by taking this approach.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry b4c67a6ca7 scsi: hisi_sas: only reset link for PHY_FUNC_LINK_RESET
We currently do a hard reset for a link reset. Change this to do a link
reset only.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry ddabca216c scsi: hisi_sas: error hisi_sas_task_prep() when port down
When sas_port is NULL, then return SAS_PHY_DOWN.

In addition, when the sas_dev is gone then explicitly return
SAS_PHY_DOWN.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
John Garry 405314df56 scsi: hisi_sas: remove hisi_sas_port_deformed()
Currently when a root PHY is deformed from a asd_sas_port we try to
release the slots in the LLDD, and fail.

Regardless, it is not right to release this early.

This patch removes the deformed function. As it was before, port
deformation is still done in hisi_sas_phy_down().

It would be nice to actually remove the hisi_sas_port_{de}formed() pair,
however we cannot as we need to know the asd_sas_port index libsas has
associated with an asd_sas_phy.

The hw does actually generate a port id for a PHY, but this seems to a
random number, so ignored for this purpose.

This patch also changes the code to link slots to the hisi_sas_device,
and not hisi_sas_port.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
Xiang Chen 7c594f0407 scsi: hisi_sas: add softreset function for SATA disk
Add softreset to clear IO after internal abort device for SATA disk.

The SATA error handling for the controller is based on device internal
abort and softreset function.

The controller does not support internal abort for single IO, so we need
to execute internal abort for device.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
John Garry 396b80448f scsi: hisi_sas: move PHY init to hisi_sas_scan_start()
Relocate the PHY init code from LLDD hw init path to
hisi_sas_scan_start().

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
Xiang Chen 06ec0fb97c scsi: hisi_sas: add controller reset
There are some scenarios that we need to warm-reset to reset registers
of SAS controller. During reset we disable interrupts/DQs/PHYs, and
after reset we re-init the hardware and rescan the topology to see if
anything changed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
John Garry 2e244f0f5b scsi: hisi_sas: add to_hisi_sas_port()
Introduce function to get hisi_sas_port from asd_sas_port.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
Tomas Henzl eb94588dab scsi: hpsa: fix volume offline state
In a previous patch a hpsa_scsi_dev_t.volume_offline update line has
been removed, so let us put it back..

Fixes: 85b29008d8 (hpsa: update check for logical volume status)
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 10:12:29 -04:00
Satish Kharat b8e1aa3c72 scsi: fnic: bug fix for fip.fip_subcode in fnic_fcoe_send_vlan_req
This is a bug introduced when they moved the fip subcodes to central
place. Was sending FIP_SC_VL_NOTE in fip.fip_subcode for VLAN request in
fnic_fcoe_send_vlan_req. Change is to use FIP_SC_VL_REQ instead.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat 445d296086 scsi: fnic: Adding debug IO and Abort latency counter to fnic stats
The IO and Abort latency counter counts the time taken to complete the
IO and abort command into broad buckets. This is not intended for
performance measurement, just a debug statistic.  current_max_io_time
tries to keep track of the maximum time an IO has taken to complete if
it is > 30sec.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat 39fcbbc01b scsi: fnic: Adding Check Condition counter to misc fnicstats
Just a simple counter of number of check conditions encountered on that
host.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat b9202b4ae8 scsi: fnic: Avoid false out-of-order detection for aborted command
If SCSI-ML has already issued abort on a command i.e
FNIC_IOREQ_ABTS_PENDING is set and we get a IO completion, avoid this
being flagged as out-of-order completion by setting the FNIC_IO_DONE
flag in fnic_fcpio_icmnd_cmpl_handler

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat 7ef539c88d scsi: fnic: Fix for "Number of Active IOs" in fnicstats becoming negative
Fixing the IO stats update (Active IOs and IO completion) to prevent
"Number of Active IOs" from becoming negative in the fnistats output.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat ccc6d70460 scsi: fnic: minor cleanup in fnic_fcpio_itmf_cmpl_handler, removing else case
Getting rid of else case to make the flow look bit simpler.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:25 -04:00
Satish Kharat b43abcbbd5 scsi: fnic: Ratelimit printks to avoid flooding when vlan is not set by the switch.i
This is to avoid the log from being filled with vlan discovery messages
when there is no vlan configured on the switch.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:25 -04:00
Christoph Hellwig cca678dfba scsi: fnic: switch to pci_alloc_irq_vectors
Not a full cleanup for the IRQ code, for that we'd need to know if the
max number of the various CQ types is going to stay 1 forever.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:10 -04:00
Linus Torvalds 1f02071358 SCSI fixes on 20170321
Nine small fixes: the biggest is probably finally sorting out Kconfig
 issues with lpfc nvme.  There are some performance fixes for megaraid
 and hpsa and a static checker fix.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJY0UHhAAoJEAVr7HOZEZN4Pm4QAII49optwEoUAy616fR4Vj6C
 SA2EmjfAi6tKvvYO88rwfN8KTFkQgsoLa9OmoLFjexTlAAoIG+ixL+SHLwz0nXgG
 4tar9vvRnT97MuaXgIxPrNpAOao2OSiFZd3B4rUItYClWhpjZcBhlQogif6S3/qb
 sLomKklX7bFfzmtwRtkgDjSdV7Iyx+IAOZAAIi9TmRu+uFoySs6c2Z6zKzfTMx9c
 Wc94sS3UvIptpXFncqHgXjOm+/hELjzqC67DrJUu2CbayyRAGF0Aur5QO6FMJOiQ
 dNr4k91kVBuvH+Um9WfHTZdqWD/4DzKblP/DwF0EoOerstQMNZrDwlex47pJ80Oa
 8VcoMrtJmsCD/bUaZZoxZpIo+OFbE20/mMy/JdoyEl59L1B9NGH+njDDigD4m57t
 NhWgyNQN1pbiSbY4dpyT+YYTI0JO2bR9Ye7VVbbeDVhJZlTUrF6D2MbsDMC2zz+f
 oGsoF1wBRDw6Q7rY1I8TmCM0qsdE2dbXpCmnaQK4VJpQgiZs5fyA+HhQ82r1SFq1
 yN6uCcbbjwNIRyVygtRmRmvyz3V67rIryIV625vMj37xi1ttTEdc1NaKgpaclUaK
 yvHlK/izzrU+k8P8V/uOt4oac2T8gyCH1q5GD8IKjzFGbODgQfuUXjFM40083xIQ
 W3miZdQlV0PgUIrO8/B4
 =Z4Xj
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Nine small fixes: the biggest is probably finally sorting out Kconfig
  issues with lpfc nvme.  There are some performance fixes for megaraid
  and hpsa and a static checker fix"

[ Johannes Thumshirn points out that there still seems to be more lpfc
  vs nvme config issues.  Oh well.   - Linus ]

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Finalize Kconfig options for nvme
  scsi: ufs: don't check unsigned type for a negative value
  scsi: hpsa: do not timeout reset operations
  scsi: hpsa: limit outstanding rescans
  scsi: hpsa: update check for logical volume status
  scsi: megaraid_sas: Driver version upgrade
  scsi: megaraid_sas: raid6 also require cpuSel check same as raid5
  scsi: megaraid_sas: add correct return type check for ldio hint logic for raid1
  scsi: megaraid_sas: enable intx only if msix request fails
2017-03-21 13:10:17 -07:00
Logan Gunthorpe ac1ddc584e scsi: utilize new cdev_device_add helper function
This driver did not set kobj.parent so it likely suffered from
a potential use after free race if the user unregistered the
device while it was in use.

This was not so straightforward a conversion but I think this patch
cleans up its probe's error path significantly.

This patch adds device_initialize, which is required for
cdev_device_add. Then it switches to put_device instead of kfree as
recommended by device_initialize's documentation. This removes a lot
from the error path which was already in __remove.
A couple things needed to be re-ordered to be entirely correct, though.
ida_remove is also moved out of __remove and into unregister to
simplify things and follow the pattern other devices are using.

This also drop an extra unnecessary get_device/put_device in the code.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-21 06:44:33 +01:00
John Garry 9702c67c60 scsi: libsas: fix ata xfer length
The total ata xfer length may not be calculated properly, in that we do
not use the proper method to get an sg element dma length.

According to the code comment, sg_dma_len() should be used after
dma_map_sg() is called.

This issue was found by turning on the SMMUv3 in front of the hisi_sas
controller in hip07. Multiple sg elements were being combined into a
single element, but the original first element length was being use as
the total xfer length.

Cc: <stable@vger.kernel.org>
Fixes: ff2aeb1eb6 ("libata: convert to chained sg")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-20 09:45:08 -04:00
Linus Torvalds 8aa3417255 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "The bulk of the changes are in qla2xxx target driver code to address
  various issues found during Cavium/QLogic's internal testing (stable
  CC's included), along with a few other stability and smaller
  miscellaneous improvements.

  There are also a couple of different patch sets from Mike Christie,
  which have been a result of his work to use target-core ALUA logic
  together with tcm-user backend driver.

  Finally, a patch to address some long standing issues with
  pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices,
  which will make folks using physical (or virtual) magnetic tape happy"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits)
  qla2xxx: Update driver version to 9.00.00.00-k
  qla2xxx: Fix delayed response to command for loop mode/direct connect.
  qla2xxx: Change scsi host lookup method.
  qla2xxx: Add DebugFS node to display Port Database
  qla2xxx: Use IOCB interface to submit non-critical MBX.
  qla2xxx: Add async new target notification
  qla2xxx: Export DIF stats via debugfs
  qla2xxx: Improve T10-DIF/PI handling in driver.
  qla2xxx: Allow relogin to proceed if remote login did not finish
  qla2xxx: Fix sess_lock & hardware_lock lock order problem.
  qla2xxx: Fix inadequate lock protection for ABTS.
  qla2xxx: Fix request queue corruption.
  qla2xxx: Fix memory leak for abts processing
  qla2xxx: Allow vref count to timeout on vport delete.
  tcmu: Convert cmd_time_out into backend device attribute
  tcmu: make cmd timeout configurable
  tcmu: add helper to check if dev was configured
  target: fix race during implicit transition work flushes
  target: allow userspace to set state to transitioning
  target: fix ALUA transition timeout handling
  ...
2017-03-19 18:06:31 -07:00
Bart Van Assche 0aeccdfe22 scsi: scsi_dh_alua: Warn if the first argument of alua_rtpg_queue() is NULL
Callers must provide a valid port group to alua_rtpg_queue().  Issue a
kernel warning if that is not the case.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-19 13:16:37 -04:00
Bart Van Assche 7cb689fe42 scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
Callers of scsi_dh_activate(), e.g. dm-mpath, assume that this function
either returns an error code or calls the completion function. Make
alua_activate() call the completion function even if scsi_device_get()
fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Cc: <stable@vger.kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-19 13:16:17 -04:00
Bart Van Assche 625fe857e4 scsi: scsi_dh_alua: Check scsi_device_get() return value
Do not queue ALUA work nor call scsi_device_put() if the
scsi_device_get() call fails. This patch fixes the following crash:

general protection fault: 0000 [#1] SMP
RIP: 0010:scsi_device_put+0xb/0x30
Call Trace:
 scsi_disk_put+0x2d/0x40
 sd_release+0x3d/0xb0
 __blkdev_put+0x29e/0x360
 blkdev_put+0x49/0x170
 dm_put_table_device+0x58/0xc0 [dm_mod]
 dm_put_device+0x70/0xc0 [dm_mod]
 free_priority_group+0x92/0xc0 [dm_multipath]
 free_multipath+0x70/0xc0 [dm_multipath]
 multipath_dtr+0x19/0x20 [dm_multipath]
 dm_table_destroy+0x67/0x120 [dm_mod]
 dev_suspend+0xde/0x240 [dm_mod]
 ctl_ioctl+0x1f5/0x520 [dm_mod]
 dm_ctl_ioctl+0xe/0x20 [dm_mod]
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Fixes: commit 03197b61c5 ("scsi_dh_alua: Use workqueue for RTPG")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Cc: <stable@vger.kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-19 13:15:50 -04:00