Commit Graph

692639 Commits

Author SHA1 Message Date
Manish Rangankar 42d7c10f23 scsi: qedi: Limit number for CQ queues.
[qed_sp_iscsi_func_start:189(host_7-0)]Cannot satisfy CQ amount. Queues
requested 8, CQs available 4. Aborting function start

Above condition will resolve as management firmware is capable of
telling us the number of CQs available for a given PF, qed will
communicate the same number to qedi, So that qedi will know how much CQs
are allowed.

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:50 -04:00
John Garry 30b67de31b scsi: hisi_sas: remove driver versioning
The driver version is not updated with changes to the driver, so it has
no value, so just get rid of it.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:50 -04:00
John Garry 76aae5f60b scsi: hisi_sas: replace kfree with scsi_host_put
Instances of kfree(shost) should be replaced with scsi_host_put().

In addition, a missing scsi_host_put() is added for error path in
hisi_sas_shost_alloc_pci() and v3 driver removal.

Signed-off-by: Pan Bian <bianpan2016@163.com> # For main.c changes
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:49 -04:00
John Garry 5aec704f0d scsi: hisi_sas: remove phy_down_v3_hw() res variable
Variable res only holds value 0, so remove it.

This cleans up a coccicheck warning.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:49 -04:00
Xiang Chen 2400620c1f scsi: hisi_sas: add phy_set_linkrate_v3_hw()
Add function to set linkrate for v3 hw.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:48 -04:00
Xiang Chen 056e4cc66c scsi: hisi_sas: update some v3 register init settings
This patch updates some register setting according to recommendation
from HW designer and experiment.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:48 -04:00
Xiang Chen a25d0d3df2 scsi: hisi_sas: add reset handler for v3 hw
Use ACPI "_RST" method to reset the controller, since FLR is not
supported.

Function hisi_sas_stop_phys() is introduced to remove some code
duplication.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-24 22:28:47 -04:00
Xiang Chen d499669fac scsi: hisi_sas: kill tasklet when destroying irq in v3 hw
This patch adds calls to kill CQ takslets v3 hw during probe failure.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen 4f73575a79 scsi: hisi_sas: fix v3 hw channel interrupt processing
The channel interrupt is to process all the interrupts except PHY
UP/DOWN and broadcast interrupt. So we need to clear all the interrupts
except those 3 interrupts after processing channel interrupts.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen 8103673108 scsi: hisi_sas: Modify v3 hw STP_LINK_TIMER setting
Modify STP link timer from 10ms to 500ms. Also add the register address.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen 031da09c11 scsi: hisi_sas: add status and command buffer for internal abort
For v3 hw, internal abort function required status and command buffer to
be set, so add necessary code for this.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiaofei Tan c3fe8a2bbb scsi: hisi_sas: support zone management commands
Add two ATA commands, ATA_CMD_ZAC_MGMT_IN and ATA_CMD_ZAC_MGMT_OUT in
hisi_sas_get_ata_protocol(), to support SATA SMR disk.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen 640acc9a96 scsi: hisi_sas: service interrupt ITCT_CLR interrupt in v2 hw
This patch is a fix related to freeing a device in v2 hw driver.

Before, we polled to ITCT CLR interrupt to check if a device is free.

This was error prone, as if the interrupt doesn't occur in 10us, we miss
processing it.

To avoid this situation, service this interrupt and sync the event with
a completion.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen 8a253888bf scsi: hisi_sas: add irq and tasklet cleanup in v2 hw
This patch adds support to clean-up allocated IRQs and kill tasklets
when probe fails and for driver removal.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen cef4e1ab7a scsi: hisi_sas: remove repeated device config in v2 hw
This patch removes some repeated configurations:

(1) The device id of the device is already set in the alloc function, so
    we don't need to modify in free device function.

(2) Field dev_type and dev_status are configured in hisi_sas_dev_gone(),
    so there is no need for repeated config in free_device_v3_hw.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
John Garry 2b3833510d scsi: hisi_sas: use array for v2 hw ECC errors
The code to print ECC errors in v2 hw driver is very repetitive.  This
patch condensed the code by looping an array of errors.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiaofei Tan c52108c61b scsi: hisi_sas: add v2 hw DFX feature
Add DFX feature for v2 hw. We are adding support for
the following errors:
- loss_of_dword_sync_count
- invalid_dword_count
- phy_reset_problem_count
- running_disparity_error_count

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:02 -04:00
Xiang Chen 01b361fc90 scsi: hisi_sas: fix v2 hw underflow residual value
The value dw0 is the residual bytes when UNDERFLOW error happens, but we
filled the residual with the value of dw3 before. So change the residual
from dw3 to dw0.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:01 -04:00
Xiang Chen c16db73665 scsi: hisi_sas: avoid potential v2 hw interrupt issue
When some interrupts happen together, we need to process every interrupt
one-by-one, and should not return immediately when one interrupt process
is finished being processed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:01 -04:00
Xiaofei Tan 917d3bdaf8 scsi: hisi_sas: fix reset and port ID refresh issues
This patch provides fixes for the following issues:

1. Fix issue of controller reset required to send commands. For reset
   process, it may be required to send commands to the controller, but
   not during soft reset.  So add HISI_SAS_NOT_ACCEPT_CMD_BIT to prevent
   executing a task during this period.

2. Send a broadcast event in rescan topology to detect any topology
   changes during reset.

3. Previously it was not ensured that libsas has processed the PHY up
   and down events after reset. Potentially this could cause an issue
   that we still process the PHY event after reset. So resolve this by
   flushing shot workqueue in LLDD reset.

4. Port ID requires refresh after reset. The port ID generated after
   reset is not guaranteed to be the same as before reset, so it needs
   to be refreshed for each device's ITCT.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 20:15:01 -04:00
Kevin Barnett b98117caa0 scsi: smartpqi: change driver version to 1.1.2-125
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:26 -04:00
Kevin Barnett 557900640b scsi: smartpqi: add in new controller ids
Update the driver’s PCI IDs to match the latest Microsemi controllers

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:26 -04:00
Kevin Barnett b6d478119e scsi: smartpqi: update kexec and power down support
Add PQI reset to driver shutdown callback to work around controller bug.

During an 1.) OS shutdown or 2.) kexec outside of a kdump, the Linux
kernel will clear BME on our controller.

If BME is cleared during a controller/host PCIe transfer, the controller
will lock up.

So we perform a PQI reset in the driver's shutdown callback function to
eliminate the possibility of a controller/host PCIe transfer being
active when the kernel clears BME immediately after calling the driver's
shutdown callback.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:26 -04:00
Kevin Barnett 4f078e2408 scsi: smartpqi: cleanup doorbell register usage.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:26 -04:00
Kevin Barnett 41555d540f scsi: smartpqi: update pqi passthru ioctl
- make pass-thru requests bi-directional

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:26 -04:00
Kevin Barnett 58322fe006 scsi: smartpqi: enhance BMIC cache flush
- distinguish between shutdown and non-shutdown.

Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:25 -04:00
Kevin Barnett 336b681931 scsi: smartpqi: add pqi reset quiesce support
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:58:25 -04:00
Colin Ian King 0b7250f93f scsi: dpt_i2o: remove redundant null check on array device
The null check on pHba->channel[chan].device is redundant because
device is an array and hence can never be null. Remove the check.

Detected by CoverityScan, CID#115362 ("Array compared against 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
Pan Bian 0b2ce198fa scsi: qla2xxx: use dma_mapping_error to check map errors
The return value of dma_map_single() should be checked by
dma_mapping_error(). However, in function qla26xx_dport_diagnostics(), its
return value is checked against NULL, which could result in failures.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
Pan Bian cf99dc30bc scsi: mvsas: replace kfree with scsi_host_put
The return value of scsi_host_alloc() should be released by
scsi_host_put(). However, in function mvs_pci_init(), kfree()
is used. This patch replaces kfree() with scsi_host_put() to avoid
possible memory leaks.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
Pan Bian bc1371c181 scsi: pm8001: fix double free in pm8001_pci_probe
In function pm8001_pci_probe(), on errors that the control flow jumps to
label err_out_ha_free, function pm8001_free() is called. In pm8001_free(),
scsi_host_put() is called to release shost, which keeps the return value
of scsi_host_alloc(). After pm8001_free() returns, kfree() is called to
free shost again, resulting in a double free bug. This patch removes
scsi_host_put() from pm8001_free() and explicitly calls scsi_host_put()
to release Scsi_Host in need.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
weiping 3b8328e2e0 scsi: megaraid_sas: fix allocate instance->pd_info twice
fix allocate instance->pd_info twice which was introduced by 96188a89cc.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
Finn Thain d60e9eec95 scsi: esp_scsi: Always clear msg_out_len after MESSAGE OUT phase
After sending a message, always clear esp->msg_out_len. Otherwise,
eh_abort_handler may subsequently fail to send an ABORT TASK SET
message.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
Finn Thain c69edff5c5 scsi: esp_scsi: Avoid sending ABORT TASK SET messages
If an LLD aborts a task set, it should complete the affected commands
with the appropriate result code. In a couple of cases esp_scsi doesn't
do so.

When the initiator receives an unhandled message, just respond by sending
a MESSAGE REJECT instead of ABORT TASK SET, and thus avoid the issue.

OTOH, a MESSAGE REJECT sent by a target can be taken as an indication
that the initiator messed up somehow. It isn't always possible to abort
correctly, so just fall back on a SCSI bus reset, which will complete the
affected commands with the appropriate result code.

For example, certain Apple (Sony) CD-ROM drives, when the non-existent
LUN 1 is scanned, can't handle the INQUIRY command. The problem is not
detected until the initiator gets a MESSAGE REJECT. Whenever esp_scsi
sees that message, it raises ATN and sends ABORT TASK SET -- but
neglects to complete the failed scmd.

The target then goes into DATA OUT phase (probably bogus), while the ESP
device goes into disconnected mode (surprising, given the bus phase).
The next Transfer Information command from esp_scsi then causes
an Invalid Command interrupt because that command is not valid when in
disconnected mode:

mac_esp: using PDMA for controller 0
mac_esp mac_esp.0: esp0: regs[50f10000:(null)] irq[19]
mac_esp mac_esp.0: esp0: is a ESP236, 16 MHz (ccf=4), SCSI ID 7
scsi host0: esp
scsi 0:0:0:0: Direct-Access     SEAGATE  ST318416N        0010 PQ: 0 ANSI: 3
scsi target0:0:0: Beginning Domain Validation
scsi target0:0:0: asynchronous
scsi target0:0:0: Domain Validation skipping write tests
scsi target0:0:0: Ending Domain Validation
scsi 0:0:3:0: CD-ROM            SONY     CD-ROM CDU-8003A 1.9a PQ: 0 ANSI: 2 CCS
scsi target0:0:3: Beginning Domain Validation
scsi target0:0:3: FAST-5 SCSI 2.0 MB/s ST (500 ns, offset 15)
scsi target0:0:3: Domain Validation skipping write tests
scsi target0:0:3: Ending Domain Validation
scsi host0: unexpected IREG 40
scsi host0: Dumping command log
scsi host0: ent[2] CMD val[c2] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[01] event[0c]
scsi host0: ent[3] CMD val[00] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
scsi host0: ent[4] EVENT val[0d] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
scsi host0: ent[5] EVENT val[03] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[0d]
scsi host0: ent[6] CMD val[90] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[03]
scsi host0: ent[7] EVENT val[05] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[03]
scsi host0: ent[8] EVENT val[0d] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[05]
scsi host0: ent[9] CMD val[01] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[0d]
scsi host0: ent[10] CMD val[11] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[0d]
scsi host0: ent[11] EVENT val[0b] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[0d]
scsi host0: ent[12] CMD val[12] sreg[97] seqreg[cc] sreg2[00] ireg[08] ss[00] event[0b]
scsi host0: ent[13] EVENT val[0c] sreg[97] seqreg[cc] sreg2[00] ireg[08] ss[00] event[0b]
scsi host0: ent[14] CMD val[44] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[00] event[0c]
scsi host0: ent[15] CMD val[01] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[01] event[0c]
scsi host0: ent[16] CMD val[c2] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[01] event[0c]
scsi host0: ent[17] CMD val[00] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[0c]
scsi host0: ent[18] EVENT val[0d] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[0c]
scsi host0: ent[19] EVENT val[06] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[0d]
scsi host0: ent[20] CMD val[01] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[06]
scsi host0: ent[21] CMD val[10] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[06]
scsi host0: ent[22] CMD val[1a] sreg[87] seqreg[ca] sreg2[00] ireg[08] ss[00] event[06]
scsi host0: ent[23] CMD val[12] sreg[87] seqreg[ca] sreg2[00] ireg[08] ss[00] event[06]
scsi host0: ent[24] EVENT val[0d] sreg[87] seqreg[ca] sreg2[00] ireg[08] ss[00] event[06]
scsi host0: ent[25] EVENT val[09] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[0d]
scsi host0: ent[26] CMD val[01] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[09]
scsi host0: ent[27] CMD val[10] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[09]
scsi host0: ent[28] EVENT val[0a] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[09]
scsi host0: ent[29] EVENT val[0d] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[0a]
scsi host0: ent[30] EVENT val[04] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[0d]
scsi host0: ent[31] CMD val[01] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[04]
scsi host0: ent[0] CMD val[90] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[04]
scsi host0: ent[1] EVENT val[05] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[04]
scsi target0:0:3: FAST-5 SCSI 2.0 MB/s ST (500 ns, offset 15)
scsi target0:0:0: asynchronous
sr 0:0:3:0: [sr0] scsi-1 drive
cdrom: Uniform CD-ROM driver Revision: 3.20
sd 0:0:0:0: Attached scsi generic sg0 type 0
sr 0:0:3:0: Attached scsi generic sg1 type 5

This patch resolves this issue because the bus reset causes the INQUIRY
command to fail earlier, and return the appropriate result code.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:35 -04:00
Finn Thain 201c37d7bf scsi: esp_scsi: Clean up control flow and dead code
This patch improves readability. There are no functional changes.

Since this touches on a questionable ESP_INTR_DC conditional, add some
commentary to help others who may (as I did) find themselves chasing an
"Invalid Command" error after the device flags this condition.

This cleanup also eliminates a warning from "make W=1":
drivers/scsi/esp_scsi.c: In function 'esp_finish_select':
drivers/scsi/esp_scsi.c:1233:5: warning: variable 'orig_select_state' set but not used [-Wunused-but-set-variable]
  u8 orig_select_state;

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:34 -04:00
Finn Thain 7640d91d28 scsi: mac_esp: Fix PIO transfers for MESSAGE IN phase
When in MESSAGE IN phase, the ESP device does not automatically
acknowledge each byte that is transferred by PIO. The mac_esp driver
neglects to explicitly ack them, which causes a timeout during messages
larger than one byte (e.g. tag bytes during reconnect). Fix this with an
ESP_CMD_MOK command after each byte.

The MESSAGE IN phase is also different in that each byte transferred
raises ESP_INTR_FDONE. So don't exit the transfer loop for this interrupt,
for this phase.

That resolves the "Reconnect IRQ2 timeout" error on those Macs which use
PIO transfers instead of PDMA. This patch also improves on the weak tests
for unexpected interrupts and phase changes during PIO transfers.

Tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 02507a80b3 ("[PATCH] [SCSI] mac_esp: fix PIO mode, take 2")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:34 -04:00
Finn Thain b36c7db977 scsi: mac_esp: Avoid type warning from sparse
Avoid the following warning from "make C=1":

  CHECK   drivers/scsi/mac_esp.c
drivers/scsi/mac_esp.c:357:30: warning: incorrect type in initializer (different address spaces)
drivers/scsi/mac_esp.c:357:30:    expected unsigned char [usertype] *fifo
drivers/scsi/mac_esp.c:357:30:    got void [noderef] <asn:2>*

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:55:34 -04:00
Bhumika Goyal 5f3342d757 arcmsr: add const to bin_attribute structures
Add const to bin_attribute structures as they are only passed to the
functions system_{remove/create}_bin_file. The arguments passed are of
type const, so declare the structures to be const.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:40:50 -04:00
Martin Peschke f32c9e03d4 scsi: zfcp: early returns for traces disabled via level
This patch adds early checks to avoid burning CPU cycles on
the assembly of trace entries which would be skipped anyway.

Introduce a static const variable to keep the trace level to check with
debug_level_enabled() in sync with the actual trace emit with
debug_event(). In order not to refactor the SAN tracing too much,
simply use a define instead.

This change is only for the non / semi hot paths,
while the actual (I/O) hot path was already improved earlier:
zfcp_dbf_scsi() is already guarded by its only caller _zfcp_dbf_scsi()
since commit dcd20e2316 ("[SCSI] zfcp: Only collect SCSI debug data for
matching trace levels").
zfcp_dbf_hba_fsf_res() is already guarded by its only caller
zfcp_dbf_hba_fsf_response() since commit 2e261af84c ("[SCSI] zfcp: Only
collect FSF/HBA debug data for matching trace levels").

Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
[maier@linux.vnet.ibm.com: rebase, reword, default level 3 branch prediction]
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:03 -04:00
Martin Peschke b096ef863e scsi: zfcp: clean up unnecessary module_param_named() with no_auto_port_rescan
Improves commit 43f60cbd56 ("[SCSI] zfcp: No automatic port_rescan on
events")

Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
[maier@linux.vnet.ibm.com: reword, underscore in description to match sysfs]
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:03 -04:00
Martin Peschke 5ec2196060 scsi: zfcp: clean up a member of struct zfcp_qdio that was assigned but never used
v2.6.38 commit a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing
for HBA records.")
dropped trace information previously introduced with
v2.6.27 commit c3baa9a26c ("[SCSI] zfcp: Add information about interrupt
to trace.")
but kept and needlessly assigned a now no longer used struct field.

Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
[maier@linux.vnet.ibm.com: reword, added git history]
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:02 -04:00
Steffen Maier 46e5ee1f74 scsi: zfcp: clean up no longer existent prototype from zfcp API header
Commit a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA
records.") refactored zfcp_dbf_hba_berr into zfcp_dbf_hba_bit_err
but added the prototype for the latter without removing it for the former.

Suggested-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:02 -04:00
Martin Peschke 5f03e98b0f scsi: zfcp: clean up redundant code with fall through in link down SRB switch case
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
[maier@linux.vnet.ibm.com: re-worded short description for more details]
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:01 -04:00
Steffen Maier 5b2fc2a12c scsi: zfcp: fix kernel doc comment typos for struct zfcp_dbf_scsi
Improves commit 250a1352b9 ("[SCSI] zfcp: Redesign of the debug tracing
for SCSI records.")

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:01 -04:00
Steffen Maier 9d464fc1b1 scsi: zfcp: use endianness conversions with common FC(P) struct fields
Just to silence sparse. Since zfcp only exists for s390 and
s390 is big endian, this has been working correctly without conversions
and all the new conversions are NOPs so no performance impact.

Nonetheless, use the conversion on the constant expression where possible.

NB: N_Port-IDs have always been handled with hton24 or ntoh24 conversions
because they also convert to / from character array.

Affected common code structs and .fields are:

HOT I/O PATH:
fcp_cmnd .fc_dl
   FCP command: regular SCSI I/O, including DIX case

SEMI-HOT I/O PATH:
fcp_cmnd .fc_dl
   recovery FCP command: task management function (LUN / target reset)
fcp_resp_ext
   FCP response having FCP_SNS_LEN_VAL with .fr_rsp_len .fr_sns_len
   FCP response having FCP_RESID_UNDER with .fr_resid

RECOVERY / DISCOVERY PATHS:
fc_ct_hdr .ct_cmd .ct_mr_size
   zfcp auto port scan [GPN_FT] with fc_gpn_ft_resp.fp_wwpn,
   recovery for returned port [GID_PN] with fc_ns_gid_pn.fn_wwpn,
   get symbolic port name [GSPN],
   register symbolic port name [RSPN] (NPIV only).
fc_els_rscn .rscn_plen
   incoming ELS (RSCN).
fc_els_flogi .fl_wwpn .fl_wwnn
   incoming ELS (PLOGI),
   port open response with .fl_csp.sp_bb_data .fl_cssp[0..3].cp_class,
   FCP channel physical port,
   point-to-point peer (P2P only).
fc_els_logo .fl_n_port_wwn
   incoming ELS (LOGO).
fc_els_adisc .adisc_wwnn .adisc_wwpn
   path test after RSCN for gone target port.

Since v4.10 commit 05de97003c ("linux/types.h: enable endian checks for
all sparse builds"), below sparse endianness reports appear by default.
Previously, one needed to pass argument CF="-D__CHECK_ENDIAN__" to make
as in: $ make C=1 CF="-D__CHECK_ENDIAN__" M=drivers/s390/scsi.

Silenced sparse warnings and one error:

$ make C=1 M=drivers/s390/scsi
...
  CHECK   drivers/s390/scsi/zfcp_dbf.c
drivers/s390/scsi/zfcp_dbf.c:463:22: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_dbf.c:476:28: warning: restricted __be16 degrades to integer
  CC      drivers/s390/scsi/zfcp_dbf.o
...
  CHECK   drivers/s390/scsi/zfcp_fc.c
drivers/s390/scsi/zfcp_fc.c:263:26: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:299:41: warning: incorrect type in argument 2 (different base types)
drivers/s390/scsi/zfcp_fc.c:299:41:    expected unsigned long long [unsigned] [usertype] wwpn
drivers/s390/scsi/zfcp_fc.c:299:41:    got restricted __be64 [usertype] fl_wwpn
drivers/s390/scsi/zfcp_fc.c:309:40: warning: incorrect type in argument 2 (different base types)
drivers/s390/scsi/zfcp_fc.c:309:40:    expected unsigned long long [unsigned] [usertype] wwpn
drivers/s390/scsi/zfcp_fc.c:309:40:    got restricted __be64 [usertype] fl_n_port_wwn
drivers/s390/scsi/zfcp_fc.c:338:31: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:355:24: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:355:24:    expected restricted __be16 [usertype] ct_cmd
drivers/s390/scsi/zfcp_fc.c:355:24:    got unsigned short [unsigned] [usertype] cmd
drivers/s390/scsi/zfcp_fc.c:356:28: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:356:28:    expected restricted __be16 [usertype] ct_mr_size
drivers/s390/scsi/zfcp_fc.c:356:28:    got int
drivers/s390/scsi/zfcp_fc.c:379:36: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:379:36:    expected restricted __be64 [usertype] fn_wwpn
drivers/s390/scsi/zfcp_fc.c:379:36:    got unsigned long long [unsigned] [usertype] wwpn
drivers/s390/scsi/zfcp_fc.c:463:18: warning: restricted __be64 degrades to integer
drivers/s390/scsi/zfcp_fc.c:465:17: warning: cast from restricted __be64
drivers/s390/scsi/zfcp_fc.c:473:20: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:473:20:    expected unsigned long long [unsigned] [usertype] wwnn
drivers/s390/scsi/zfcp_fc.c:473:20:    got restricted __be64 [usertype] fl_wwnn
drivers/s390/scsi/zfcp_fc.c:474:29: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:474:29:    expected unsigned int [unsigned] [usertype] maxframe_size
drivers/s390/scsi/zfcp_fc.c:474:29:    got restricted __be16 [usertype] sp_bb_data
drivers/s390/scsi/zfcp_fc.c:476:30: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:478:30: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:480:30: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:482:30: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:500:28: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:500:28:    expected unsigned long long [unsigned] [usertype] wwnn
drivers/s390/scsi/zfcp_fc.c:500:28:    got restricted __be64 [usertype] adisc_wwnn
drivers/s390/scsi/zfcp_fc.c:502:38: warning: restricted __be64 degrades to integer
drivers/s390/scsi/zfcp_fc.c:541:40: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:541:40:    expected restricted __be64 [usertype] adisc_wwpn
drivers/s390/scsi/zfcp_fc.c:541:40:    got unsigned long long [unsigned] [usertype] port_name
drivers/s390/scsi/zfcp_fc.c:542:40: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fc.c:542:40:    expected restricted __be64 [usertype] adisc_wwnn
drivers/s390/scsi/zfcp_fc.c:542:40:    got unsigned long long [unsigned] [usertype] node_name
drivers/s390/scsi/zfcp_fc.c:669:16: warning: restricted __be16 degrades to integer
drivers/s390/scsi/zfcp_fc.c:696:24: warning: restricted __be64 degrades to integer
drivers/s390/scsi/zfcp_fc.c:699:54: warning: incorrect type in argument 2 (different base types)
drivers/s390/scsi/zfcp_fc.c:699:54:    expected unsigned long long [unsigned] [usertype] <noident>
drivers/s390/scsi/zfcp_fc.c:699:54:    got restricted __be64 [usertype] fp_wwpn
  CC      drivers/s390/scsi/zfcp_fc.o
  CHECK   drivers/s390/scsi/zfcp_fsf.c
drivers/s390/scsi/zfcp_fsf.c:479:34: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fsf.c:479:34:    expected unsigned long long [unsigned] [usertype] port_name
drivers/s390/scsi/zfcp_fsf.c:479:34:    got restricted __be64 [usertype] fl_wwpn
drivers/s390/scsi/zfcp_fsf.c:480:34: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fsf.c:480:34:    expected unsigned long long [unsigned] [usertype] node_name
drivers/s390/scsi/zfcp_fsf.c:480:34:    got restricted __be64 [usertype] fl_wwnn
drivers/s390/scsi/zfcp_fsf.c:506:36: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fsf.c:506:36:    expected unsigned long long [unsigned] [usertype] peer_wwpn
drivers/s390/scsi/zfcp_fsf.c:506:36:    got restricted __be64 [usertype] fl_wwpn
drivers/s390/scsi/zfcp_fsf.c:507:36: warning: incorrect type in assignment (different base types)
drivers/s390/scsi/zfcp_fsf.c:507:36:    expected unsigned long long [unsigned] [usertype] peer_wwnn
drivers/s390/scsi/zfcp_fsf.c:507:36:    got restricted __be64 [usertype] fl_wwnn
drivers/s390/scsi/zfcp_fc.h:269:46: warning: restricted __be32 degrades to integer
drivers/s390/scsi/zfcp_fc.h:270:29: error: incompatible types in comparison expression (different base types)

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:00 -04:00
Steffen Maier df00d7b8d5 scsi: zfcp: use common code fcp_cmnd and fcp_resp with union in fsf_qtcb_bottom_io
This eases crash dump analysis by automatically dissecting these
protocol headers at least somewhat rather than getting a string
interpretation of large unstructured character array buffer fields.

Also, we can get rid of some unnecessary and error-prone type casts.

This change is possible since v2.6.33 commit 4318e08c84
("[SCSI] zfcp: Update FCP protocol related code").

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:37:00 -04:00
Steffen Maier 394134fd9f scsi: zfcp: clarify that we don't need "link" test on failed open port
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:36:59 -04:00
Steffen Maier ab8ab4be78 scsi: zfcp: more fitting constant for fc_ct_hdr.ct_reason on port scan response
v2.6.33 commit dbf5dfe9db ("[SCSI] zfcp: Use common code definitions for
FC CT structs") replaced own definitions with common code definitions.
While FC_BA_RJT_UNABLE happens to be defined with the same value 9 as
FC_FS_RJT_UNABL and thus also works, here we should use the latter from
fc_gs.h.
See also its use in libfc's fc_disc_gpn_ft_resp().

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:36:59 -04:00
Steffen Maier 5d4a3d0a2f scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
Complements debugging aspects of the otherwise functionally complete
v3.17 commit 9cb78c16f5 ("scsi: use 64-bit LUNs").

While I don't have access to a target exporting 3 or 4 level LUNs,
I did test it by explicitly attaching a non-existent fake 4 level LUN
by means of zfcp sysfs attribute "unit_add".
In order to see corresponding trace records of otherwise successful
events, we had to increase the trace level of area SCSI and HBA to 6.

$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level
$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level

$ echo 0x4011402240334044 > \
  /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add

Example output formatted by an updated zfcpdbf from the s390-tools
package interspersed with kernel messages at scsi_logging_level=4605:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : scsla_1
LUN            : 0x4011402240334044
WWPN           : 0x50050763031bd327
D_ID           : 0x00......
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x41000000
Ready count    : 0x00000001
Running count  : 0x00000000
ERP want       : 0x01
ERP need       : 0x01

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fs_norm
Request ID     : 0x<inquiry2-req-id>
Request status : 0x00000010
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_nor
Request ID     : 0x<inquiry2-req-id>
SCSI ID        : 0x00000000
SCSI LUN       : 0x40224011
SCSI LUN high  : 0x40444033 <=======================
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<inquiry2-req-id>
SCSI opcode    : 12000000 a4000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 00000000 00000000

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \
no device added

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 9cb78c16f5 ("scsi: use 64-bit LUNs")
Cc: <stable@vger.kernel.org> #3.17+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:36:58 -04:00
Steffen Maier fdb7cee3b9 scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
At the default trace level, we only trace unsuccessful events including
FSF responses.

zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
decide on an unsuccessful response. However, this is only one of multiple
possible sources determining a failed struct zfcp_fsf_req.

An FSF request can also "fail" if its response runs into an ERP timeout
or if it gets dismissed because a higher level recovery was triggered
[trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
FSF requests with ERP timeout are:
FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
One example is slow queue processing which can cause follow-on errors,
e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
In order to see the root cause, we need to see late responses even if the
channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fcegpf1
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Ready count    : 0x00000001
Running count  : 0x...
ERP want       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
ERP need       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
|
Timestamp      : ...				30 seconds later
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 2
Tag            : erscf_2
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Request ID     : 0x<request_ID>
ERP status     : 0x10000000			ZFCP_STATUS_ERP_TIMEDOUT
ERP step       : 0x0800				ZFCP_ERP_STEP_PORT_OPENING
ERP action     : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
ERP count      : 0x00
|
Timestamp      : ...				later than previous record
Area           : HBA
Subarea        : 00
Level          : 5	> default level		=> 3	<= default level
Exception      : -
CPU ID         : 00
Caller         : ...
Record ID      : 1
Tag            : fs_qtcb			=> fs_rerr
Request ID     : 0x<request_ID>
Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
						| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000005
FSF sequence no: 0x...
FSF issued     : ...				> 30 seconds ago
FSF stat       : 0x00000000			FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001			FSF_PROT_GOOD
Prot stat qual : 00000000 00000000 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x00000000
QTCB log length: ...
QTCB log info  : ...

In case of problems detecting that new responses are waiting on the input
queue, we sooner or later trigger adapter recovery due to an FSF request
timeout (trace tag "fsrth_1").
FSF requests with FSF request timeout are:
typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.

In a theroretical case, inject code can create an erroneous FSF request
on purpose. If data router is enabled, it uses deferred error reporting.
A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
it can still fail, e.g. if an intentionally wrong scatter list does not
provide enough space. Rather than getting an unsuccessful response,
we get a QDIO activate check which in turn triggers adapter recovery.
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6	> default level		=> 3	<= default level
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fs_norm			=> fs_rerr
Request ID     : 0x<request_ID2>
Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
						| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000			FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001			FSF_PROT_GOOD
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_ID2>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x000e0000			DID_TRANSPORT_DISRUPTED
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_ID2>
SCSI opcode    : 28...				Read(10)
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                                         ^^	SAM_STAT_GOOD
                 00000000 00000000

Only with luck in both above cases, we could see a follow-on trace record
of an unsuccesful event following a successful but late FSF response with
FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
[On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the
corresponding HBA trace record was missing at the default trace level
(by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").

On adapter recovery, after we had shut down the QDIO queues, we perform
unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
for each pending FSF request in zfcp_fsf_req_dismiss_all().
In order to find the root cause, we need to see all pseudo responses even
if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.

Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".

It does not matter that there are numerous places which set
ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
early. These cases are based on protocol status != FSF_PROT_GOOD or
== FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
as trace tag "fs_perr" or "fs_ferr" respectively.

NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
all remains. zfcp_fsf_req_complete() handles this and returns early.
All other FSF request types are handled separately and as described above.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532e ("[SCSI] zfcp: enhancement of zfcp debug features")
Fixes: 2e261af84c ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-10 19:36:57 -04:00