OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Manish Rangankar	42d7c10f23	scsi: qedi: Limit number for CQ queues.
    [qed_sp_iscsi_func_start:189(host_7-0)]Cannot satisfy CQ amount. Queues requested 8, CQs available 4. Aborting function start
The above condition is resolved because the management firmware can report the number of CQs available for a given PF; qed communicates that number to qedi, so qedi knows how many CQs are allowed. Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:50 -04:00
John Garry	30b67de31b	scsi: hisi_sas: remove driver versioning The driver version is not updated with changes to the driver, so it has no value; just get rid of it. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:50 -04:00
John Garry	76aae5f60b	scsi: hisi_sas: replace kfree with scsi_host_put Instances of kfree(shost) should be replaced with scsi_host_put(). In addition, a missing scsi_host_put() is added for error path in hisi_sas_shost_alloc_pci() and v3 driver removal. Signed-off-by: Pan Bian <bianpan2016@163.com> # For main.c changes Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:49 -04:00
John Garry	5aec704f0d	scsi: hisi_sas: remove phy_down_v3_hw() res variable Variable res only holds value 0, so remove it. This cleans up a coccicheck warning. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:49 -04:00
Xiang Chen	2400620c1f	scsi: hisi_sas: add phy_set_linkrate_v3_hw() Add function to set linkrate for v3 hw. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:48 -04:00
Xiang Chen	056e4cc66c	scsi: hisi_sas: update some v3 register init settings This patch updates some register settings according to recommendations from the HW designer and to experimental results. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:48 -04:00
Xiang Chen	a25d0d3df2	scsi: hisi_sas: add reset handler for v3 hw Use ACPI "_RST" method to reset the controller, since FLR is not supported. Function hisi_sas_stop_phys() is introduced to remove some code duplication. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-24 22:28:47 -04:00
Xiang Chen	d499669fac	scsi: hisi_sas: kill tasklet when destroying irq in v3 hw This patch adds calls to kill the CQ tasklets in v3 hw on probe failure. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
Xiang Chen	4f73575a79	scsi: hisi_sas: fix v3 hw channel interrupt processing The channel interrupt is to process all the interrupts except PHY UP/DOWN and broadcast interrupt. So we need to clear all the interrupts except those 3 interrupts after processing channel interrupts. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
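The clearing rule in the commit above can be sketched as a mask computation. This is a minimal sketch, not the driver code: the bit positions for PHY UP, PHY DOWN and broadcast are hypothetical, and it assumes a write-1-to-clear channel interrupt status register.

```c
#include <stdint.h>

/* Hypothetical bit positions; not the real hisi_sas v3 register layout. */
#define CHL_INT_PHY_UP      (UINT32_C(1) << 0)
#define CHL_INT_PHY_DOWN    (UINT32_C(1) << 1)
#define CHL_INT_BCAST       (UINT32_C(1) << 2)

/* Sources serviced by their own handlers rather than the channel handler. */
#define CHL_INT_NOT_CHANNEL (CHL_INT_PHY_UP | CHL_INT_PHY_DOWN | CHL_INT_BCAST)

/*
 * Value to write back (assuming write-1-to-clear) after the channel
 * interrupt handler has run: every pending bit except PHY UP, PHY DOWN
 * and broadcast, which stay pending for their dedicated handlers.
 */
static uint32_t chl_int_clear_mask(uint32_t pending)
{
	return pending & ~CHL_INT_NOT_CHANNEL;
}
```

Computing the mask from the three excluded sources, rather than listing every channel source, means newly added channel sources are cleared without touching this code.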
Xiang Chen	8103673108	scsi: hisi_sas: Modify v3 hw STP_LINK_TIMER setting Modify STP link timer from 10ms to 500ms. Also add the register address. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
Xiang Chen	031da09c11	scsi: hisi_sas: add status and command buffer for internal abort For v3 hw, the internal abort function requires the status and command buffers to be set, so add the necessary code for this. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
Xiaofei Tan	c3fe8a2bbb	scsi: hisi_sas: support zone management commands Add two ATA commands, ATA_CMD_ZAC_MGMT_IN and ATA_CMD_ZAC_MGMT_OUT in hisi_sas_get_ata_protocol(), to support SATA SMR disk. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
Xiang Chen	640acc9a96	scsi: hisi_sas: service interrupt ITCT_CLR interrupt in v2 hw This patch is a fix related to freeing a device in the v2 hw driver. Previously, we polled for the ITCT CLR interrupt to check whether a device was freed. This was error prone: if the interrupt did not occur within 10us, we missed processing it. To avoid this, service this interrupt and sync the event with a completion. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
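The difference between polling and waiting on a completion can be illustrated in userspace. This sketch reimplements the names `init_completion()`, `complete()` and `wait_for_completion()` from the kernel's completion API on top of POSIX threads, purely for illustration. `free_device()` and `itct_clr_irq()` are hypothetical stand-ins for the driver's free-device path and its ITCT_CLR interrupt handler.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogue of a kernel struct completion (illustration only). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Signalled by the "interrupt handler" once the ITCT clear has happened. */
static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Sleep until complete() runs. Unlike a fixed 10us poll, an event that
 * arrives late is still observed, because the flag is sticky. */
static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical ITCT_CLR interrupt, delivered from another thread. */
static void *itct_clr_irq(void *arg)
{
	complete(arg);
	return NULL;
}

/* Hypothetical free-device path: trigger the clear, then wait for it. */
static int free_device(void)
{
	struct completion done;
	pthread_t irq;
	int ret;

	init_completion(&done);
	if (pthread_create(&irq, NULL, itct_clr_irq, &done) != 0)
		return -1;
	wait_for_completion(&done);
	pthread_join(irq, NULL);
	ret = done.done ? 0 : -1;
	pthread_mutex_destroy(&done.lock);
	pthread_cond_destroy(&done.cond);
	return ret;
}
```

Build with `-pthread`. A real driver would usually bound the wait (the kernel also offers `wait_for_completion_timeout()`); that detail is omitted here.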
Xiang Chen	8a253888bf	scsi: hisi_sas: add irq and tasklet cleanup in v2 hw This patch adds support to clean-up allocated IRQs and kill tasklets when probe fails and for driver removal. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
Xiang Chen	cef4e1ab7a	scsi: hisi_sas: remove repeated device config in v2 hw This patch removes some repeated configurations: (1) The device id of the device is already set in the alloc function, so we don't need to modify it in the free device function. (2) Fields dev_type and dev_status are configured in hisi_sas_dev_gone(), so there is no need for repeated config in free_device_v3_hw. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
John Garry	2b3833510d	scsi: hisi_sas: use array for v2 hw ECC errors The code to print ECC errors in v2 hw driver is very repetitive. This patch condenses the code by looping over an array of errors. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
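The table-driven approach described above can be sketched as follows. The status bits and messages are hypothetical placeholders, not the actual v2 hw ECC error definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry per ECC error source. Bits and messages are hypothetical. */
struct ecc_error {
	uint32_t mask;
	const char *msg;
};

static const struct ecc_error ecc_errors[] = {
	{ UINT32_C(1) << 0, "ecc error source 0" },
	{ UINT32_C(1) << 1, "ecc error source 1" },
	{ UINT32_C(1) << 2, "ecc error source 2" },
};

/* Report every flagged error by walking the table once, instead of one
 * hand-written if-block per error. Returns the number reported. */
static int report_ecc_errors(uint32_t status)
{
	int n = 0;

	for (size_t i = 0; i < sizeof(ecc_errors) / sizeof(ecc_errors[0]); i++) {
		if (status & ecc_errors[i].mask) {
			fprintf(stderr, "%s\n", ecc_errors[i].msg);
			n++;
		}
	}
	return n;
}
```

Adding a new error source then means adding one table entry rather than another copy of the reporting code.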
Xiaofei Tan	c52108c61b	scsi: hisi_sas: add v2 hw DFX feature Add DFX feature for v2 hw. We are adding support for the following errors: - loss_of_dword_sync_count - invalid_dword_count - phy_reset_problem_count - running_disparity_error_count Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:02 -04:00
Xiang Chen	01b361fc90	scsi: hisi_sas: fix v2 hw underflow residual value The value dw0 is the residual bytes when UNDERFLOW error happens, but we filled the residual with the value of dw3 before. So change the residual from dw3 to dw0. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:01 -04:00
Xiang Chen	c16db73665	scsi: hisi_sas: avoid potential v2 hw interrupt issue When several interrupts happen together, we need to process every interrupt one by one, and should not return immediately after the first one has been processed. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:01 -04:00
Xiaofei Tan	917d3bdaf8	scsi: hisi_sas: fix reset and port ID refresh issues This patch provides fixes for the following issues:
1. Fix command handling during controller reset. The reset process may need to send commands to the controller, but not during soft reset, so add HISI_SAS_NOT_ACCEPT_CMD_BIT to prevent executing a task during that period.
2. Send a broadcast event in rescan topology to detect any topology changes during reset.
3. Previously, it was not ensured that libsas had processed the PHY up and down events after reset. Potentially this could cause an issue where we still process a PHY event after reset. So resolve this by flushing the shost workqueue in LLDD reset.
4. Port ID requires refresh after reset. The port ID generated after reset is not guaranteed to be the same as before reset, so it needs to be refreshed in each device's ITCT.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 20:15:01 -04:00
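A minimal sketch of servicing every pending source in one pass rather than returning after the first. The 32-bit status word, the `handled[]` counters and `handle_source()` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-source counters, for illustration only. */
static int handled[32];

static void handle_source(unsigned int bit)
{
	handled[bit]++;
}

/*
 * Service every pending interrupt source. The problematic pattern returns
 * right after the first handled source, leaving the others unprocessed;
 * here the loop visits all 32 bits before returning.
 */
static unsigned int service_all(uint32_t pending)
{
	unsigned int count = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (pending & (UINT32_C(1) << bit)) {
			handle_source(bit);
			count++;
		}
	}
	return count;
}
```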
Kevin Barnett	b98117caa0	scsi: smartpqi: change driver version to 1.1.2-125 Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:26 -04:00
Kevin Barnett	557900640b	scsi: smartpqi: add in new controller ids Update the driver’s PCI IDs to match the latest Microsemi controllers Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:26 -04:00
Kevin Barnett	b6d478119e	scsi: smartpqi: update kexec and power down support Add a PQI reset to the driver shutdown callback to work around a controller bug. During either (1) an OS shutdown or (2) a kexec outside of a kdump, the Linux kernel will clear BME on our controller. If BME is cleared during a controller/host PCIe transfer, the controller will lock up. So we perform a PQI reset in the driver's shutdown callback function to eliminate the possibility of a controller/host PCIe transfer being active when the kernel clears BME immediately after calling the driver's shutdown callback. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:26 -04:00
Kevin Barnett	4f078e2408	scsi: smartpqi: cleanup doorbell register usage. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:26 -04:00
Kevin Barnett	41555d540f	scsi: smartpqi: update pqi passthru ioctl - make pass-thru requests bi-directional Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:26 -04:00
Kevin Barnett	58322fe006	scsi: smartpqi: enhance BMIC cache flush - distinguish between shutdown and non-shutdown. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:25 -04:00
Kevin Barnett	336b681931	scsi: smartpqi: add pqi reset quiesce support Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:58:25 -04:00
Colin Ian King	0b7250f93f	scsi: dpt_i2o: remove redundant null check on array device The null check on pHba->channel[chan].device is redundant because device is an array and hence can never be null. Remove the check. Detected by CoverityScan, CID#115362 ("Array compared against 0") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
Pan Bian	0b2ce198fa	scsi: qla2xxx: use dma_mapping_error to check map errors The return value of dma_map_single() should be checked by dma_mapping_error(). However, in function qla26xx_dport_diagnostics(), its return value is checked against NULL, which could result in failures. Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
Pan Bian	cf99dc30bc	scsi: mvsas: replace kfree with scsi_host_put The return value of scsi_host_alloc() should be released by scsi_host_put(). However, in function mvs_pci_init(), kfree() is used. This patch replaces kfree() with scsi_host_put() to avoid possible memory leaks. Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
Pan Bian	bc1371c181	scsi: pm8001: fix double free in pm8001_pci_probe In function pm8001_pci_probe(), on errors where control flow jumps to the label err_out_ha_free, function pm8001_free() is called. In pm8001_free(), scsi_host_put() is called to release shost, which holds the return value of scsi_host_alloc(). After pm8001_free() returns, kfree() is called to free shost again, resulting in a double free bug. This patch removes scsi_host_put() from pm8001_free() and explicitly calls scsi_host_put() to release the Scsi_Host where needed. Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
weiping	3b8328e2e0	scsi: megaraid_sas: fix allocate instance->pd_info twice Fix allocating instance->pd_info twice, which was introduced by `96188a89cc`. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
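The double free above comes from two places releasing the same Scsi_Host. A minimal userspace sketch shows the ownership rule the fix follows: release a refcounted object through its put function, exactly once, and never free() it directly. It assumes a hypothetical refcounted `struct host` with `host_alloc()`/`host_put()` that only mimic `scsi_host_alloc()`/`scsi_host_put()`; `drv_free()` and `drv_probe()` are hypothetical stand-ins for `pm8001_free()` and `pm8001_pci_probe()`.

```c
#include <stdlib.h>

/* Hypothetical refcounted host object standing in for struct Scsi_Host. */
struct host {
	int refcount;
};

/* Number of times a host was actually released (for illustration). */
static int host_frees;

static struct host *host_alloc(void)
{
	struct host *h = malloc(sizeof(*h));

	if (h)
		h->refcount = 1;	/* the caller owns the initial reference */
	return h;
}

/* Drop one reference; memory is freed only when the last one is dropped.
 * Calling free() directly instead would bypass this accounting. */
static void host_put(struct host *h)
{
	if (--h->refcount == 0) {
		free(h);
		host_frees++;
	}
}

/* Hypothetical driver teardown: after the fix it releases only driver
 * resources and leaves the host reference to its caller. */
static void drv_free(struct host *h)
{
	(void)h;
}

/* Simulated probe: on failure the host is released exactly once. */
static int drv_probe(int fail)
{
	struct host *h = host_alloc();

	if (!h)
		return -1;
	if (fail) {
		drv_free(h);
		host_put(h);	/* the single owner of the release */
		return -1;
	}
	host_put(h);		/* stands in for later driver removal */
	return 0;
}
```

Keeping the release in the caller, rather than in both the helper and the caller, is what makes the error path free the host once.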
Finn Thain	d60e9eec95	scsi: esp_scsi: Always clear msg_out_len after MESSAGE OUT phase After sending a message, always clear esp->msg_out_len. Otherwise, eh_abort_handler may subsequently fail to send an ABORT TASK SET message. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
Finn Thain	c69edff5c5	scsi: esp_scsi: Avoid sending ABORT TASK SET messages If an LLD aborts a task set, it should complete the affected commands with the appropriate result code. In a couple of cases esp_scsi doesn't do so. When the initiator receives an unhandled message, just respond by sending a MESSAGE REJECT instead of ABORT TASK SET, and thus avoid the issue. OTOH, a MESSAGE REJECT sent by a target can be taken as an indication that the initiator messed up somehow. It isn't always possible to abort correctly, so just fall back on a SCSI bus reset, which will complete the affected commands with the appropriate result code. For example, certain Apple (Sony) CD-ROM drives, when the non-existent LUN 1 is scanned, can't handle the INQUIRY command. The problem is not detected until the initiator gets a MESSAGE REJECT. Whenever esp_scsi sees that message, it raises ATN and sends ABORT TASK SET -- but neglects to complete the failed scmd. The target then goes into DATA OUT phase (probably bogus), while the ESP device goes into disconnected mode (surprising, given the bus phase). 
The next Transfer Information command from esp_scsi then causes an Invalid Command interrupt because that command is not valid when in disconnected mode:
    mac_esp: using PDMA for controller 0
    mac_esp mac_esp.0: esp0: regs[50f10000:(null)] irq[19]
    mac_esp mac_esp.0: esp0: is a ESP236, 16 MHz (ccf=4), SCSI ID 7
    scsi host0: esp
    scsi 0:0:0:0: Direct-Access SEAGATE ST318416N 0010 PQ: 0 ANSI: 3
    scsi target0:0:0: Beginning Domain Validation
    scsi target0:0:0: asynchronous
    scsi target0:0:0: Domain Validation skipping write tests
    scsi target0:0:0: Ending Domain Validation
    scsi 0:0:3:0: CD-ROM SONY CD-ROM CDU-8003A 1.9a PQ: 0 ANSI: 2 CCS
    scsi target0:0:3: Beginning Domain Validation
    scsi target0:0:3: FAST-5 SCSI 2.0 MB/s ST (500 ns, offset 15)
    scsi target0:0:3: Domain Validation skipping write tests
    scsi target0:0:3: Ending Domain Validation
    scsi host0: unexpected IREG 40
    scsi host0: Dumping command log
    scsi host0: ent[2] CMD val[c2] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[01] event[0c]
    scsi host0: ent[3] CMD val[00] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
    scsi host0: ent[4] EVENT val[0d] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
    scsi host0: ent[5] EVENT val[03] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[0d]
    scsi host0: ent[6] CMD val[90] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[03]
    scsi host0: ent[7] EVENT val[05] sreg[91] seqreg[04] sreg2[00] ireg[18] ss[00] event[03]
    scsi host0: ent[8] EVENT val[0d] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[05]
    scsi host0: ent[9] CMD val[01] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[0d]
    scsi host0: ent[10] CMD val[11] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[0d]
    scsi host0: ent[11] EVENT val[0b] sreg[93] seqreg[cc] sreg2[00] ireg[10] ss[00] event[0d]
    scsi host0: ent[12] CMD val[12] sreg[97] seqreg[cc] sreg2[00] ireg[08] ss[00] event[0b]
    scsi host0: ent[13] EVENT val[0c] sreg[97] seqreg[cc] sreg2[00] ireg[08] ss[00] event[0b]
    scsi host0: ent[14] CMD val[44] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[00] event[0c]
    scsi host0: ent[15] CMD val[01] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[01] event[0c]
    scsi host0: ent[16] CMD val[c2] sreg[90] seqreg[cc] sreg2[00] ireg[20] ss[01] event[0c]
    scsi host0: ent[17] CMD val[00] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[0c]
    scsi host0: ent[18] EVENT val[0d] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[0c]
    scsi host0: ent[19] EVENT val[06] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[0d]
    scsi host0: ent[20] CMD val[01] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[06]
    scsi host0: ent[21] CMD val[10] sreg[87] seqreg[02] sreg2[00] ireg[18] ss[00] event[06]
    scsi host0: ent[22] CMD val[1a] sreg[87] seqreg[ca] sreg2[00] ireg[08] ss[00] event[06]
    scsi host0: ent[23] CMD val[12] sreg[87] seqreg[ca] sreg2[00] ireg[08] ss[00] event[06]
    scsi host0: ent[24] EVENT val[0d] sreg[87] seqreg[ca] sreg2[00] ireg[08] ss[00] event[06]
    scsi host0: ent[25] EVENT val[09] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[0d]
    scsi host0: ent[26] CMD val[01] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[09]
    scsi host0: ent[27] CMD val[10] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[09]
    scsi host0: ent[28] EVENT val[0a] sreg[86] seqreg[ca] sreg2[00] ireg[10] ss[00] event[09]
    scsi host0: ent[29] EVENT val[0d] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[0a]
    scsi host0: ent[30] EVENT val[04] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[0d]
    scsi host0: ent[31] CMD val[01] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[04]
    scsi host0: ent[0] CMD val[90] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[04]
    scsi host0: ent[1] EVENT val[05] sreg[80] seqreg[ca] sreg2[00] ireg[20] ss[00] event[04]
    scsi target0:0:3: FAST-5 SCSI 2.0 MB/s ST (500 ns, offset 15)
    scsi target0:0:0: asynchronous
    sr 0:0:3:0: [sr0] scsi-1 drive
    cdrom: Uniform CD-ROM driver Revision: 3.20
    sd 0:0:0:0: Attached scsi generic sg0 type 0
    sr 0:0:3:0: Attached scsi generic sg1 type 5
This patch resolves this issue because the bus reset causes the INQUIRY command to fail earlier, and return the appropriate result code. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:35 -04:00
Finn Thain	201c37d7bf	scsi: esp_scsi: Clean up control flow and dead code This patch improves readability. There are no functional changes. Since this touches on a questionable ESP_INTR_DC conditional, add some commentary to help others who may (as I did) find themselves chasing an "Invalid Command" error after the device flags this condition. This cleanup also eliminates a warning from "make W=1":
    drivers/scsi/esp_scsi.c: In function 'esp_finish_select':
    drivers/scsi/esp_scsi.c:1233:5: warning: variable 'orig_select_state' set but not used [-Wunused-but-set-variable]
    u8 orig_select_state;
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:34 -04:00
Finn Thain	7640d91d28	scsi: mac_esp: Fix PIO transfers for MESSAGE IN phase When in MESSAGE IN phase, the ESP device does not automatically acknowledge each byte that is transferred by PIO. The mac_esp driver neglects to explicitly ack them, which causes a timeout during messages larger than one byte (e.g. tag bytes during reconnect). Fix this with an ESP_CMD_MOK command after each byte. The MESSAGE IN phase is also different in that each byte transferred raises ESP_INTR_FDONE, so in this phase, don't exit the transfer loop on that interrupt. That resolves the "Reconnect IRQ2 timeout" error on those Macs which use PIO transfers instead of PDMA. This patch also improves on the weak tests for unexpected interrupts and phase changes during PIO transfers. Tested-by: Stan Johnson <userm57@yahoo.com> Fixes: `02507a80b3` ("[PATCH] [SCSI] mac_esp: fix PIO mode, take 2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:34 -04:00
Finn Thain	b36c7db977	scsi: mac_esp: Avoid type warning from sparse Avoid the following warning from "make C=1":
    CHECK drivers/scsi/mac_esp.c
    drivers/scsi/mac_esp.c:357:30: warning: incorrect type in initializer (different address spaces)
    drivers/scsi/mac_esp.c:357:30: expected unsigned char [usertype] fifo
    drivers/scsi/mac_esp.c:357:30: got void [noderef] <asn:2>
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:55:34 -04:00
Bhumika Goyal	5f3342d757	arcmsr: add const to bin_attribute structures Add const to bin_attribute structures as they are only passed to the functions sysfs_{remove/create}_bin_file. The corresponding arguments are of type const, so declare the structures to be const. Done using Coccinelle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:40:50 -04:00
Martin Peschke	f32c9e03d4	scsi: zfcp: early returns for traces disabled via level This patch adds early checks to avoid burning CPU cycles on the assembly of trace entries which would be skipped anyway. Introduce a static const variable to keep the trace level to check with debug_level_enabled() in sync with the actual trace emit with debug_event(). In order not to refactor the SAN tracing too much, simply use a define instead. This change is only for the non / semi hot paths, while the actual (I/O) hot path was already improved earlier: zfcp_dbf_scsi() is already guarded by its only caller _zfcp_dbf_scsi() since commit `dcd20e2316` ("[SCSI] zfcp: Only collect SCSI debug data for matching trace levels"). zfcp_dbf_hba_fsf_res() is already guarded by its only caller zfcp_dbf_hba_fsf_response() since commit `2e261af84c` ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels"). Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: rebase, reword, default level 3 branch prediction] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:03 -04:00
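A sketch of the early-return pattern described above. It assumes a hypothetical `struct dbf` whose `level_enabled()` check follows the s390 debug feature convention that an event at `level` is recorded only when `level` is less than or equal to the buffer's current level; `trace_port_event()` is likewise hypothetical. Sharing one `static const` level between the check and the emit keeps the two from drifting apart, which is the purpose the commit message gives for the new variable.

```c
#include <stdio.h>

/* Hypothetical trace buffer with a configurable level. */
struct dbf {
	int level;        /* current level of the trace buffer */
	int recorded;     /* number of entries actually assembled and stored */
	char last[64];    /* most recent entry */
};

/* Assumed semantics: record only if level <= the buffer's level. */
static int level_enabled(const struct dbf *d, int level)
{
	return level <= d->level;
}

static void trace_port_event(struct dbf *d, unsigned int port_id)
{
	/* One constant for both the check and the emit keeps them in sync. */
	static const int level = 3;

	if (!level_enabled(d, level))
		return;	/* skip assembling an entry that would be discarded */

	snprintf(d->last, sizeof(d->last), "port 0x%06x event", port_id);
	d->recorded++;
}
```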
Martin Peschke	b096ef863e	scsi: zfcp: clean up unnecessary module_param_named() with no_auto_port_rescan Improves commit `43f60cbd56` ("[SCSI] zfcp: No automatic port_rescan on events") Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: reword, underscore in description to match sysfs] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:03 -04:00
Martin Peschke	5ec2196060	scsi: zfcp: clean up a member of struct zfcp_qdio that was assigned but never used v2.6.38 commit `a54ca0f62f` ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") dropped trace information previously introduced with v2.6.27 commit `c3baa9a26c` ("[SCSI] zfcp: Add information about interrupt to trace.") but kept and needlessly assigned a now no longer used struct field. Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: reword, added git history] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:02 -04:00
Steffen Maier	46e5ee1f74	scsi: zfcp: clean up no longer existent prototype from zfcp API header Commit `a54ca0f62f` ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.") refactored zfcp_dbf_hba_berr into zfcp_dbf_hba_bit_err but added the prototype for the latter without removing it for the former. Suggested-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:02 -04:00
Martin Peschke	5f03e98b0f	scsi: zfcp: clean up redundant code with fall through in link down SRB switch case Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> [maier@linux.vnet.ibm.com: re-worded short description for more details] Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:01 -04:00
Steffen Maier	5b2fc2a12c	scsi: zfcp: fix kernel doc comment typos for struct zfcp_dbf_scsi Improves commit `250a1352b9` ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:01 -04:00
Steffen Maier	9d464fc1b1	scsi: zfcp: use endianness conversions with common FC(P) struct fields Just to silence sparse. Since zfcp only exists for s390 and s390 is big endian, this has been working correctly without conversions and all the new conversions are NOPs so no performance impact. Nonetheless, use the conversion on the constant expression where possible. NB: N_Port-IDs have always been handled with hton24 or ntoh24 conversions because they also convert to / from character array. Affected common code structs and .fields are: HOT I/O PATH: fcp_cmnd .fc_dl FCP command: regular SCSI I/O, including DIX case SEMI-HOT I/O PATH: fcp_cmnd .fc_dl recovery FCP command: task management function (LUN / target reset) fcp_resp_ext FCP response having FCP_SNS_LEN_VAL with .fr_rsp_len .fr_sns_len FCP response having FCP_RESID_UNDER with .fr_resid RECOVERY / DISCOVERY PATHS: fc_ct_hdr .ct_cmd .ct_mr_size zfcp auto port scan [GPN_FT] with fc_gpn_ft_resp.fp_wwpn, recovery for returned port [GID_PN] with fc_ns_gid_pn.fn_wwpn, get symbolic port name [GSPN], register symbolic port name [RSPN] (NPIV only). fc_els_rscn .rscn_plen incoming ELS (RSCN). fc_els_flogi .fl_wwpn .fl_wwnn incoming ELS (PLOGI), port open response with .fl_csp.sp_bb_data .fl_cssp[0..3].cp_class, FCP channel physical port, point-to-point peer (P2P only). fc_els_logo .fl_n_port_wwn incoming ELS (LOGO). fc_els_adisc .adisc_wwnn .adisc_wwpn path test after RSCN for gone target port. Since v4.10 commit `05de97003c` ("linux/types.h: enable endian checks for all sparse builds"), below sparse endianness reports appear by default. Previously, one needed to pass argument CF="-D__CHECK_ENDIAN__" to make as in: $ make C=1 CF="-D__CHECK_ENDIAN__" M=drivers/s390/scsi. Silenced sparse warnings and one error: $ make C=1 M=drivers/s390/scsi ... 
CHECK drivers/s390/scsi/zfcp_dbf.c drivers/s390/scsi/zfcp_dbf.c:463:22: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_dbf.c:476:28: warning: restricted __be16 degrades to integer CC drivers/s390/scsi/zfcp_dbf.o ... CHECK drivers/s390/scsi/zfcp_fc.c drivers/s390/scsi/zfcp_fc.c:263:26: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:299:41: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:299:41: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:299:41: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fc.c:309:40: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:309:40: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:309:40: got restricted __be64 [usertype] fl_n_port_wwn drivers/s390/scsi/zfcp_fc.c:338:31: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:355:24: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:355:24: expected restricted __be16 [usertype] ct_cmd drivers/s390/scsi/zfcp_fc.c:355:24: got unsigned short [unsigned] [usertype] cmd drivers/s390/scsi/zfcp_fc.c:356:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:356:28: expected restricted __be16 [usertype] ct_mr_size drivers/s390/scsi/zfcp_fc.c:356:28: got int drivers/s390/scsi/zfcp_fc.c:379:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:379:36: expected restricted __be64 [usertype] fn_wwpn drivers/s390/scsi/zfcp_fc.c:379:36: got unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:463:18: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:465:17: warning: cast from restricted __be64 drivers/s390/scsi/zfcp_fc.c:473:20: warning: incorrect type in assignment (different base types) 
drivers/s390/scsi/zfcp_fc.c:473:20: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:473:20: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.c:474:29: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:474:29: expected unsigned int [unsigned] [usertype] maxframe_size drivers/s390/scsi/zfcp_fc.c:474:29: got restricted __be16 [usertype] sp_bb_data drivers/s390/scsi/zfcp_fc.c:476:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:478:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:480:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:482:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:500:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:500:28: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:500:28: got restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:502:38: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:541:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:541:40: expected restricted __be64 [usertype] adisc_wwpn drivers/s390/scsi/zfcp_fc.c:541:40: got unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fc.c:542:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:542:40: expected restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:542:40: got unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fc.c:669:16: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:696:24: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:699:54: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:699:54: expected unsigned long long 
[unsigned] [usertype] <noident> drivers/s390/scsi/zfcp_fc.c:699:54: got restricted __be64 [usertype] fp_wwpn CC drivers/s390/scsi/zfcp_fc.o CHECK drivers/s390/scsi/zfcp_fsf.c drivers/s390/scsi/zfcp_fsf.c:479:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:479:34: expected unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fsf.c:479:34: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:480:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:480:34: expected unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fsf.c:480:34: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fsf.c:506:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:506:36: expected unsigned long long [unsigned] [usertype] peer_wwpn drivers/s390/scsi/zfcp_fsf.c:506:36: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:507:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:507:36: expected unsigned long long [unsigned] [usertype] peer_wwnn drivers/s390/scsi/zfcp_fsf.c:507:36: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.h:269:46: warning: restricted __be32 degrades to integer drivers/s390/scsi/zfcp_fc.h:270:29: error: incompatible types in comparison expression (different base types) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:00 -04:00
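Every sparse warning above comes from one pattern. A field declared big-endian (`__be16`, `__be64`) is read or assigned as a plain host-order integer, or the reverse. Kernel code normally converts with `be16_to_cpu()`, `be64_to_cpu()` and `cpu_to_be64()`.

Those helpers are kernel-only, so the sketch below is a portable user-space analogue. The struct and function names are hypothetical and only loosely modeled on the field names in the warnings (`fl_wwpn`, `sp_bb_data`). It is not the actual zfcp patch.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-ins for the kernel's be16_to_cpu()/be64_to_cpu()/
 * cpu_to_be64(): a "big-endian" integer here is one whose bytes in memory
 * are most significant first, whatever the host byte order is. */
static uint16_t be16_to_host(uint16_t be)
{
	uint8_t b[2];

	memcpy(b, &be, sizeof(b));
	return (uint16_t)((b[0] << 8) | b[1]);
}

static uint64_t be64_to_host(uint64_t be)
{
	uint8_t b[8];
	uint64_t v = 0;

	memcpy(b, &be, sizeof(b));
	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

static uint64_t host_to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t be;

	for (int i = 7; i >= 0; i--) {
		b[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
	memcpy(&be, b, sizeof(be));
	return be;
}

/* Hypothetical wire view: members hold big-endian values as received,
 * like the __be64/__be16 members named in the sparse warnings. */
struct flogi_wire_sketch {
	uint64_t fl_wwpn;    /* big-endian */
	uint16_t sp_bb_data; /* big-endian */
};

/* Hypothetical host-order view of the same data. */
struct port_sketch {
	uint64_t wwpn;
	uint16_t bb_data;
};

/* Convert explicitly instead of assigning a big-endian member to a host
 * integer, which sparse reports as "different base types". */
static void port_from_wire(struct port_sketch *port,
			   const struct flogi_wire_sketch *wire)
{
	port->wwpn = be64_to_host(wire->fl_wwpn);
	port->bb_data = be16_to_host(wire->sp_bb_data);
}
```

Sparse catches these mismatches because the kernel's `__be*` types are annotated with `__bitwise`. Plain `uint64_t` has no such annotation, so in user space the discipline of converting at the boundary has to be kept by hand.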
Steffen Maier	df00d7b8d5	scsi: zfcp: use common code fcp_cmnd and fcp_resp with union in fsf_qtcb_bottom_io This eases crash dump analysis by automatically dissecting these protocol headers at least somewhat rather than getting a string interpretation of large unstructured character array buffer fields. Also, we can get rid of some unnecessary and error-prone type casts. This change is possible since v2.6.33 commit `4318e08c84` ("[SCSI] zfcp: Update FCP protocol related code"). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:37:00 -04:00
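The idea behind this change is an anonymous-buffer-plus-typed-view union. A crash dump tool can then show the protocol header field by field, and code can reach those fields without casting a byte array to a struct pointer.

Below is a minimal sketch of that technique for the command side only. `struct fcp_cmnd_sketch` is a simplified 32-byte layout modeled on the common code `struct fcp_cmnd`. The `_sketch` names and `io_set_cdb()` are hypothetical, not the zfcp definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified FCP_CMND IU (32 bytes), modeled on the common code
 * struct fcp_cmnd; the _sketch names are hypothetical. */
struct fcp_cmnd_sketch {
	uint8_t  fc_lun[8];
	uint8_t  fc_cmdref;
	uint8_t  fc_pri_ta;
	uint8_t  fc_tm_flags;
	uint8_t  fc_flags;
	uint8_t  fc_cdb[16];
	uint32_t fc_dl;       /* big-endian data length */
};

/* Hypothetical stand-in for the I/O part of the QTCB bottom: the raw byte
 * view keeps the original buffer size, the iu view gives debuggers and
 * code a structured, cast-free way to reach the same bytes. */
struct qtcb_bottom_io_sketch {
	union {
		uint8_t byte[32];
		struct fcp_cmnd_sketch iu;
	} fcp_cmnd;
};

_Static_assert(sizeof(struct fcp_cmnd_sketch) == 32,
	       "FCP_CMND IU must stay 32 bytes");

/* Fill in the CDB through the typed member instead of casting the byte
 * buffer to a struct pointer. Returns 0 on success, -1 if too long. */
static int io_set_cdb(struct qtcb_bottom_io_sketch *io,
		      const uint8_t *cdb, size_t len)
{
	if (len > sizeof(io->fcp_cmnd.iu.fc_cdb))
		return -1;
	memset(io->fcp_cmnd.iu.fc_cdb, 0, sizeof(io->fcp_cmnd.iu.fc_cdb));
	memcpy(io->fcp_cmnd.iu.fc_cdb, cdb, len);
	return 0;
}
```

Writing an INQUIRY CDB (opcode 0x12, allocation length 0xa4, as in the "SCSI opcode" trace field of the 64-bit LUN commit) through `iu.fc_cdb` makes byte 12 of the raw view read 0x12. No cast is involved.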
Steffen Maier	394134fd9f	scsi: zfcp: clarify that we don't need "link" test on failed open port Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:36:59 -04:00
Steffen Maier	ab8ab4be78	scsi: zfcp: more fitting constant for fc_ct_hdr.ct_reason on port scan response v2.6.33 commit `dbf5dfe9db` ("[SCSI] zfcp: Use common code definitions for FC CT structs") replaced own definitions with common code definitions. While FC_BA_RJT_UNABLE happens to be defined with the same value 9 as FC_FS_RJT_UNABL and thus also works, here we should use the latter from fc_gs.h. See also its use in libfc's fc_disc_gpn_ft_resp(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:36:59 -04:00
Steffen Maier	5d4a3d0a2f	scsi: zfcp: trace high part of "new" 64 bit SCSI LUN Complements debugging aspects of the otherwise functionally complete v3.17 commit `9cb78c16f5` ("scsi: use 64-bit LUNs"). While I don't have access to a target exporting 3 or 4 level LUNs, I did test it by explicitly attaching a non-existent fake 4 level LUN by means of zfcp sysfs attribute "unit_add". In order to see corresponding trace records of otherwise successful events, we had to increase the trace level of area SCSI and HBA to 6. $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level $ echo 0x4011402240334044 > \ /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add Example output formatted by an updated zfcpdbf from the s390-tools package interspersed with kernel messages at scsi_logging_level=4605: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : scsla_1 LUN : 0x4011402240334044 WWPN : 0x50050763031bd327 D_ID : 0x00...... Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x41000000 Ready count : 0x00000001 Running count : 0x00000000 ERP want : 0x01 ERP need : 0x01 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0 Timestamp : ... Area : HBA Subarea : 00 Level : 6 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fs_norm Request ID : 0x<inquiry2-req-id> Request status : 0x00000010 FSF cmnd : 0x00000001 FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 Prot stat qual : ........ ........ 00000000 00000000 Port handle : 0x... LUN handle : 0x... \| Timestamp : ... Area : SCSI Subarea : 00 Level : 6 Exception : - CPU ID : .. Caller : 0x... 
Record ID : 1 Tag : rsl_nor Request ID : 0x<inquiry2-req-id> SCSI ID : 0x00000000 SCSI LUN : 0x40224011 SCSI LUN high : 0x40444033 <======================= SCSI result : 0x00000000 SCSI retries : 0x00 SCSI allowed : 0x03 SCSI scribble : 0x<inquiry2-req-id> SCSI opcode : 12000000 a4000000 00000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000000 00000000 00000000 00000000 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0 scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \ no device added Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: `9cb78c16f5` ("scsi: use 64-bit LUNs") Cc: <stable@vger.kernel.org> #3.17+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:36:58 -04:00
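The trace above shows one LUN in three spellings:
- the FCP LUN 0x4011402240334044 written to "unit_add";
- the SCSI midlayer's decimal 4630896905707208721;
- the two 32-bit trace fields "SCSI LUN" and "SCSI LUN high".

The sketch below restates, in user space, the level-by-level mapping performed by the kernel's `scsilun_to_int()`. It then splits the result into the two halves that zfcp traces. The function names are hypothetical.

```c
#include <stdint.h>

/* User-space restatement of the mapping performed by the kernel's
 * scsilun_to_int(): each 2-byte addressing level of the 8-byte SAM/FCP LUN
 * becomes one 16-bit chunk of the Linux u64 LUN, first level in the lowest
 * bits. The function name is hypothetical. */
static uint64_t fcp_lun_to_u64(const uint8_t fcp_lun[8])
{
	uint64_t lun = 0;

	for (int i = 0; i < 8; i += 2)
		lun |= (uint64_t)((fcp_lun[i] << 8) | fcp_lun[i + 1]) << (i * 8);
	return lun;
}

/* Split into the two 32-bit halves that the trace shows as
 * "SCSI LUN" and "SCSI LUN high" (hypothetical helper names). */
static uint32_t lun_low32(uint64_t lun)
{
	return (uint32_t)lun;
}

static uint32_t lun_high32(uint64_t lun)
{
	return (uint32_t)(lun >> 32);
}
```

Take the bytes 40 11 40 22 40 33 40 44 from the "unit_add" example. They map to 0x4044403340224011, which is the decimal 4630896905707208721 in the scsi scan messages. The low half is 0x40224011 and the high half is 0x40444033, matching the "rsl_nor" record. The third and fourth LUN levels live only in the high half, which is why it has to be traced.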
Steffen Maier	fdb7cee3b9	scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response At the default trace level, we only trace unsuccessful events including FSF responses. zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to decide on an unsuccessful response. However, this is only one of multiple possible sources determining a failed struct zfcp_fsf_req. An FSF request can also "fail" if its response runs into an ERP timeout or if it gets dismissed because a higher level recovery was triggered [trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()]. FSF requests with ERP timeout are: FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA, FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports, FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN. One example is slow queue processing which can cause follow-on errors, e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out. In order to see the root cause, we need to see late responses even if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD. Example trace records formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : fcegpf1 LUN : 0xffffffffffffffff WWPN : 0x<WWPN> D_ID : 0x00<D_ID> Adapter status : 0x5400050b Port status : 0x41200000 LUN status : 0x00000000 Ready count : 0x00000001 Running count : 0x... ERP want : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT ERP need : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT \| Timestamp : ... 30 seconds later Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : ... 
Record ID : 2 Tag : erscf_2 LUN : 0xffffffffffffffff WWPN : 0x<WWPN> D_ID : 0x00<D_ID> Adapter status : 0x5400050b Port status : 0x41200000 LUN status : 0x00000000 Request ID : 0x<request_ID> ERP status : 0x10000000 ZFCP_STATUS_ERP_TIMEDOUT ERP step : 0x0800 ZFCP_ERP_STEP_PORT_OPENING ERP action : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT ERP count : 0x00 \| Timestamp : ... later than previous record Area : HBA Subarea : 00 Level : 5 > default level => 3 <= default level Exception : - CPU ID : 00 Caller : ... Record ID : 1 Tag : fs_qtcb => fs_rerr Request ID : 0x<request_ID> Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED \| ZFCP_STATUS_FSFREQ_CLEANUP FSF cmnd : 0x00000005 FSF sequence no: 0x... FSF issued : ... > 30 seconds ago FSF stat : 0x00000000 FSF_GOOD FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 FSF_PROT_GOOD Prot stat qual : 00000000 00000000 00000000 00000000 Port handle : 0x... LUN handle : 0x00000000 QTCB log length: ... QTCB log info : ... In case of problems detecting that new responses are waiting on the input queue, we sooner or later trigger adapter recovery due to an FSF request timeout (trace tag "fsrth_1"). FSF requests with FSF request timeout are: typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs, FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports, FSF_QTCB_FCP_CMND for task management function (LUN / target reset). One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD because the channel filled in the response via DMA into the request's QTCB. In a theoretical case, inject code can create an erroneous FSF request on purpose. If data router is enabled, it uses deferred error reporting. A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and SAM_STAT_GOOD. But on writing the read data to host memory via DMA, it can still fail, e.g.
if an intentionally wrong scatter list does not provide enough space. Rather than getting an unsuccessful response, we get a QDIO activate check which in turn triggers adapter recovery. One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD because the channel filled in the response via DMA into the request's QTCB. Example trace records formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : HBA Subarea : 00 Level : 6 > default level => 3 <= default level Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : fs_norm => fs_rerr Request ID : 0x<request_ID2> Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED \| ZFCP_STATUS_FSFREQ_CLEANUP FSF cmnd : 0x00000001 FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF_GOOD FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 FSF_PROT_GOOD Prot stat qual : ........ ........ 00000000 00000000 Port handle : 0x... LUN handle : 0x... \| Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : rsl_err Request ID : 0x<request_ID2> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x000e0000 DID_TRANSPORT_DISRUPTED SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_ID2> SCSI opcode : 28... Read(10) FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000000 00000000 ^^ SAM_STAT_GOOD 00000000 00000000 Only with luck in both above cases, we could see a follow-on trace record of an unsuccessful event following a successful but late FSF response with FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED [On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the corresponding HBA trace record was missing at the default trace level (by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open"). On adapter recovery, after we had shut down the QDIO queues, we perform unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED for each pending FSF request in zfcp_fsf_req_dismiss_all(). In order to find the root cause, we need to see all pseudo responses even if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD. Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr". It does not matter that there are numerous places which set ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response early. These cases are based on protocol status != FSF_PROT_GOOD or == FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default as trace tag "fs_perr" or "fs_ferr" respectively. NB: The trace record with tag "fssrh_1" for status read buffers on dismiss all remains. zfcp_fsf_req_complete() handles this and returns early. All other FSF request types are handled separately and as described above. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: `8a36e4532e` ("[SCSI] zfcp: enhancement of zfcp debug features") Fixes: `2e261af84c` ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-08-10 19:36:57 -04:00
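Here is a minimal sketch of the trace-tag decision this commit describes, assuming a simplified request with only three fields. A response that is dismissed, or already marked as error, now gets the default-visible tag "fs_rerr" even when the channel reported FSF_PROT_GOOD and FSF_GOOD.

Some values are taken from the trace records above: FSF_PROT_GOOD, FSF_GOOD, and the split of request status 0x00001010 into DISMISSED and CLEANUP. The values for ZFCP_STATUS_FSFREQ_ERROR and FSF_PROT_FSF_STATUS_PRESENTED are placeholders. The struct, enum and function names are hypothetical, and the order of the checks is an assumption.

```c
#include <stdint.h>

/* Values that appear in the trace records quoted in this commit. */
#define FSF_PROT_GOOD                0x00000001u
#define FSF_GOOD                     0x00000000u
#define ZFCP_STATUS_FSFREQ_CLEANUP   0x00000010u /* seen alone on "fs_norm" */
#define ZFCP_STATUS_FSFREQ_DISMISSED 0x00001000u /* 0x00001010 minus CLEANUP */

/* Placeholders: the real values are not given in the commit message. */
#define ZFCP_STATUS_FSFREQ_ERROR      0x00000008u
#define FSF_PROT_FSF_STATUS_PRESENTED 0x00000100u

/* Hypothetical, flattened view of the fields the decision looks at. */
struct fsf_req_sketch {
	uint32_t status;      /* like zfcp_fsf_req.status */
	uint32_t prot_status; /* QTCB protocol status */
	uint32_t fsf_status;  /* QTCB FSF status */
};

enum hba_resp_trace {
	HBA_TRACE_HIDDEN, /* "fs_norm", "fs_qtcb" or "fs_open": above default */
	HBA_TRACE_RERR,   /* new tag "fs_rerr" */
	HBA_TRACE_PERR,   /* "fs_perr" */
	HBA_TRACE_FERR,   /* "fs_ferr" */
};

/* Classify an FSF response: anything but HBA_TRACE_HIDDEN is traced at the
 * default level. The dismissed/error check is the addition described in
 * the commit; placing it first is an assumption of this sketch. */
static enum hba_resp_trace
classify_fsf_response(const struct fsf_req_sketch *req)
{
	if (req->status & (ZFCP_STATUS_FSFREQ_DISMISSED |
			   ZFCP_STATUS_FSFREQ_ERROR))
		return HBA_TRACE_RERR;
	if (req->prot_status != FSF_PROT_GOOD &&
	    req->prot_status != FSF_PROT_FSF_STATUS_PRESENTED)
		return HBA_TRACE_PERR;
	if (req->fsf_status != FSF_GOOD)
		return HBA_TRACE_FERR;
	return HBA_TRACE_HIDDEN;
}
```

Consider the dismissed request from the example: status 0x00001010, with FSF_PROT_GOOD and FSF_GOOD. It classifies as HBA_TRACE_RERR. Without the first check it would have fallen through to the hidden "fs_qtcb" or "fs_norm" case.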

692639 Commits