OpenCloudOS-Kernel/drivers/s390/scsi/zfcp_scsi.c

775 lines
22 KiB
C
Raw Normal View History

/*
* zfcp device driver
*
* Interface to Linux SCSI midlayer.
*
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA Without this fix we get SCSI trace records on task management functions which cannot be correlated to HBA trace records because all fields related to the FSF request are empty (zero). Also, the FCP_RSP_IU is missing as well as any sense data if available. This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") introducing trace records for TMFs but hard coding NULL for a possibly existing TMF FSF request. The scsi_cmnd scribble is also zero or unrelated for the TMF request so it also could not lookup a suitable FSF request from there. A broken example trace record formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_fail Request ID : 0x0000000000000000 ^^^^^^^^^^^^^^^^ no correlation to HBA record SCSI ID : 0x<scsitarget> SCSI LUN : 0x<scsilun> SCSI result : 0x000e0000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x0000000000000000 SCSI opcode : 2a000017 3bb80000 08000000 00000000 FCP rsp inf cod: 0x00 ^^ no TMF response FCP rsp IU : 00000000 00000000 00000000 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU Sense len : ... ^^^^^^^^^^^^^^^^^^^^ no sense data length Sense info : ... ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present There are some true cases where we really do not have an FSF request: "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early returns / completions in zfcp_scsi_queuecommand(), "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from zfcp_scsi_eh_abort_handler() where we did not get as far, "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're successful and do not need to do anything because adapter stopped. For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 18:30:54 +08:00
* Copyright IBM Corp. 2002, 2017
*/
#define KMSG_COMPONENT "zfcp"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
#include <linux/module.h>
#include <linux/types.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
#include <scsi/fc/fc_fcp.h>
#include <scsi/scsi_eh.h>
#include <linux/atomic.h>
#include "zfcp_ext.h"
#include "zfcp_dbf.h"
#include "zfcp_fc.h"
#include "zfcp_reqlist.h"
static unsigned int default_depth = 32;
module_param_named(queue_depth, default_depth, uint, 0600);
MODULE_PARM_DESC(queue_depth, "Default queue depth for new SCSI devices");
static bool enable_dif;
module_param_named(dif, enable_dif, bool, 0400);
MODULE_PARM_DESC(dif, "Enable DIF/DIX data integrity support");
static bool allow_lun_scan = true;
module_param(allow_lun_scan, bool, 0600);
MODULE_PARM_DESC(allow_lun_scan, "For NPIV, scan and attach all storage LUNs");
static void zfcp_scsi_slave_destroy(struct scsi_device *sdev)
{
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
/* if previous slave_alloc returned early, there is nothing to do */
if (!zfcp_sdev->port)
return;
zfcp_erp_lun_shutdown_wait(sdev, "scssd_1");
put_device(&zfcp_sdev->port->dev);
}
static int zfcp_scsi_slave_configure(struct scsi_device *sdp)
{
if (sdp->tagged_supported)
scsi_change_queue_depth(sdp, default_depth);
return 0;
}
static void zfcp_scsi_command_fail(struct scsi_cmnd *scpnt, int result)
{
set_host_byte(scpnt, result);
zfcp_dbf_scsi_fail_send(scpnt);
scpnt->scsi_done(scpnt);
}
[SCSI] zfcp: Issue FCP command without holding SCSI host_lock Interrupting the connection to the FCP channel while I/O requests are being issued can lead to this deadlock. scsi_dispatch_cmd already holds the host_lock while the recovery trigger tries to acquire the host_lock again when iterating through the scsi_devices. INFO: lockdep is turned off. BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878 CPU: 1 Not tainted 2.6.35.7SWEN2 #2 Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0) 0000000074393640 00000000743935c0 0000000000000002 0000000000000000 0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2 0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878 000000000000000d 040000000000000c 0000000074393628 0000000000000000 0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600 Call Trace: ([<0000000000100a32>] show_trace+0xee/0x144) [<00000000003be202>] do_raw_spin_lock+0x112/0x178 [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0 [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4 [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180 [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408 [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0 [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324 [<00000000003f9910>] scsi_request_fn+0x42c/0x56c [<00000000003828ae>] __blk_run_queue+0x86/0x140 [<000000000037f742>] elv_insert+0x11a/0x208 [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4 [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod] [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod] [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod] [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod] [<00000000001d1272>] sync_page+0x76/0x9c [<00000000001d12ba>] sync_page_killable+0x22/0x60 [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124 [<00000000001d1140>] __lock_page_killable+0x78/0x84 [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8 [<0000000000228ec0>] do_sync_read+0xc8/0x12c [<0000000000229edc>] vfs_read+0xac/0x1ac [<000000000022a0d8>] SyS_read+0x58/0xa8 [<00000000001146de>] sysc_noemu+0x10/0x16 [<00000200000493c4>] 0x200000493c4 INFO: lockdep is turned off. Call zfcp_fsf_fcp_cmnd without the host_lock and disable the interrupts when acquiring the req_q_lock. According to the patch description in "[PATCH] Eliminate error handler overload of the SCSI serial number", the serial_number is not used, so simply drop the queuecommand wrapper function and run zfcp_scsi_queuecommand without holding the host_lock. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-11-18 21:53:18 +08:00
static
int zfcp_scsi_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scpnt)
{
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
int status, scsi_result, ret;
/* reset the status for this request */
scpnt->result = 0;
scpnt->host_scribble = NULL;
scsi_result = fc_remote_port_chkready(rport);
if (unlikely(scsi_result)) {
scpnt->result = scsi_result;
zfcp_dbf_scsi_fail_send(scpnt);
scpnt->scsi_done(scpnt);
return 0;
}
status = atomic_read(&zfcp_sdev->status);
if (unlikely(status & ZFCP_STATUS_COMMON_ERP_FAILED) &&
!(atomic_read(&zfcp_sdev->port->status) &
ZFCP_STATUS_COMMON_ERP_FAILED)) {
/* only LUN access denied, but port is good
* not covered by FC transport, have to fail here */
zfcp_scsi_command_fail(scpnt, DID_ERROR);
return 0;
}
if (unlikely(!(status & ZFCP_STATUS_COMMON_UNBLOCKED))) {
scsi: zfcp: fix rport unblock race with LUN recovery It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport. However, for the case when the rport becomes available again, we should prevent unblocking the rport too early. In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending which in turn force zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY. This also opens a time window with unblocked rport (until the followup LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. Finished intentionally includes any recovery result no matter if successful or failed (still unblock rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery if there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics. For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level. Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence: ERP thread rport_work other context ------------------------- -------------- -------------------------------- port is unblocked, rport still blocked, due to pending/running ERP action, so ((port->status & ...UNBLOCK) != 0) and (port->rport == NULL) unlock ERP zfcp_erp_action_cleanup() case ZFCP_ERP_ACTION_REOPEN_LUN: zfcp_erp_try_rport_unblock() ((status & ...UNBLOCK) != 0) [OLD!] zfcp_erp_port_reopen() lock ERP zfcp_erp_port_block() port->status clear ...UNBLOCK unlock ERP zfcp_scsi_schedule_rport_block() port->rport_task = RPORT_DEL queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task != RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_block() if (!port->rport) return zfcp_scsi_schedule_rport_register() port->rport_task = RPORT_ADD queue_work(rport_work) zfcp_scsi_rport_work() (port->rport_task == RPORT_ADD) port->rport_task = RPORT_NONE zfcp_scsi_rport_register() (port->rport == NULL) rport = fc_remote_port_add() port->rport = rport; Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed. In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN. With that, the possible corresponding rport state sequences would be: (unblock[ERP thread],block[other context]) if the ERP thread gets erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock after the other context has already cleard ...UNBLOCK from port->status. Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check status of zfcp_port and LUN since these are only changed under erp_lock elsewhere. Regarding the check of the proper status flags (port or port_forced are similar to the shown adapter recovery): [zfcp_erp_adapter_shutdown()] zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() * clear UNBLOCK ---------------------------------------+ zfcp_scsi_schedule_rports_block() | write_lock_irqsave(&adapter->erp_lock, flags);-------+ | zfcp_erp_action_enqueue() | | zfcp_erp_setup_act() | | * set ERP_INUSE -----------------------------------|--|--+ write_unlock_irqrestore(&adapter->erp_lock, flags);--+ | | .context-switch. | | zfcp_erp_thread() | | zfcp_erp_strategy() | | write_lock_irqsave(&adapter->erp_lock, flags);------+ | | ... | | | zfcp_erp_strategy_check_target() | | | zfcp_erp_strategy_check_adapter() | | | zfcp_erp_adapter_unblock() | | | * set UNBLOCK -----------------------------------|--+ | zfcp_erp_action_dequeue() | | * clear ERP_INUSE ---------------------------------|-----+ ... | write_unlock_irqrestore(&adapter->erp_lock, flags);-+ Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link down case which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8830271c4819 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-10 00:16:33 +08:00
/* This could be
* call to rport_delete pending: mimic retry from
* fc_remote_port_chkready until rport is BLOCKED
*/
zfcp_scsi_command_fail(scpnt, DID_IMM_RETRY);
return 0;
}
ret = zfcp_fsf_fcp_cmnd(scpnt);
if (unlikely(ret == -EBUSY))
return SCSI_MLQUEUE_DEVICE_BUSY;
else if (unlikely(ret < 0))
return SCSI_MLQUEUE_HOST_BUSY;
return ret;
}
static int zfcp_scsi_slave_alloc(struct scsi_device *sdev)
{
struct fc_rport *rport = starget_to_rport(scsi_target(sdev));
struct zfcp_adapter *adapter =
(struct zfcp_adapter *) sdev->host->hostdata[0];
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
struct zfcp_port *port;
struct zfcp_unit *unit;
int npiv = adapter->connection_features & FSF_FEATURE_NPIV_MODE;
port = zfcp_get_port_by_wwpn(adapter, rport->port_name);
if (!port)
return -ENXIO;
unit = zfcp_unit_find(port, zfcp_scsi_dev_lun(sdev));
if (unit)
put_device(&unit->dev);
if (!unit && !(allow_lun_scan && npiv)) {
put_device(&port->dev);
return -ENXIO;
}
zfcp_sdev->port = port;
zfcp_sdev->latencies.write.channel.min = 0xFFFFFFFF;
zfcp_sdev->latencies.write.fabric.min = 0xFFFFFFFF;
zfcp_sdev->latencies.read.channel.min = 0xFFFFFFFF;
zfcp_sdev->latencies.read.fabric.min = 0xFFFFFFFF;
zfcp_sdev->latencies.cmd.channel.min = 0xFFFFFFFF;
zfcp_sdev->latencies.cmd.fabric.min = 0xFFFFFFFF;
spin_lock_init(&zfcp_sdev->latencies.lock);
zfcp_erp_set_lun_status(sdev, ZFCP_STATUS_COMMON_RUNNING);
zfcp_erp_lun_reopen(sdev, 0, "scsla_1");
zfcp_erp_wait(port->adapter);
return 0;
}
static int zfcp_scsi_eh_abort_handler(struct scsi_cmnd *scpnt)
{
struct Scsi_Host *scsi_host = scpnt->device->host;
struct zfcp_adapter *adapter =
(struct zfcp_adapter *) scsi_host->hostdata[0];
struct zfcp_fsf_req *old_req, *abrt_req;
unsigned long flags;
unsigned long old_reqid = (unsigned long) scpnt->host_scribble;
int retval = SUCCESS, ret;
int retry = 3;
char *dbf_tag;
/* avoid race condition between late normal completion and abort */
write_lock_irqsave(&adapter->abort_lock, flags);
old_req = zfcp_reqlist_find(adapter->req_list, old_reqid);
if (!old_req) {
write_unlock_irqrestore(&adapter->abort_lock, flags);
zfcp_dbf_scsi_abort("abrt_or", scpnt, NULL);
return FAILED; /* completion could be in progress */
}
old_req->data = NULL;
/* don't access old fsf_req after releasing the abort_lock */
write_unlock_irqrestore(&adapter->abort_lock, flags);
while (retry--) {
abrt_req = zfcp_fsf_abort_fcp_cmnd(scpnt);
if (abrt_req)
break;
zfcp_erp_wait(adapter);
ret = fc_block_scsi_eh(scpnt);
if (ret) {
zfcp_dbf_scsi_abort("abrt_bl", scpnt, NULL);
return ret;
}
if (!(atomic_read(&adapter->status) &
ZFCP_STATUS_COMMON_RUNNING)) {
zfcp_dbf_scsi_abort("abrt_ru", scpnt, NULL);
return SUCCESS;
}
}
if (!abrt_req) {
zfcp_dbf_scsi_abort("abrt_ar", scpnt, NULL);
return FAILED;
}
wait_for_completion(&abrt_req->completion);
if (abrt_req->status & ZFCP_STATUS_FSFREQ_ABORTSUCCEEDED)
dbf_tag = "abrt_ok";
else if (abrt_req->status & ZFCP_STATUS_FSFREQ_ABORTNOTNEEDED)
dbf_tag = "abrt_nn";
else {
dbf_tag = "abrt_fa";
retval = FAILED;
}
zfcp_dbf_scsi_abort(dbf_tag, scpnt, abrt_req);
zfcp_fsf_req_free(abrt_req);
return retval;
}
scsi: zfcp: fix use-after-"free" in FC ingress path after TMF When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Cc: <stable@vger.kernel.org> #2.6.32+ Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-10 00:16:31 +08:00
struct zfcp_scsi_req_filter {
u8 tmf_scope;
u32 lun_handle;
u32 port_handle;
};
static void zfcp_scsi_forget_cmnd(struct zfcp_fsf_req *old_req, void *data)
{
struct zfcp_scsi_req_filter *filter =
(struct zfcp_scsi_req_filter *)data;
/* already aborted - prevent side-effects - or not a SCSI command */
if (old_req->data == NULL || old_req->fsf_command != FSF_QTCB_FCP_CMND)
return;
/* (tmf_scope == FCP_TMF_TGT_RESET || tmf_scope == FCP_TMF_LUN_RESET) */
if (old_req->qtcb->header.port_handle != filter->port_handle)
return;
if (filter->tmf_scope == FCP_TMF_LUN_RESET &&
old_req->qtcb->header.lun_handle != filter->lun_handle)
return;
zfcp_dbf_scsi_nullcmnd((struct scsi_cmnd *)old_req->data, old_req);
old_req->data = NULL;
}
static void zfcp_scsi_forget_cmnds(struct zfcp_scsi_dev *zsdev, u8 tm_flags)
{
struct zfcp_adapter *adapter = zsdev->port->adapter;
struct zfcp_scsi_req_filter filter = {
.tmf_scope = FCP_TMF_TGT_RESET,
.port_handle = zsdev->port->handle,
};
unsigned long flags;
if (tm_flags == FCP_TMF_LUN_RESET) {
filter.tmf_scope = FCP_TMF_LUN_RESET;
filter.lun_handle = zsdev->lun_handle;
}
/*
* abort_lock secures against other processings - in the abort-function
* and normal cmnd-handler - of (struct zfcp_fsf_req *)->data
*/
write_lock_irqsave(&adapter->abort_lock, flags);
zfcp_reqlist_apply_for_all(adapter->req_list, zfcp_scsi_forget_cmnd,
&filter);
write_unlock_irqrestore(&adapter->abort_lock, flags);
}
static int zfcp_task_mgmt_function(struct scsi_cmnd *scpnt, u8 tm_flags)
{
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
struct zfcp_fsf_req *fsf_req = NULL;
int retval = SUCCESS, ret;
int retry = 3;
while (retry--) {
fsf_req = zfcp_fsf_fcp_task_mgmt(scpnt, tm_flags);
if (fsf_req)
break;
zfcp_erp_wait(adapter);
ret = fc_block_scsi_eh(scpnt);
if (ret) {
zfcp_dbf_scsi_devreset("fiof", scpnt, tm_flags, NULL);
return ret;
}
if (!(atomic_read(&adapter->status) &
ZFCP_STATUS_COMMON_RUNNING)) {
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA Without this fix we get SCSI trace records on task management functions which cannot be correlated to HBA trace records because all fields related to the FSF request are empty (zero). Also, the FCP_RSP_IU is missing as well as any sense data if available. This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") introducing trace records for TMFs but hard coding NULL for a possibly existing TMF FSF request. The scsi_cmnd scribble is also zero or unrelated for the TMF request so it also could not lookup a suitable FSF request from there. A broken example trace record formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_fail Request ID : 0x0000000000000000 ^^^^^^^^^^^^^^^^ no correlation to HBA record SCSI ID : 0x<scsitarget> SCSI LUN : 0x<scsilun> SCSI result : 0x000e0000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x0000000000000000 SCSI opcode : 2a000017 3bb80000 08000000 00000000 FCP rsp inf cod: 0x00 ^^ no TMF response FCP rsp IU : 00000000 00000000 00000000 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU Sense len : ... ^^^^^^^^^^^^^^^^^^^^ no sense data length Sense info : ... ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present There are some true cases where we really do not have an FSF request: "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early returns / completions in zfcp_scsi_queuecommand(), "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from zfcp_scsi_eh_abort_handler() where we did not get as far, "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're successful and do not need to do anything because adapter stopped. For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 18:30:54 +08:00
zfcp_dbf_scsi_devreset("nres", scpnt, tm_flags, NULL);
return SUCCESS;
}
}
if (!fsf_req) {
zfcp_dbf_scsi_devreset("reqf", scpnt, tm_flags, NULL);
return FAILED;
}
wait_for_completion(&fsf_req->completion);
if (fsf_req->status & ZFCP_STATUS_FSFREQ_TMFUNCFAILED) {
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA Without this fix we get SCSI trace records on task management functions which cannot be correlated to HBA trace records because all fields related to the FSF request are empty (zero). Also, the FCP_RSP_IU is missing as well as any sense data if available. This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") introducing trace records for TMFs but hard coding NULL for a possibly existing TMF FSF request. The scsi_cmnd scribble is also zero or unrelated for the TMF request so it also could not lookup a suitable FSF request from there. A broken example trace record formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_fail Request ID : 0x0000000000000000 ^^^^^^^^^^^^^^^^ no correlation to HBA record SCSI ID : 0x<scsitarget> SCSI LUN : 0x<scsilun> SCSI result : 0x000e0000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x0000000000000000 SCSI opcode : 2a000017 3bb80000 08000000 00000000 FCP rsp inf cod: 0x00 ^^ no TMF response FCP rsp IU : 00000000 00000000 00000000 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU Sense len : ... ^^^^^^^^^^^^^^^^^^^^ no sense data length Sense info : ... ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present There are some true cases where we really do not have an FSF request: "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early returns / completions in zfcp_scsi_queuecommand(), "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from zfcp_scsi_eh_abort_handler() where we did not get as far, "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're successful and do not need to do anything because adapter stopped. For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 18:30:54 +08:00
zfcp_dbf_scsi_devreset("fail", scpnt, tm_flags, fsf_req);
retval = FAILED;
scsi: zfcp: fix use-after-"free" in FC ingress path after TMF When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Cc: <stable@vger.kernel.org> #2.6.32+ Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-10 00:16:31 +08:00
} else {
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA Without this fix we get SCSI trace records on task management functions which cannot be correlated to HBA trace records because all fields related to the FSF request are empty (zero). Also, the FCP_RSP_IU is missing as well as any sense data if available. This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") introducing trace records for TMFs but hard coding NULL for a possibly existing TMF FSF request. The scsi_cmnd scribble is also zero or unrelated for the TMF request so it also could not lookup a suitable FSF request from there. A broken example trace record formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_fail Request ID : 0x0000000000000000 ^^^^^^^^^^^^^^^^ no correlation to HBA record SCSI ID : 0x<scsitarget> SCSI LUN : 0x<scsilun> SCSI result : 0x000e0000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x0000000000000000 SCSI opcode : 2a000017 3bb80000 08000000 00000000 FCP rsp inf cod: 0x00 ^^ no TMF response FCP rsp IU : 00000000 00000000 00000000 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU Sense len : ... ^^^^^^^^^^^^^^^^^^^^ no sense data length Sense info : ... ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present There are some true cases where we really do not have an FSF request: "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early returns / completions in zfcp_scsi_queuecommand(), "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from zfcp_scsi_eh_abort_handler() where we did not get as far, "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're successful and do not need to do anything because adapter stopped. For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 18:30:54 +08:00
zfcp_dbf_scsi_devreset("okay", scpnt, tm_flags, fsf_req);
scsi: zfcp: fix use-after-"free" in FC ingress path after TMF When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and eh_target_reset_handler(), it expects us to relent the ownership over the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN or target - when returning with SUCCESS from the callback ('release' them). SCSI EH can then reuse those commands. We did not follow this rule to release commands upon SUCCESS; and if later a reply arrived for one of those supposed to be released commands, we would still make use of the scsi_cmnd in our ingress tasklet. This will at least result in undefined behavior or a kernel panic because of a wrong kernel pointer dereference. To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req *)->data in the matching scope if a TMF was successful. This is done under the locks (struct zfcp_adapter *)->abort_lock and (struct zfcp_reqlist *)->lock to prevent the requests from being removed from the request-hashtable, and the ingress tasklet from making use of the scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler(). For cases where a reply arrives during SCSI EH, but before we get a chance to NULLify the pointer - but before we return from the callback -, we assume that the code is protected from races via the CAS operation in blk_complete_request() that is called in scsi_done(). The following stacktrace shows an example for a crash resulting from the previous behavior: Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000 Oops: 0038 [#1] SMP CPU: 2 PID: 0 Comm: swapper/2 Not tainted task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000 Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918 Krnl Code: 00000000001156a2: a7190000 lghi %r1,0 00000000001156a6: a7380015 lhi %r3,21 #00000000001156aa: e32050000008 ag %r2,0(%r5) >00000000001156b0: 482022b0 lh %r2,688(%r2) 00000000001156b4: ae123000 sigp %r1,%r2,0(%r3) 00000000001156b8: b2220020 ipm %r2 00000000001156bc: 8820001c srl %r2,28 00000000001156c0: c02700000001 xilf %r2,1 Call Trace: ([<0000000000000000>] 0x0) [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp] [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp] [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp] [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp] [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio] [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio] [<0000000000141fd4>] tasklet_action+0x9c/0x170 [<0000000000141550>] __do_softirq+0xe8/0x258 [<000000000010ce0a>] do_softirq+0xba/0xc0 [<000000000014187c>] irq_exit+0xc4/0xe8 [<000000000046b526>] do_IRQ+0x146/0x1d8 [<00000000005d6a3c>] io_return+0x0/0x8 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0 ([<0000000000000000>] 0x0) [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8 [<0000000000114782>] smp_start_secondary+0xda/0xe8 [<00000000005d6efe>] restart_int_handler+0x56/0x6c [<0000000000000000>] 0x0 Last Breaking-Event-Address: [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0 Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Cc: <stable@vger.kernel.org> #2.6.32+ Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-10 00:16:31 +08:00
zfcp_scsi_forget_cmnds(zfcp_sdev, tm_flags);
}
zfcp_fsf_req_free(fsf_req);
return retval;
}
static int zfcp_scsi_eh_device_reset_handler(struct scsi_cmnd *scpnt)
{
return zfcp_task_mgmt_function(scpnt, FCP_TMF_LUN_RESET);
}
static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
{
return zfcp_task_mgmt_function(scpnt, FCP_TMF_TGT_RESET);
}
static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
{
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
int ret;
zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
zfcp_erp_wait(adapter);
ret = fc_block_scsi_eh(scpnt);
if (ret)
return ret;
return SUCCESS;
}
struct scsi_transport_template *zfcp_scsi_transport_template;
static struct scsi_host_template zfcp_scsi_host_template = {
.module = THIS_MODULE,
.name = "zfcp",
.queuecommand = zfcp_scsi_queuecommand,
.eh_timed_out = fc_eh_timed_out,
.eh_abort_handler = zfcp_scsi_eh_abort_handler,
.eh_device_reset_handler = zfcp_scsi_eh_device_reset_handler,
.eh_target_reset_handler = zfcp_scsi_eh_target_reset_handler,
.eh_host_reset_handler = zfcp_scsi_eh_host_reset_handler,
.slave_alloc = zfcp_scsi_slave_alloc,
.slave_configure = zfcp_scsi_slave_configure,
.slave_destroy = zfcp_scsi_slave_destroy,
.change_queue_depth = scsi_change_queue_depth,
.proc_name = "zfcp",
.can_queue = 4096,
.this_id = -1,
[SCSI] zfcp: block queue limits with data router Commit 86a9668a8d29ea711613e1cb37efa68e7c4db564 "[SCSI] zfcp: support for hardware data router" reduced the initial block queue limits in the scsi_host_template to the absolute minimum and adjusted them later on. However, the adjustment was too late for the BSG devices of Scsi_Host and fc_host. Therefore, ioctl(..., SG_IO, ...) with request or response size > 4kB to a BSG device of an fc_host or a Scsi_Host fails with EINVAL. As a result, users of such ioctl such as HBA_SendCTPassThru() in libzfcphbaapi return with error HBA_STATUS_ERROR. Initialize the block queue limits in zfcp_scsi_host_template to the greatest common denominator (GCD). While we cannot exploit the slightly enlarged maximum request size with data router, this should be neglectible. Doing so also avoids running into trouble after live guest relocation (LGR) / migration from a data router FCP device to an FCP device that does not support data router. In that case, zfcp would figure out the new limits on adapter recovery, but the fc_host and Scsi_Host (plus in fact all sdevs) still exist with the old and now too large queue limits. It should also OK, not to use half the size as in the DIX case, because fc_host and Scsi_Host do not transport FCP requests including SCSI commands using protection data. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> #3.2+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-04-26 23:33:45 +08:00
.sg_tablesize = (((QDIO_MAX_ELEMENTS_PER_BUFFER - 1)
* ZFCP_QDIO_MAX_SBALS_PER_REQ) - 2),
/* GCD, adjusted later */
.max_sectors = (((QDIO_MAX_ELEMENTS_PER_BUFFER - 1)
* ZFCP_QDIO_MAX_SBALS_PER_REQ) - 2) * 8,
/* GCD, adjusted later */
.dma_boundary = ZFCP_QDIO_SBALE_LEN - 1,
.use_clustering = 1,
.shost_attrs = zfcp_sysfs_shost_attrs,
.sdev_attrs = zfcp_sysfs_sdev_attrs,
.track_queue_depth = 1,
};
/**
* zfcp_scsi_adapter_register - Register SCSI and FC host with SCSI midlayer
* @adapter: The zfcp adapter to register with the SCSI midlayer
*/
int zfcp_scsi_adapter_register(struct zfcp_adapter *adapter)
{
struct ccw_dev_id dev_id;
if (adapter->scsi_host)
return 0;
ccw_device_get_id(adapter->ccw_device, &dev_id);
/* register adapter as SCSI host with mid layer of SCSI stack */
adapter->scsi_host = scsi_host_alloc(&zfcp_scsi_host_template,
sizeof (struct zfcp_adapter *));
if (!adapter->scsi_host) {
dev_err(&adapter->ccw_device->dev,
"Registering the FCP device with the "
"SCSI stack failed\n");
return -EIO;
}
/* tell the SCSI stack some characteristics of this adapter */
adapter->scsi_host->max_id = 511;
adapter->scsi_host->max_lun = 0xFFFFFFFF;
adapter->scsi_host->max_channel = 0;
adapter->scsi_host->unique_id = dev_id.devno;
adapter->scsi_host->max_cmd_len = 16; /* in struct fcp_cmnd */
adapter->scsi_host->transportt = zfcp_scsi_transport_template;
adapter->scsi_host->hostdata[0] = (unsigned long) adapter;
if (scsi_add_host(adapter->scsi_host, &adapter->ccw_device->dev)) {
scsi_host_put(adapter->scsi_host);
return -EIO;
}
return 0;
}
/**
* zfcp_scsi_adapter_unregister - Unregister SCSI and FC host from SCSI midlayer
* @adapter: The zfcp adapter to unregister.
*/
void zfcp_scsi_adapter_unregister(struct zfcp_adapter *adapter)
{
struct Scsi_Host *shost;
struct zfcp_port *port;
shost = adapter->scsi_host;
if (!shost)
return;
read_lock_irq(&adapter->port_list_lock);
list_for_each_entry(port, &adapter->port_list, list)
port->rport = NULL;
read_unlock_irq(&adapter->port_list_lock);
fc_remove_host(shost);
scsi_remove_host(shost);
scsi_host_put(shost);
adapter->scsi_host = NULL;
}
static struct fc_host_statistics*
zfcp_init_fc_host_stats(struct zfcp_adapter *adapter)
{
struct fc_host_statistics *fc_stats;
if (!adapter->fc_stats) {
fc_stats = kmalloc(sizeof(*fc_stats), GFP_KERNEL);
if (!fc_stats)
return NULL;
adapter->fc_stats = fc_stats; /* freed in adapter_release */
}
memset(adapter->fc_stats, 0, sizeof(*adapter->fc_stats));
return adapter->fc_stats;
}
static void zfcp_adjust_fc_host_stats(struct fc_host_statistics *fc_stats,
struct fsf_qtcb_bottom_port *data,
struct fsf_qtcb_bottom_port *old)
{
fc_stats->seconds_since_last_reset =
data->seconds_since_last_reset - old->seconds_since_last_reset;
fc_stats->tx_frames = data->tx_frames - old->tx_frames;
fc_stats->tx_words = data->tx_words - old->tx_words;
fc_stats->rx_frames = data->rx_frames - old->rx_frames;
fc_stats->rx_words = data->rx_words - old->rx_words;
fc_stats->lip_count = data->lip - old->lip;
fc_stats->nos_count = data->nos - old->nos;
fc_stats->error_frames = data->error_frames - old->error_frames;
fc_stats->dumped_frames = data->dumped_frames - old->dumped_frames;
fc_stats->link_failure_count = data->link_failure - old->link_failure;
fc_stats->loss_of_sync_count = data->loss_of_sync - old->loss_of_sync;
fc_stats->loss_of_signal_count =
data->loss_of_signal - old->loss_of_signal;
fc_stats->prim_seq_protocol_err_count =
data->psp_error_counts - old->psp_error_counts;
fc_stats->invalid_tx_word_count =
data->invalid_tx_words - old->invalid_tx_words;
fc_stats->invalid_crc_count = data->invalid_crcs - old->invalid_crcs;
fc_stats->fcp_input_requests =
data->input_requests - old->input_requests;
fc_stats->fcp_output_requests =
data->output_requests - old->output_requests;
fc_stats->fcp_control_requests =
data->control_requests - old->control_requests;
fc_stats->fcp_input_megabytes = data->input_mb - old->input_mb;
fc_stats->fcp_output_megabytes = data->output_mb - old->output_mb;
}
static void zfcp_set_fc_host_stats(struct fc_host_statistics *fc_stats,
struct fsf_qtcb_bottom_port *data)
{
fc_stats->seconds_since_last_reset = data->seconds_since_last_reset;
fc_stats->tx_frames = data->tx_frames;
fc_stats->tx_words = data->tx_words;
fc_stats->rx_frames = data->rx_frames;
fc_stats->rx_words = data->rx_words;
fc_stats->lip_count = data->lip;
fc_stats->nos_count = data->nos;
fc_stats->error_frames = data->error_frames;
fc_stats->dumped_frames = data->dumped_frames;
fc_stats->link_failure_count = data->link_failure;
fc_stats->loss_of_sync_count = data->loss_of_sync;
fc_stats->loss_of_signal_count = data->loss_of_signal;
fc_stats->prim_seq_protocol_err_count = data->psp_error_counts;
fc_stats->invalid_tx_word_count = data->invalid_tx_words;
fc_stats->invalid_crc_count = data->invalid_crcs;
fc_stats->fcp_input_requests = data->input_requests;
fc_stats->fcp_output_requests = data->output_requests;
fc_stats->fcp_control_requests = data->control_requests;
fc_stats->fcp_input_megabytes = data->input_mb;
fc_stats->fcp_output_megabytes = data->output_mb;
}
static struct fc_host_statistics *zfcp_get_fc_host_stats(struct Scsi_Host *host)
{
struct zfcp_adapter *adapter;
struct fc_host_statistics *fc_stats;
struct fsf_qtcb_bottom_port *data;
int ret;
adapter = (struct zfcp_adapter *)host->hostdata[0];
fc_stats = zfcp_init_fc_host_stats(adapter);
if (!fc_stats)
return NULL;
data = kzalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return NULL;
ret = zfcp_fsf_exchange_port_data_sync(adapter->qdio, data);
if (ret) {
kfree(data);
return NULL;
}
if (adapter->stats_reset &&
((jiffies/HZ - adapter->stats_reset) <
data->seconds_since_last_reset))
zfcp_adjust_fc_host_stats(fc_stats, data,
adapter->stats_reset_data);
else
zfcp_set_fc_host_stats(fc_stats, data);
kfree(data);
return fc_stats;
}
static void zfcp_reset_fc_host_stats(struct Scsi_Host *shost)
{
struct zfcp_adapter *adapter;
struct fsf_qtcb_bottom_port *data;
int ret;
adapter = (struct zfcp_adapter *)shost->hostdata[0];
data = kzalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return;
ret = zfcp_fsf_exchange_port_data_sync(adapter->qdio, data);
if (ret)
kfree(data);
else {
adapter->stats_reset = jiffies/HZ;
kfree(adapter->stats_reset_data);
adapter->stats_reset_data = data; /* finally freed in
adapter_release */
}
}
static void zfcp_get_host_port_state(struct Scsi_Host *shost)
{
struct zfcp_adapter *adapter =
(struct zfcp_adapter *)shost->hostdata[0];
int status = atomic_read(&adapter->status);
if ((status & ZFCP_STATUS_COMMON_RUNNING) &&
!(status & ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED))
fc_host_port_state(shost) = FC_PORTSTATE_ONLINE;
else if (status & ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED)
fc_host_port_state(shost) = FC_PORTSTATE_LINKDOWN;
else if (status & ZFCP_STATUS_COMMON_ERP_FAILED)
fc_host_port_state(shost) = FC_PORTSTATE_ERROR;
else
fc_host_port_state(shost) = FC_PORTSTATE_UNKNOWN;
}
static void zfcp_set_rport_dev_loss_tmo(struct fc_rport *rport, u32 timeout)
{
rport->dev_loss_tmo = timeout;
}
/**
* zfcp_scsi_terminate_rport_io - Terminate all I/O on a rport
* @rport: The FC rport where to teminate I/O
*
* Abort all pending SCSI commands for a port by closing the
* port. Using a reopen avoids a conflict with a shutdown
* overwriting a reopen. The "forced" ensures that a disappeared port
* is not opened again as valid due to the cached plogi data in
* non-NPIV mode.
*/
static void zfcp_scsi_terminate_rport_io(struct fc_rport *rport)
{
struct zfcp_port *port;
struct Scsi_Host *shost = rport_to_shost(rport);
struct zfcp_adapter *adapter =
(struct zfcp_adapter *)shost->hostdata[0];
port = zfcp_get_port_by_wwpn(adapter, rport->port_name);
if (port) {
zfcp_erp_port_forced_reopen(port, 0, "sctrpi1");
put_device(&port->dev);
}
}
static void zfcp_scsi_rport_register(struct zfcp_port *port)
{
struct fc_rport_identifiers ids;
struct fc_rport *rport;
if (port->rport)
return;
ids.node_name = port->wwnn;
ids.port_name = port->wwpn;
ids.port_id = port->d_id;
ids.roles = FC_RPORT_ROLE_FCP_TARGET;
zfcp: close window with unblocked rport during rport gone On a successful end of reopen port forced, zfcp_erp_strategy_followup_success() re-uses the port erp_action and the subsequent zfcp_erp_action_cleanup() now sees ZFCP_ERP_SUCCEEDED with erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED but must not perform zfcp_scsi_schedule_rport_register(). We can detect this because the fresh port reopen erp_action is in its very first step ZFCP_ERP_STEP_UNINITIALIZED. Otherwise this opens a time window with unblocked rport (until the followup port reopen recovery would block it again). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Also, unnecessary and repeated DID_IMM_RETRY for pending and undesired new requests occur because internally zfcp still has its zfcp_port blocked. As follow-on errors with scsi_eh, it can cause, in the worst case, permanently lost paths due to one of: sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk! sd <scsidev>: Device offlined - not ready after error recovery For fix validation and to aid future debugging with other recoveries we now also trace (un)blocking of rports. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-11 00:30:46 +08:00
zfcp_dbf_rec_trig("scpaddy", port->adapter, port, NULL,
ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD,
ZFCP_PSEUDO_ERP_ACTION_RPORT_ADD);
rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids);
if (!rport) {
dev_err(&port->adapter->ccw_device->dev,
"Registering port 0x%016Lx failed\n",
(unsigned long long)port->wwpn);
return;
}
rport->maxframe_size = port->maxframe_size;
rport->supported_classes = port->supported_classes;
port->rport = rport;
port->starget_id = rport->scsi_target_id;
zfcp_unit_queue_scsi_scan(port);
}
static void zfcp_scsi_rport_block(struct zfcp_port *port)
{
struct fc_rport *rport = port->rport;
if (rport) {
zfcp: close window with unblocked rport during rport gone On a successful end of reopen port forced, zfcp_erp_strategy_followup_success() re-uses the port erp_action and the subsequent zfcp_erp_action_cleanup() now sees ZFCP_ERP_SUCCEEDED with erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED but must not perform zfcp_scsi_schedule_rport_register(). We can detect this because the fresh port reopen erp_action is in its very first step ZFCP_ERP_STEP_UNINITIALIZED. Otherwise this opens a time window with unblocked rport (until the followup port reopen recovery would block it again). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs. Also, unnecessary and repeated DID_IMM_RETRY for pending and undesired new requests occur because internally zfcp still has its zfcp_port blocked. As follow-on errors with scsi_eh, it can cause, in the worst case, permanently lost paths due to one of: sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk! sd <scsidev>: Device offlined - not ready after error recovery For fix validation and to aid future debugging with other recoveries we now also trace (un)blocking of rports. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 5767620c383a ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED") Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors") Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI") Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable") Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-11 00:30:46 +08:00
zfcp_dbf_rec_trig("scpdely", port->adapter, port, NULL,
ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL,
ZFCP_PSEUDO_ERP_ACTION_RPORT_DEL);
fc_remote_port_delete(rport);
port->rport = NULL;
}
}
void zfcp_scsi_schedule_rport_register(struct zfcp_port *port)
{
get_device(&port->dev);
port->rport_task = RPORT_ADD;
if (!queue_work(port->adapter->work_queue, &port->rport_work))
put_device(&port->dev);
}
void zfcp_scsi_schedule_rport_block(struct zfcp_port *port)
{
get_device(&port->dev);
port->rport_task = RPORT_DEL;
if (port->rport && queue_work(port->adapter->work_queue,
&port->rport_work))
return;
put_device(&port->dev);
}
void zfcp_scsi_schedule_rports_block(struct zfcp_adapter *adapter)
{
unsigned long flags;
struct zfcp_port *port;
read_lock_irqsave(&adapter->port_list_lock, flags);
list_for_each_entry(port, &adapter->port_list, list)
zfcp_scsi_schedule_rport_block(port);
read_unlock_irqrestore(&adapter->port_list_lock, flags);
}
void zfcp_scsi_rport_work(struct work_struct *work)
{
struct zfcp_port *port = container_of(work, struct zfcp_port,
rport_work);
while (port->rport_task) {
if (port->rport_task == RPORT_ADD) {
port->rport_task = RPORT_NONE;
zfcp_scsi_rport_register(port);
} else {
port->rport_task = RPORT_NONE;
zfcp_scsi_rport_block(port);
}
}
put_device(&port->dev);
}
/**
* zfcp_scsi_set_prot - Configure DIF/DIX support in scsi_host
* @adapter: The adapter where to configure DIF/DIX for the SCSI host
*/
void zfcp_scsi_set_prot(struct zfcp_adapter *adapter)
{
unsigned int mask = 0;
unsigned int data_div;
struct Scsi_Host *shost = adapter->scsi_host;
data_div = atomic_read(&adapter->status) &
ZFCP_STATUS_ADAPTER_DATA_DIV_ENABLED;
if (enable_dif &&
adapter->adapter_features & FSF_FEATURE_DIF_PROT_TYPE1)
mask |= SHOST_DIF_TYPE1_PROTECTION;
if (enable_dif && data_div &&
adapter->adapter_features & FSF_FEATURE_DIX_PROT_TCPIP) {
mask |= SHOST_DIX_TYPE1_PROTECTION;
scsi_host_set_guard(shost, SHOST_DIX_GUARD_IP);
shost->sg_prot_tablesize = adapter->qdio->max_sbale_per_req / 2;
shost->sg_tablesize = adapter->qdio->max_sbale_per_req / 2;
shost->max_sectors = shost->sg_tablesize * 8;
}
scsi_host_set_prot(shost, mask);
}
/**
* zfcp_scsi_dif_sense_error - Report DIF/DIX error as driver sense error
* @scmd: The SCSI command to report the error for
* @ascq: The ASCQ to put in the sense buffer
*
* See the error handling in sd_done for the sense codes used here.
* Set DID_SOFT_ERROR to retry the request, if possible.
*/
void zfcp_scsi_dif_sense_error(struct scsi_cmnd *scmd, int ascq)
{
scsi_build_sense_buffer(1, scmd->sense_buffer,
ILLEGAL_REQUEST, 0x10, ascq);
set_driver_byte(scmd, DRIVER_SENSE);
scmd->result |= SAM_STAT_CHECK_CONDITION;
set_host_byte(scmd, DID_SOFT_ERROR);
}
struct fc_function_template zfcp_transport_functions = {
.show_starget_port_id = 1,
.show_starget_port_name = 1,
.show_starget_node_name = 1,
.show_rport_supported_classes = 1,
.show_rport_maxframe_size = 1,
.show_rport_dev_loss_tmo = 1,
.show_host_node_name = 1,
.show_host_port_name = 1,
.show_host_permanent_port_name = 1,
.show_host_supported_classes = 1,
.show_host_supported_fc4s = 1,
.show_host_supported_speeds = 1,
.show_host_maxframe_size = 1,
.show_host_serial_number = 1,
.get_fc_host_stats = zfcp_get_fc_host_stats,
.reset_fc_host_stats = zfcp_reset_fc_host_stats,
.set_rport_dev_loss_tmo = zfcp_set_rport_dev_loss_tmo,
.get_host_port_state = zfcp_get_host_port_state,
.terminate_rport_io = zfcp_scsi_terminate_rport_io,
.show_host_port_state = 1,
.show_host_active_fc4s = 1,
.bsg_request = zfcp_fc_exec_bsg_job,
.bsg_timeout = zfcp_fc_timeout_bsg_job,
/* no functions registered for following dynamic attributes but
directly set by LLDD */
.show_host_port_type = 1,
.show_host_symbolic_name = 1,
.show_host_speed = 1,
.show_host_port_id = 1,
.dd_bsg_size = sizeof(struct zfcp_fsf_ct_els),
};