/*
 * zfcp device driver
 *
 * Error Recovery Procedures (ERP).
 *
scsi: zfcp: fix rport unblock race with LUN recovery
It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
window when zfcp detected an unavailable rport but
fc_remote_port_delete(), which is asynchronous via
zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
However, for the case when the rport becomes available again, we should
prevent unblocking the rport too early. In contrast to other FCP LLDDs,
zfcp has to open each LUN with the FCP channel hardware before it can
send I/O to a LUN. So if a port already has LUNs attached and we
unblock the rport just after port recovery, recoveries of LUNs behind
this port can still be pending which in turn force
zfcp_scsi_queuecommand() to unnecessarily finish requests with
DID_IMM_RETRY.
This also opens a time window with unblocked rport (until the followup
LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during
this time window fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents a clean and
timely path failover. This should not happen if the path issue can be
recovered on FC transport layer such as path issues involving RSCNs.
Fix this by only calling zfcp_scsi_schedule_rport_register(), to
asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
children of the rport have finished and no new recoveries of equal or
higher order were triggered meanwhile. Finished intentionally includes
any recovery result no matter if successful or failed (still unblock
rport so other successful LUNs work). For simplicity, we check after
each finished LUN recovery if there is another LUN recovery pending on
the same port and then do nothing. We handle the special case of a
successful recovery of a port without LUN children the same way without
changing this case's semantics.
For debugging we introduce 2 new trace records written if the rport
unblock attempt was aborted due to still unfinished or freshly triggered
recovery. The records are only written above the default trace level.
Benjamin noticed the important special case of new recovery that can be
triggered between having given up the erp_lock and before calling
zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the
following sequence:
ERP thread                rport_work     other context
------------------------- -------------- --------------------------------
port is unblocked, rport still blocked,
due to pending/running ERP action,
so ((port->status & ...UNBLOCK) != 0)
and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
                                         zfcp_erp_port_reopen()
                                         lock ERP
                                         zfcp_erp_port_block()
                                         port->status clear ...UNBLOCK
                                         unlock ERP
                                         zfcp_scsi_schedule_rport_block()
                                         port->rport_task = RPORT_DEL
                                         queue_work(rport_work)
                          zfcp_scsi_rport_work()
                          (port->rport_task != RPORT_ADD)
                          port->rport_task = RPORT_NONE
                          zfcp_scsi_rport_block()
                          if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
                          zfcp_scsi_rport_work()
                          (port->rport_task == RPORT_ADD)
                          port->rport_task = RPORT_NONE
                          zfcp_scsi_rport_register()
                          (port->rport == NULL)
                          rport = fc_remote_port_add()
                          port->rport = rport;
Now the rport was erroneously unblocked while the zfcp_port is blocked.
This is another situation we want to avoid due to scsi_eh
potential. This state would at least remain until the new recovery from
the other context finished successfully, or potentially forever if it
failed. In order to close this race, we take the erp_lock inside
zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
LUN. With that, the possible corresponding rport state sequences would
be: (unblock[ERP thread],block[other context]) if the ERP thread gets
erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
(block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
after the other context has already cleared ...UNBLOCK from port->status.
Since checking fields of struct erp_action is unsafe because they could
have been overwritten (re-used for new recovery) meanwhile, we only
check status of zfcp_port and LUN since these are only changed under
erp_lock elsewhere. Regarding the check of the proper status flags (port
or port_forced are similar to the shown adapter recovery):
[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
 zfcp_erp_adapter_block()
  * clear UNBLOCK ---------------------------------------+
 zfcp_scsi_schedule_rports_block()                       |
 write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
 zfcp_erp_action_enqueue()                            |  |
  zfcp_erp_setup_act()                                |  |
   * set ERP_INUSE -----------------------------------|--|--+
 write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
.context-switch.                                         |  |
zfcp_erp_thread()                                        |  |
 zfcp_erp_strategy()                                     |  |
  write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
  ...                                                 |  |  |
  zfcp_erp_strategy_check_target()                    |  |  |
   zfcp_erp_strategy_check_adapter()                  |  |  |
    zfcp_erp_adapter_unblock()                        |  |  |
     * set UNBLOCK -----------------------------------|--+  |
  zfcp_erp_action_dequeue()                           |     |
   * clear ERP_INUSE ---------------------------------|-----+
  ...                                                 |
  write_unlock_irqrestore(&adapter->erp_lock, flags);-+
Hence, we should check for both UNBLOCK and ERP_INUSE because they are
interleaved. Also we need to explicitly check ERP_FAILED for the link
down case which currently does not clear the UNBLOCK flag in
zfcp_fsf_link_down_info_eval().
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8830271c4819 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
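
The status check described in the commit message above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the flag values are placeholders, and the helper name may_unblock_rport() is hypothetical and not taken from the driver. The caller is assumed to hold the ERP lock while the statuses are sampled, so a concurrent recovery trigger is seen either completely or not at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Placeholder bit values for illustration only (assumption);
// the driver defines its own status flags.
enum {
	ST_UNBLOCKED  = 1u << 0,
	ST_ERP_INUSE  = 1u << 1,
	ST_ERP_FAILED = 1u << 2,
};

// Hypothetical helper: may the rport be registered (unblocked) now?
// Assumes the caller samples all statuses while holding the ERP lock.
static bool may_unblock_rport(uint32_t port_status,
			      const uint32_t *lun_status, size_t nr_luns)
{
	size_t i;

	// the port itself must be unblocked ...
	if (!(port_status & ST_UNBLOCKED))
		return false;
	// ... with no recovery pending and no failed recovery (link down case)
	if (port_status & (ST_ERP_INUSE | ST_ERP_FAILED))
		return false;

	for (i = 0; i < nr_luns; i++) {
		// a failed LUN counts as finished and does not hold back the rport
		if (lun_status[i] & ST_ERP_FAILED)
			continue;
		// a LUN recovery is still pending: do not unblock yet
		if (lun_status[i] & ST_ERP_INUSE)
			return false;
	}
	return true;
}
```

With these placeholder flags, a port that has UNBLOCK set and no LUN children is unblocked at once, while a single LUN with a pending recovery defers the unblock until that recovery finishes.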
 * Copyright IBM Corp. 2002, 2016
 */

#define KMSG_COMPONENT "zfcp"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt

#include <linux/kthread.h>
#include "zfcp_ext.h"
#include "zfcp_reqlist.h"

#define ZFCP_MAX_ERPS	3

enum zfcp_erp_act_flags {
	ZFCP_STATUS_ERP_TIMEDOUT	= 0x10000000,
	ZFCP_STATUS_ERP_CLOSE_ONLY	= 0x01000000,
	ZFCP_STATUS_ERP_DISMISSING	= 0x00100000,
	ZFCP_STATUS_ERP_DISMISSED	= 0x00200000,
	ZFCP_STATUS_ERP_LOWMEM		= 0x00400000,
	ZFCP_STATUS_ERP_NO_REF		= 0x00800000,
};

enum zfcp_erp_steps {
	ZFCP_ERP_STEP_UNINITIALIZED	= 0x0000,
	ZFCP_ERP_STEP_FSF_XCONFIG	= 0x0001,
	ZFCP_ERP_STEP_PHYS_PORT_CLOSING	= 0x0010,
	ZFCP_ERP_STEP_PORT_CLOSING	= 0x0100,
	ZFCP_ERP_STEP_PORT_OPENING	= 0x0800,
	ZFCP_ERP_STEP_LUN_CLOSING	= 0x1000,
	ZFCP_ERP_STEP_LUN_OPENING	= 0x2000,
};

enum zfcp_erp_act_type {
	ZFCP_ERP_ACTION_REOPEN_LUN		= 1,
	ZFCP_ERP_ACTION_REOPEN_PORT		= 2,
	ZFCP_ERP_ACTION_REOPEN_PORT_FORCED	= 3,
	ZFCP_ERP_ACTION_REOPEN_ADAPTER		= 4,
};

enum zfcp_erp_act_state {
	ZFCP_ERP_ACTION_RUNNING	= 1,
	ZFCP_ERP_ACTION_READY	= 2,
};

enum zfcp_erp_act_result {
	ZFCP_ERP_SUCCEEDED	= 0,
	ZFCP_ERP_FAILED		= 1,
	ZFCP_ERP_CONTINUES	= 2,
	ZFCP_ERP_EXIT		= 3,
	ZFCP_ERP_DISMISSED	= 4,
	ZFCP_ERP_NOMEM		= 5,
};

static void zfcp_erp_adapter_block(struct zfcp_adapter *adapter, int mask)
{
	zfcp_erp_clear_adapter_status(adapter,
				       ZFCP_STATUS_COMMON_UNBLOCKED | mask);
}

static int zfcp_erp_action_exists(struct zfcp_erp_action *act)
{
	struct zfcp_erp_action *curr_act;

	list_for_each_entry(curr_act, &act->adapter->erp_running_head, list)
		if (act == curr_act)
			return ZFCP_ERP_ACTION_RUNNING;
	return 0;
}

static void zfcp_erp_action_ready(struct zfcp_erp_action *act)
{
	struct zfcp_adapter *adapter = act->adapter;

	list_move(&act->list, &act->adapter->erp_ready_head);
	zfcp_dbf_rec_run("erardy1", act);
	wake_up(&adapter->erp_ready_wq);
	zfcp_dbf_rec_run("erardy2", act);
}

static void zfcp_erp_action_dismiss(struct zfcp_erp_action *act)
{
	act->status |= ZFCP_STATUS_ERP_DISMISSED;
	if (zfcp_erp_action_exists(act) == ZFCP_ERP_ACTION_RUNNING)
		zfcp_erp_action_ready(act);
}

static void zfcp_erp_action_dismiss_lun(struct scsi_device *sdev)
{
	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);

	if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_ERP_INUSE)
		zfcp_erp_action_dismiss(&zfcp_sdev->erp_action);
}

static void zfcp_erp_action_dismiss_port(struct zfcp_port *port)
{
	struct scsi_device *sdev;

	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_INUSE)
		zfcp_erp_action_dismiss(&port->erp_action);
[SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
CPU: 1 Not tainted 3.9.3+ #69
Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
<snip>
Call Trace:
([<00000000001165de>] show_trace+0x106/0x154)
[<00000000001166a0>] show_stack+0x74/0xf4
[<00000000006ff646>] dump_stack+0xc6/0xd4
[<000000000017f3a0>] __might_sleep+0x128/0x148
[<000000000015ece8>] flush_work+0x54/0x1f8
[<00000000001630de>] __cancel_work_timer+0xc6/0x128
[<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
[<0000000000161816>] execute_in_process_context+0x96/0xa8
[<00000000004d33d8>] device_release+0x60/0xc0
[<000000000048af48>] kobject_release+0xa8/0x1c4
[<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
[<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
[<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
[<000000000016b75a>] kthread+0xf2/0xfc
[<000000000070c9de>] kernel_thread_starter+0x6/0xc
[<000000000070c9d8>] kernel_thread_starter+0x0/0xc
Apparently, the ref_count for some scsi_device drops down to zero,
triggering device removal through execute_in_process_context(), while
the lldd error recovery thread iterates through a scsi device list.
Unfortunately, execute_in_process_context() decides to immediately
execute that device removal function, instead of scheduling asynchronous
execution, since it detects process context and thinks it is safe to do
so. But almost all calls to shost_for_each_device() in our lldd are
inside spin_lock_irq, even in thread context. Obviously, schedule()
inside spin_lock_irq sections is a bad idea.
Change the lldd to use the proper iterator function,
__shost_for_each_device(), in combination with required locking.
Occurrences that need to be changed include all calls in zfcp_erp.c,
since those might be executed in zfcp error recovery thread context
with a lock held.
Other occurrences of shost_for_each_device() in zfcp_fsf.c do not
need to be changed (no process context, no surrounding locking).
The problem was introduced in Linux 2.6.37 by commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org #2.6.37+
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
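
The failure mode described in the commit message above can be modelled in userspace as follows. This is a simplified sketch, not driver or SCSI midlayer code: struct dev, in_atomic, and all function names are hypothetical stand-ins. iterate_with_refs() models an iterator that takes and drops a reference per element, as shost_for_each_device() does, while iterate_plain() models __shost_for_each_device() used under the host lock.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for a reference-counted device.
struct dev {
	int refcount;
	bool released_in_atomic;	// release ran while the "lock" was held
};

// Models "a spinlock is held by the iterating thread".
static bool in_atomic;

// Dropping the last reference triggers the release, which may sleep.
static void dev_put(struct dev *d)
{
	if (--d->refcount == 0 && in_atomic)
		d->released_in_atomic = true;	// sleeping in atomic context
}

// Another context drops its reference; it is not inside our atomic section.
static void other_context_put(struct dev *d)
{
	bool saved = in_atomic;

	in_atomic = false;
	dev_put(d);
	in_atomic = saved;
}

// Takes and drops a reference around each visit.
static void iterate_with_refs(struct dev *devs, size_t n,
			      void (*visit)(struct dev *))
{
	size_t i;

	for (i = 0; i < n; i++) {
		devs[i].refcount++;
		visit(&devs[i]);
		dev_put(&devs[i]);	// may be the final put
	}
}

// Never touches reference counts; relies on the caller's lock instead.
static void iterate_plain(struct dev *devs, size_t n,
			  void (*visit)(struct dev *))
{
	size_t i;

	for (i = 0; i < n; i++)
		visit(&devs[i]);
}
```

In this model, the reference-taking iterator ends up doing the final put itself while the lock is held, which is the situation the stack trace shows. The plain iterator leaves the final put to the other context.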
	else {
		spin_lock(port->adapter->scsi_host->host_lock);
		__shost_for_each_device(sdev, port->adapter->scsi_host)
			if (sdev_to_zfcp(sdev)->port == port)
				zfcp_erp_action_dismiss_lun(sdev);
		spin_unlock(port->adapter->scsi_host->host_lock);
	}
}

static void zfcp_erp_action_dismiss_adapter(struct zfcp_adapter *adapter)
{
	struct zfcp_port *port;

	if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_INUSE)
		zfcp_erp_action_dismiss(&adapter->erp_action);
	else {
		read_lock(&adapter->port_list_lock);
		list_for_each_entry(port, &adapter->port_list, list)
			zfcp_erp_action_dismiss_port(port);
		read_unlock(&adapter->port_list_lock);
	}
}

static int zfcp_erp_required_act(int want, struct zfcp_adapter *adapter,
				 struct zfcp_port *port,
				 struct scsi_device *sdev)
{
	int need = want;
	int l_status, p_status, a_status;
	struct zfcp_scsi_dev *zfcp_sdev;

	switch (want) {
	case ZFCP_ERP_ACTION_REOPEN_LUN:
		zfcp_sdev = sdev_to_zfcp(sdev);
		l_status = atomic_read(&zfcp_sdev->status);
		if (l_status & ZFCP_STATUS_COMMON_ERP_INUSE)
			return 0;
		p_status = atomic_read(&port->status);
		if (!(p_status & ZFCP_STATUS_COMMON_RUNNING) ||
		    p_status & ZFCP_STATUS_COMMON_ERP_FAILED)
			return 0;
		if (!(p_status & ZFCP_STATUS_COMMON_UNBLOCKED))
			need = ZFCP_ERP_ACTION_REOPEN_PORT;
		/* fall through */
	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
		p_status = atomic_read(&port->status);
		if (!(p_status & ZFCP_STATUS_COMMON_OPEN))
			need = ZFCP_ERP_ACTION_REOPEN_PORT;
		/* fall through */
	case ZFCP_ERP_ACTION_REOPEN_PORT:
		p_status = atomic_read(&port->status);
		if (p_status & ZFCP_STATUS_COMMON_ERP_INUSE)
			return 0;
		a_status = atomic_read(&adapter->status);
		if (!(a_status & ZFCP_STATUS_COMMON_RUNNING) ||
		    a_status & ZFCP_STATUS_COMMON_ERP_FAILED)
			return 0;
		if (p_status & ZFCP_STATUS_COMMON_NOESC)
			return need;
		if (!(a_status & ZFCP_STATUS_COMMON_UNBLOCKED))
			need = ZFCP_ERP_ACTION_REOPEN_ADAPTER;
		/* fall through */
	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
		a_status = atomic_read(&adapter->status);
		if (a_status & ZFCP_STATUS_COMMON_ERP_INUSE)
			return 0;
		if (!(a_status & ZFCP_STATUS_COMMON_RUNNING) &&
		    !(a_status & ZFCP_STATUS_COMMON_OPEN))
			return 0; /* shutdown requested for closed adapter */
	}

	return need;
}

static struct zfcp_erp_action *zfcp_erp_setup_act(int need, u32 act_status,
						  struct zfcp_adapter *adapter,
						  struct zfcp_port *port,
						  struct scsi_device *sdev)
{
	struct zfcp_erp_action *erp_action;
	struct zfcp_scsi_dev *zfcp_sdev;

	switch (need) {
	case ZFCP_ERP_ACTION_REOPEN_LUN:
		zfcp_sdev = sdev_to_zfcp(sdev);
		if (!(act_status & ZFCP_STATUS_ERP_NO_REF))
			if (scsi_device_get(sdev))
				return NULL;
		atomic_or(ZFCP_STATUS_COMMON_ERP_INUSE,
			  &zfcp_sdev->status);
		erp_action = &zfcp_sdev->erp_action;
		memset(erp_action, 0, sizeof(struct zfcp_erp_action));
		erp_action->port = port;
		erp_action->sdev = sdev;
		if (!(atomic_read(&zfcp_sdev->status) &
		      ZFCP_STATUS_COMMON_RUNNING))
			act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY;
		break;

	case ZFCP_ERP_ACTION_REOPEN_PORT:
	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
		if (!get_device(&port->dev))
			return NULL;
		zfcp_erp_action_dismiss_port(port);
		atomic_or(ZFCP_STATUS_COMMON_ERP_INUSE, &port->status);
		erp_action = &port->erp_action;
		memset(erp_action, 0, sizeof(struct zfcp_erp_action));
		erp_action->port = port;
		if (!(atomic_read(&port->status) & ZFCP_STATUS_COMMON_RUNNING))
			act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY;
		break;

	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
		kref_get(&adapter->ref);
		zfcp_erp_action_dismiss_adapter(adapter);
		atomic_or(ZFCP_STATUS_COMMON_ERP_INUSE, &adapter->status);
		erp_action = &adapter->erp_action;
		memset(erp_action, 0, sizeof(struct zfcp_erp_action));
		if (!(atomic_read(&adapter->status) &
		      ZFCP_STATUS_COMMON_RUNNING))
			act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY;
		break;

	default:
		return NULL;
	}

	erp_action->adapter = adapter;
	erp_action->action = need;
	erp_action->status = act_status;

	return erp_action;
}

static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter,
				   struct zfcp_port *port,
				   struct scsi_device *sdev,
				   char *id, u32 act_status)
{
	int retval = 1, need;
	struct zfcp_erp_action *act;

	if (!adapter->erp_thread)
		return -EIO;

	need = zfcp_erp_required_act(want, adapter, port, sdev);
	if (!need)
		goto out;

	act = zfcp_erp_setup_act(need, act_status, adapter, port, sdev);
	if (!act)
		goto out;
	atomic_or(ZFCP_STATUS_ADAPTER_ERP_PENDING, &adapter->status);
	++adapter->erp_total_count;
	list_add_tail(&act->list, &adapter->erp_ready_head);
	wake_up(&adapter->erp_ready_wq);
	retval = 0;
out:
	zfcp_dbf_rec_trig(id, adapter, port, sdev, want, need);
	return retval;
}

static int _zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter,
				    int clear_mask, char *id)
{
	zfcp_erp_adapter_block(adapter, clear_mask);
	zfcp_scsi_schedule_rports_block(adapter);

	/* ensure propagation of failed status to new devices */
	if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
		zfcp_erp_set_adapter_status(adapter,
					    ZFCP_STATUS_COMMON_ERP_FAILED);
		return -EIO;
	}
	return zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER,
				       adapter, NULL, NULL, id, 0);
}

/**
 * zfcp_erp_adapter_reopen - Reopen adapter.
 * @adapter: Adapter to reopen.
 * @clear: Status flags to clear.
 * @id: Id for debug trace event.
 */
void zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter, int clear, char *id)
{
	unsigned long flags;

	zfcp_erp_adapter_block(adapter, clear);
	zfcp_scsi_schedule_rports_block(adapter);

	write_lock_irqsave(&adapter->erp_lock, flags);
	if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_FAILED)
		zfcp_erp_set_adapter_status(adapter,
					    ZFCP_STATUS_COMMON_ERP_FAILED);
	else
		zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER, adapter,
					NULL, NULL, id, 0);
	write_unlock_irqrestore(&adapter->erp_lock, flags);
}

/**
 * zfcp_erp_adapter_shutdown - Shutdown adapter.
 * @adapter: Adapter to shut down.
 * @clear: Status flags to clear.
 * @id: Id for debug trace event.
 */
void zfcp_erp_adapter_shutdown(struct zfcp_adapter *adapter, int clear,
			       char *id)
{
	int flags = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
	zfcp_erp_adapter_reopen(adapter, clear | flags, id);
}

/**
 * zfcp_erp_port_shutdown - Shutdown port
 * @port: Port to shut down.
 * @clear: Status flags to clear.
 * @id: Id for debug trace event.
 */
void zfcp_erp_port_shutdown(struct zfcp_port *port, int clear, char *id)
{
	int flags = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
	zfcp_erp_port_reopen(port, clear | flags, id);
}

static void zfcp_erp_port_block(struct zfcp_port *port, int clear)
{
	zfcp_erp_clear_port_status(port,
				    ZFCP_STATUS_COMMON_UNBLOCKED | clear);
}

static void _zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear,
					 char *id)
{
	zfcp_erp_port_block(port, clear);
	zfcp_scsi_schedule_rport_block(port);

	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED)
		return;

	zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_PORT_FORCED,
				port->adapter, port, NULL, id, 0);
}

/**
 * zfcp_erp_port_forced_reopen - Forced close of port and open again
 * @port: Port to force close and to reopen.
 * @clear: Status flags to clear.
 * @id: Id for debug trace event.
 */
void zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear, char *id)
{
	unsigned long flags;
	struct zfcp_adapter *adapter = port->adapter;

	write_lock_irqsave(&adapter->erp_lock, flags);
	_zfcp_erp_port_forced_reopen(port, clear, id);
	write_unlock_irqrestore(&adapter->erp_lock, flags);
}

static int _zfcp_erp_port_reopen(struct zfcp_port *port, int clear, char *id)
{
	zfcp_erp_port_block(port, clear);
	zfcp_scsi_schedule_rport_block(port);

	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
		/* ensure propagation of failed status to new devices */
		zfcp_erp_set_port_status(port, ZFCP_STATUS_COMMON_ERP_FAILED);
		return -EIO;
	}

	return zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_PORT,
				       port->adapter, port, NULL, id, 0);
}

/**
 * zfcp_erp_port_reopen - trigger remote port recovery
 * @port: port to recover
 * @clear: flags in port status to be cleared
 * @id: Id for debug trace event.
 *
 * Returns 0 if recovery has been triggered, < 0 if not.
 */
int zfcp_erp_port_reopen(struct zfcp_port *port, int clear, char *id)
{
	int retval;
	unsigned long flags;
	struct zfcp_adapter *adapter = port->adapter;

	write_lock_irqsave(&adapter->erp_lock, flags);
	retval = _zfcp_erp_port_reopen(port, clear, id);
	write_unlock_irqrestore(&adapter->erp_lock, flags);

	return retval;
}

static void zfcp_erp_lun_block(struct scsi_device *sdev, int clear_mask)
{
	zfcp_erp_clear_lun_status(sdev,
				  ZFCP_STATUS_COMMON_UNBLOCKED | clear_mask);
}

static void _zfcp_erp_lun_reopen(struct scsi_device *sdev, int clear, char *id,
				 u32 act_status)
{
	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;

	zfcp_erp_lun_block(sdev, clear);

	if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_ERP_FAILED)
		return;

	zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_LUN, adapter,
				zfcp_sdev->port, sdev, id, act_status);
}
|
|
|
|
|
|
|
|
/**
|
2010-09-08 20:39:55 +08:00
|
|
|
* zfcp_erp_lun_reopen - initiate reopen of a LUN
|
|
|
|
* @sdev: SCSI device / LUN to be reopened
|
|
|
|
* @clear: specifies flags in LUN status to be cleared
|
2010-12-02 22:16:16 +08:00
|
|
|
* @id: Id for debug trace event.
|
|
|
|
*
|
2005-04-17 06:20:36 +08:00
|
|
|
* This function does not return a value; the reopen is only enqueued.
|
|
|
|
*/
|
2010-12-02 22:16:16 +08:00
|
|
|
void zfcp_erp_lun_reopen(struct scsi_device *sdev, int clear, char *id)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
unsigned long flags;
|
2010-09-08 20:39:55 +08:00
|
|
|
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
|
|
|
|
struct zfcp_port *port = zfcp_sdev->port;
|
2008-07-02 16:56:40 +08:00
|
|
|
struct zfcp_adapter *adapter = port->adapter;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-11-24 23:53:58 +08:00
|
|
|
write_lock_irqsave(&adapter->erp_lock, flags);
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_lun_reopen(sdev, clear, id, 0);
|
2010-09-08 20:39:54 +08:00
|
|
|
write_unlock_irqrestore(&adapter->erp_lock, flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
2010-09-08 20:39:55 +08:00
|
|
|
* zfcp_erp_lun_shutdown - Shutdown LUN
|
|
|
|
* @sdev: SCSI device / LUN to shut down.
|
2010-09-08 20:39:54 +08:00
|
|
|
* @clear: Status flags to clear.
|
|
|
|
* @id: Id for debug trace event.
|
|
|
|
*/
|
2010-12-02 22:16:16 +08:00
|
|
|
void zfcp_erp_lun_shutdown(struct scsi_device *sdev, int clear, char *id)
|
2010-09-08 20:39:54 +08:00
|
|
|
{
|
|
|
|
int flags = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
|
2010-12-02 22:16:16 +08:00
|
|
|
zfcp_erp_lun_reopen(sdev, clear | flags, id);
|
2010-09-08 20:39:54 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
2010-09-08 20:39:55 +08:00
|
|
|
* zfcp_erp_lun_shutdown_wait - Shutdown LUN and wait for erp completion
|
|
|
|
* @sdev: SCSI device / LUN to shut down.
|
2010-09-08 20:39:54 +08:00
|
|
|
* @id: Id for debug trace event.
|
|
|
|
*
|
2010-09-08 20:39:55 +08:00
|
|
|
* Do not acquire a reference for the LUN when creating the ERP
|
2010-09-08 20:39:54 +08:00
|
|
|
* action. It is safe, because this function waits for the ERP to
|
2010-09-08 20:39:55 +08:00
|
|
|
* complete first. This allows shutting down the LUN, even when the SCSI
|
|
|
|
* device is in state SDEV_DEL, in which scsi_device_get() fails.
|
2010-09-08 20:39:54 +08:00
|
|
|
*/
|
2010-09-08 20:39:55 +08:00
|
|
|
void zfcp_erp_lun_shutdown_wait(struct scsi_device *sdev, char *id)
|
2010-09-08 20:39:54 +08:00
|
|
|
{
|
|
|
|
unsigned long flags;
|
2010-09-08 20:39:55 +08:00
|
|
|
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
|
|
|
|
struct zfcp_port *port = zfcp_sdev->port;
|
2010-09-08 20:39:54 +08:00
|
|
|
struct zfcp_adapter *adapter = port->adapter;
|
|
|
|
int clear = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
|
|
|
|
|
|
|
|
write_lock_irqsave(&adapter->erp_lock, flags);
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_lun_reopen(sdev, clear, id, ZFCP_STATUS_ERP_NO_REF);
|
2009-11-24 23:53:58 +08:00
|
|
|
write_unlock_irqrestore(&adapter->erp_lock, flags);
|
2010-09-08 20:39:54 +08:00
|
|
|
|
|
|
|
zfcp_erp_wait(adapter);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
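/*
* Returns the bits of @mask that are not yet set in @status, i.e. a
* non-zero value if setting @mask would actually change @status.
*/
static int status_change_set(unsigned long mask, atomic_t *status)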
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
return (atomic_read(status) ^ mask) & mask;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_adapter_unblock(struct zfcp_adapter *adapter)
|
2008-03-27 21:22:02 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
if (status_change_set(ZFCP_STATUS_COMMON_UNBLOCKED, &adapter->status))
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("eraubl1", &adapter->erp_action);
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(ZFCP_STATUS_COMMON_UNBLOCKED, &adapter->status);
|
2008-03-27 21:22:02 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_port_unblock(struct zfcp_port *port)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
if (status_change_set(ZFCP_STATUS_COMMON_UNBLOCKED, &port->status))
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erpubl1", &port->erp_action);
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(ZFCP_STATUS_COMMON_UNBLOCKED, &port->status);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2010-09-08 20:39:55 +08:00
|
|
|
static void zfcp_erp_lun_unblock(struct scsi_device *sdev)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2010-09-08 20:39:55 +08:00
|
|
|
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
|
|
|
|
|
|
|
|
if (status_change_set(ZFCP_STATUS_COMMON_UNBLOCKED, &zfcp_sdev->status))
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erlubl1", &zfcp_sdev->erp_action);
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(ZFCP_STATUS_COMMON_UNBLOCKED, &zfcp_sdev->status);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_action_to_running(struct zfcp_erp_action *erp_action)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
list_move(&erp_action->list, &erp_action->adapter->erp_running_head);
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erator1", erp_action);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_strategy_check_fsfreq(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
2010-02-17 18:18:49 +08:00
|
|
|
struct zfcp_fsf_req *req;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2010-02-17 18:18:49 +08:00
|
|
|
if (!act->fsf_req_id)
|
2008-07-02 16:56:40 +08:00
|
|
|
return;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2010-02-17 18:18:50 +08:00
|
|
|
spin_lock(&adapter->req_list->lock);
|
|
|
|
req = _zfcp_reqlist_find(adapter->req_list, act->fsf_req_id);
|
2010-02-17 18:18:49 +08:00
|
|
|
if (req && req->erp_action == act) {
|
2008-07-02 16:56:40 +08:00
|
|
|
if (act->status & (ZFCP_STATUS_ERP_DISMISSED |
|
|
|
|
ZFCP_STATUS_ERP_TIMEDOUT)) {
|
2010-02-17 18:18:49 +08:00
|
|
|
req->status |= ZFCP_STATUS_FSFREQ_DISMISSED;
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erscf_1", act);
|
2010-02-17 18:18:49 +08:00
|
|
|
req->erp_action = NULL;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2008-07-02 16:56:40 +08:00
|
|
|
if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erscf_2", act);
|
2010-02-17 18:18:49 +08:00
|
|
|
if (req->status & ZFCP_STATUS_FSFREQ_DISMISSED)
|
|
|
|
act->fsf_req_id = 0;
|
2008-07-02 16:56:40 +08:00
|
|
|
} else
|
2010-02-17 18:18:49 +08:00
|
|
|
act->fsf_req_id = 0;
|
2010-02-17 18:18:50 +08:00
|
|
|
spin_unlock(&adapter->req_list->lock);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
|
|
|
* zfcp_erp_notify - Trigger ERP action.
|
|
|
|
* @erp_action: ERP action to continue.
|
|
|
|
* @set_mask: ERP action status flags to set.
|
2005-04-17 06:20:36 +08:00
|
|
|
*/
|
2008-07-02 16:56:40 +08:00
|
|
|
void zfcp_erp_notify(struct zfcp_erp_action *erp_action, unsigned long set_mask)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
struct zfcp_adapter *adapter = erp_action->adapter;
|
2008-07-02 16:56:40 +08:00
|
|
|
unsigned long flags;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
write_lock_irqsave(&adapter->erp_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
if (zfcp_erp_action_exists(erp_action) == ZFCP_ERP_ACTION_RUNNING) {
|
|
|
|
erp_action->status |= set_mask;
|
|
|
|
zfcp_erp_action_ready(erp_action);
|
|
|
|
}
|
|
|
|
write_unlock_irqrestore(&adapter->erp_lock, flags);
|
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
|
|
|
* zfcp_erp_timeout_handler - Trigger ERP action from timed out ERP request
|
|
|
|
* @data: ERP action (from timer data)
|
2005-04-17 06:20:36 +08:00
|
|
|
*/
|
2008-07-02 16:56:40 +08:00
|
|
|
void zfcp_erp_timeout_handler(unsigned long data)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
struct zfcp_erp_action *act = (struct zfcp_erp_action *) data;
|
|
|
|
zfcp_erp_notify(act, ZFCP_STATUS_ERP_TIMEDOUT);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_memwait_handler(unsigned long data)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
zfcp_erp_notify((struct zfcp_erp_action *)data, 0);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_strategy_memwait(struct zfcp_erp_action *erp_action)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
init_timer(&erp_action->timer);
|
|
|
|
erp_action->timer.function = zfcp_erp_memwait_handler;
|
|
|
|
erp_action->timer.data = (unsigned long) erp_action;
|
|
|
|
erp_action->timer.expires = jiffies + HZ;
|
|
|
|
add_timer(&erp_action->timer);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void _zfcp_erp_port_reopen_all(struct zfcp_adapter *adapter,
|
2010-12-02 22:16:16 +08:00
|
|
|
int clear, char *id)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
struct zfcp_port *port;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-11-24 23:53:58 +08:00
|
|
|
read_lock(&adapter->port_list_lock);
|
|
|
|
list_for_each_entry(port, &adapter->port_list, list)
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_port_reopen(port, clear, id);
|
2009-11-24 23:53:58 +08:00
|
|
|
read_unlock(&adapter->port_list_lock);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2010-09-08 20:39:55 +08:00
|
|
|
static void _zfcp_erp_lun_reopen_all(struct zfcp_port *port, int clear,
|
2010-12-02 22:16:16 +08:00
|
|
|
char *id)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2010-09-08 20:39:55 +08:00
|
|
|
struct scsi_device *sdev;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
[SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
CPU: 1 Not tainted 3.9.3+ #69
Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
<snip>
Call Trace:
([<00000000001165de>] show_trace+0x106/0x154)
[<00000000001166a0>] show_stack+0x74/0xf4
[<00000000006ff646>] dump_stack+0xc6/0xd4
[<000000000017f3a0>] __might_sleep+0x128/0x148
[<000000000015ece8>] flush_work+0x54/0x1f8
[<00000000001630de>] __cancel_work_timer+0xc6/0x128
[<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
[<0000000000161816>] execute_in_process_context+0x96/0xa8
[<00000000004d33d8>] device_release+0x60/0xc0
[<000000000048af48>] kobject_release+0xa8/0x1c4
[<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
[<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
[<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
[<000000000016b75a>] kthread+0xf2/0xfc
[<000000000070c9de>] kernel_thread_starter+0x6/0xc
[<000000000070c9d8>] kernel_thread_starter+0x0/0xc
Apparently, the ref_count for some scsi_device drops down to zero,
triggering device removal through execute_in_process_context(), while
the lldd error recovery thread iterates through a scsi device list.
Unfortunately, execute_in_process_context() decides to immediately
execute that device removal function, instead of scheduling asynchronous
execution, since it detects process context and thinks it is safe to do
so. But almost all calls to shost_for_each_device() in our lldd are
inside spin_lock_irq, even in thread context. Obviously, schedule()
inside spin_lock_irq sections is a bad idea.
Change the lldd to use the proper iterator function,
__shost_for_each_device(), in combination with required locking.
Occurrences that need to be changed include all calls in zfcp_erp.c,
since those might be executed in zfcp error recovery thread context
with a lock held.
Other occurrences of shost_for_each_device() in zfcp_fsf.c do not
need to be changed (no process context, no surrounding locking).
The problem was introduced in Linux 2.6.37 by commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org #2.6.37+
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-22 23:45:37 +08:00
|
|
|
spin_lock(port->adapter->scsi_host->host_lock);
|
|
|
|
__shost_for_each_device(sdev, port->adapter->scsi_host)
|
2010-09-08 20:39:55 +08:00
|
|
|
if (sdev_to_zfcp(sdev)->port == port)
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_lun_reopen(sdev, clear, id, 0);
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_unlock(port->adapter->scsi_host->host_lock);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2009-07-13 21:06:09 +08:00
|
|
|
static void zfcp_erp_strategy_followup_failed(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
switch (act->action) {
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_adapter_reopen(act->adapter, 0, "ersff_1");
|
2008-07-02 16:56:40 +08:00
|
|
|
break;
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_port_forced_reopen(act->port, 0, "ersff_2");
|
2008-07-02 16:56:40 +08:00
|
|
|
break;
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_port_reopen(act->port, 0, "ersff_3");
|
2008-07-02 16:56:40 +08:00
|
|
|
break;
|
2010-09-08 20:39:55 +08:00
|
|
|
case ZFCP_ERP_ACTION_REOPEN_LUN:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_lun_reopen(act->sdev, 0, "ersff_4", 0);
|
2009-07-13 21:06:09 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void zfcp_erp_strategy_followup_success(struct zfcp_erp_action *act)
|
|
|
|
{
|
|
|
|
switch (act->action) {
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_port_reopen_all(act->adapter, 0, "ersfs_1");
|
2009-07-13 21:06:09 +08:00
|
|
|
break;
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_port_reopen(act->port, 0, "ersfs_2");
|
2009-07-13 21:06:09 +08:00
|
|
|
break;
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT:
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_lun_reopen_all(act->port, 0, "ersfs_3");
|
2008-07-02 16:56:40 +08:00
|
|
|
break;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_wakeup(struct zfcp_adapter *adapter)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
unsigned long flags;
|
|
|
|
|
2009-11-24 23:53:58 +08:00
|
|
|
read_lock_irqsave(&adapter->erp_lock, flags);
|
2008-07-02 16:56:40 +08:00
|
|
|
if (list_empty(&adapter->erp_ready_head) &&
|
|
|
|
list_empty(&adapter->erp_running_head)) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_ERP_PENDING,
|
2008-07-02 16:56:40 +08:00
|
|
|
&adapter->status);
|
|
|
|
wake_up(&adapter->erp_done_wqh);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2009-11-24 23:53:58 +08:00
|
|
|
read_unlock_irqrestore(&adapter->erp_lock, flags);
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static void zfcp_erp_enqueue_ptp_port(struct zfcp_adapter *adapter)
|
|
|
|
{
|
|
|
|
struct zfcp_port *port;
|
|
|
|
port = zfcp_port_enqueue(adapter, adapter->peer_wwpn, 0,
|
|
|
|
adapter->peer_d_id);
|
|
|
|
if (IS_ERR(port)) /* error or port already attached */
|
|
|
|
return;
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_port_reopen(port, 0, "ereptp1");
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_adapter_strat_fsf_xconf(struct zfcp_erp_action *erp_action)
|
|
|
|
{
|
|
|
|
int retries;
|
|
|
|
int sleep = 1;
|
|
|
|
struct zfcp_adapter *adapter = erp_action->adapter;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_XCONFIG_OK, &adapter->status);
|
2008-07-02 16:56:40 +08:00
|
|
|
|
|
|
|
/* up to 7 attempts, doubling the sleep between them (1 s, 2 s, 4 s, ...) */
for (retries = 7; retries; retries--) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_HOST_CON_INIT,
|
2008-07-02 16:56:40 +08:00
|
|
|
&adapter->status);
|
|
|
|
write_lock_irq(&adapter->erp_lock);
|
|
|
|
zfcp_erp_action_to_running(erp_action);
|
|
|
|
write_unlock_irq(&adapter->erp_lock);
|
|
|
|
if (zfcp_fsf_exchange_config_data(erp_action)) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_HOST_CON_INIT,
|
2008-07-02 16:56:40 +08:00
|
|
|
&adapter->status);
|
|
|
|
return ZFCP_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2009-08-18 21:43:25 +08:00
|
|
|
wait_event(adapter->erp_ready_wq,
|
|
|
|
!list_empty(&adapter->erp_ready_head));
|
2008-07-02 16:56:40 +08:00
|
|
|
if (erp_action->status & ZFCP_STATUS_ERP_TIMEDOUT)
|
|
|
|
break;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (!(atomic_read(&adapter->status) &
|
|
|
|
ZFCP_STATUS_ADAPTER_HOST_CON_INIT))
|
|
|
|
break;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
ssleep(sleep);
|
|
|
|
sleep *= 2;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_HOST_CON_INIT,
|
2008-07-02 16:56:40 +08:00
|
|
|
&adapter->status);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (!(atomic_read(&adapter->status) & ZFCP_STATUS_ADAPTER_XCONFIG_OK))
|
|
|
|
return ZFCP_ERP_FAILED;
|
2007-09-07 15:15:31 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (fc_host_port_type(adapter->scsi_host) == FC_PORTTYPE_PTP)
|
|
|
|
zfcp_erp_enqueue_ptp_port(adapter);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_adapter_strategy_open_fsf_xport(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
int ret;
|
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
write_lock_irq(&adapter->erp_lock);
|
|
|
|
zfcp_erp_action_to_running(act);
|
|
|
|
write_unlock_irq(&adapter->erp_lock);
|
|
|
|
|
|
|
|
ret = zfcp_fsf_exchange_port_data(act);
|
|
|
|
if (ret == -EOPNOTSUPP)
|
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
|
|
|
if (ret)
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erasox1", act);
|
2009-08-18 21:43:25 +08:00
|
|
|
wait_event(adapter->erp_ready_wq,
|
|
|
|
!list_empty(&adapter->erp_ready_head));
|
2010-12-02 22:16:12 +08:00
|
|
|
zfcp_dbf_rec_run("erasox2", act);
|
2008-07-02 16:56:40 +08:00
|
|
|
if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
|
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_adapter_strategy_open_fsf(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
if (zfcp_erp_adapter_strat_fsf_xconf(act) == ZFCP_ERP_FAILED)
|
|
|
|
return ZFCP_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (zfcp_erp_adapter_strategy_open_fsf_xport(act) == ZFCP_ERP_FAILED)
|
|
|
|
return ZFCP_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2011-02-23 02:54:40 +08:00
|
|
|
if (mempool_resize(act->adapter->pool.sr_data,
|
2015-04-15 06:48:21 +08:00
|
|
|
act->adapter->stat_read_buf_num))
|
2010-06-21 16:11:33 +08:00
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
|
|
|
|
if (mempool_resize(act->adapter->pool.status_read_req,
|
2015-04-15 06:48:21 +08:00
|
|
|
act->adapter->stat_read_buf_num))
|
2010-06-21 16:11:33 +08:00
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
|
2010-05-01 00:09:36 +08:00
|
|
|
atomic_set(&act->adapter->stat_miss, act->adapter->stat_read_buf_num);
|
2008-07-02 16:56:40 +08:00
|
|
|
if (zfcp_status_read_refill(act->adapter))
|
|
|
|
return ZFCP_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-03-02 20:09:03 +08:00
|
|
|
static void zfcp_erp_adapter_strategy_close(struct zfcp_erp_action *act)
|
2008-07-02 16:56:40 +08:00
|
|
|
{
|
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/* close queues to ensure that buffers are not accessed by adapter */
|
2009-08-18 21:43:19 +08:00
|
|
|
zfcp_qdio_close(adapter->qdio);
|
2008-07-02 16:56:40 +08:00
|
|
|
zfcp_fsf_req_dismiss_all(adapter);
|
|
|
|
adapter->fsf_req_seq_no = 0;
|
2009-08-18 21:43:12 +08:00
|
|
|
zfcp_fc_wka_ports_force_offline(adapter->gs);
|
2010-09-08 20:39:55 +08:00
|
|
|
/* all ports and LUNs are closed */
|
2010-09-08 20:40:01 +08:00
|
|
|
zfcp_erp_clear_adapter_status(adapter, ZFCP_STATUS_COMMON_OPEN);
|
2009-03-02 20:09:03 +08:00
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_XCONFIG_OK |
|
2009-03-02 20:09:03 +08:00
|
|
|
ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED, &adapter->status);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2009-03-02 20:09:03 +08:00
|
|
|
static int zfcp_erp_adapter_strategy_open(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2009-03-02 20:09:03 +08:00
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2010-12-02 22:16:17 +08:00
|
|
|
if (zfcp_qdio_open(adapter->qdio)) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(ZFCP_STATUS_ADAPTER_XCONFIG_OK |
|
2009-03-02 20:09:03 +08:00
|
|
|
ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED,
|
|
|
|
&adapter->status);
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
}
|
2008-07-02 16:56:40 +08:00
|
|
|
|
2009-03-02 20:09:03 +08:00
|
|
|
if (zfcp_erp_adapter_strategy_open_fsf(act)) {
|
|
|
|
zfcp_erp_adapter_strategy_close(act);
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
}
|
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(ZFCP_STATUS_COMMON_OPEN, &adapter->status);
|
2009-03-02 20:09:03 +08:00
|
|
|
|
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
|
|
|
}
|
2008-07-02 16:56:40 +08:00
|
|
|
|
2009-03-02 20:09:03 +08:00
|
|
|
static int zfcp_erp_adapter_strategy(struct zfcp_erp_action *act)
|
|
|
|
{
|
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
|
|
|
|
|
|
|
if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_OPEN) {
|
|
|
|
zfcp_erp_adapter_strategy_close(act);
|
|
|
|
if (act->status & ZFCP_STATUS_ERP_CLOSE_ONLY)
|
|
|
|
return ZFCP_ERP_EXIT;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (zfcp_erp_adapter_strategy_open(act)) {
|
2008-07-02 16:56:40 +08:00
|
|
|
ssleep(8);
|
2009-03-02 20:09:03 +08:00
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-03-02 20:09:03 +08:00
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_port_forced_strategy_close(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
int retval;
|
|
|
|
|
|
|
|
retval = zfcp_fsf_close_physical_port(act);
|
|
|
|
if (retval == -ENOMEM)
|
|
|
|
return ZFCP_ERP_NOMEM;
|
|
|
|
act->step = ZFCP_ERP_STEP_PHYS_PORT_CLOSING;
|
|
|
|
if (retval)
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
|
|
|
|
return ZFCP_ERP_CONTINUES;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_port_forced_strategy(struct zfcp_erp_action *erp_action)
|
|
|
|
{
|
|
|
|
struct zfcp_port *port = erp_action->port;
|
|
|
|
int status = atomic_read(&port->status);
|
|
|
|
|
|
|
|
switch (erp_action->step) {
|
|
|
|
case ZFCP_ERP_STEP_UNINITIALIZED:
|
|
|
|
if ((status & ZFCP_STATUS_PORT_PHYS_OPEN) &&
|
|
|
|
(status & ZFCP_STATUS_COMMON_OPEN))
|
|
|
|
return zfcp_erp_port_forced_strategy_close(erp_action);
|
|
|
|
else
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
|
|
|
|
case ZFCP_ERP_STEP_PHYS_PORT_CLOSING:
|
2009-07-13 21:06:08 +08:00
|
|
|
if (!(status & ZFCP_STATUS_PORT_PHYS_OPEN))
|
2008-07-02 16:56:40 +08:00
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
|
|
|
}
|
|
|
|
return ZFCP_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_port_strategy_close(struct zfcp_erp_action *erp_action)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
int retval;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
retval = zfcp_fsf_close_port(erp_action);
|
|
|
|
if (retval == -ENOMEM)
|
|
|
|
return ZFCP_ERP_NOMEM;
|
|
|
|
erp_action->step = ZFCP_ERP_STEP_PORT_CLOSING;
|
|
|
|
if (retval)
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
return ZFCP_ERP_CONTINUES;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_port_strategy_open_port(struct zfcp_erp_action *erp_action)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
int retval;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
retval = zfcp_fsf_open_port(erp_action);
|
|
|
|
if (retval == -ENOMEM)
|
|
|
|
return ZFCP_ERP_NOMEM;
|
|
|
|
erp_action->step = ZFCP_ERP_STEP_PORT_OPENING;
|
|
|
|
if (retval)
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
return ZFCP_ERP_CONTINUES;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_open_ptp_port(struct zfcp_erp_action *act)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
|
|
|
struct zfcp_port *port = act->port;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (port->wwpn != adapter->peer_wwpn) {
|
2010-09-08 20:40:01 +08:00
|
|
|
zfcp_erp_set_port_status(port, ZFCP_STATUS_COMMON_ERP_FAILED);
|
2008-07-02 16:56:40 +08:00
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
}
|
|
|
|
port->d_id = adapter->peer_d_id;
|
|
|
|
return zfcp_erp_port_strategy_open_port(act);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int zfcp_erp_port_strategy_open_common(struct zfcp_erp_action *act)
|
|
|
|
{
|
|
|
|
struct zfcp_adapter *adapter = act->adapter;
|
|
|
|
struct zfcp_port *port = act->port;
|
|
|
|
int p_status = atomic_read(&port->status);
|
|
|
|
|
|
|
|
switch (act->step) {
|
|
|
|
case ZFCP_ERP_STEP_UNINITIALIZED:
|
|
|
|
case ZFCP_ERP_STEP_PHYS_PORT_CLOSING:
|
|
|
|
case ZFCP_ERP_STEP_PORT_CLOSING:
|
|
|
|
if (fc_host_port_type(adapter->scsi_host) == FC_PORTTYPE_PTP)
|
|
|
|
return zfcp_erp_open_ptp_port(act);
|
2008-12-19 23:56:59 +08:00
|
|
|
if (!port->d_id) {
|
2009-10-14 17:00:43 +08:00
|
|
|
zfcp_fc_trigger_did_lookup(port);
|
2009-08-18 21:43:20 +08:00
|
|
|
return ZFCP_ERP_EXIT;
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
|
|
|
return zfcp_erp_port_strategy_open_port(act);
|
|
|
|
|
|
|
|
case ZFCP_ERP_STEP_PORT_OPENING:
|
|
|
|
/* D_ID might have changed during open */
|
2008-10-01 18:42:17 +08:00
|
|
|
if (p_status & ZFCP_STATUS_COMMON_OPEN) {
|
2009-10-14 17:00:43 +08:00
|
|
|
if (!port->d_id) {
|
|
|
|
zfcp_fc_trigger_did_lookup(port);
|
|
|
|
return ZFCP_ERP_EXIT;
|
2008-10-01 18:42:17 +08:00
|
|
|
}
|
2009-10-14 17:00:43 +08:00
|
|
|
return ZFCP_ERP_SUCCEEDED;
|
2008-10-01 18:42:17 +08:00
|
|
|
}
|
2009-05-15 19:18:20 +08:00
|
|
|
if (port->d_id && !(p_status & ZFCP_STATUS_COMMON_NOESC)) {
|
|
|
|
port->d_id = 0;
|
2010-07-08 15:53:10 +08:00
|
|
|
return ZFCP_ERP_FAILED;
|
2009-05-15 19:18:20 +08:00
|
|
|
}
|
|
|
|
/* fall through otherwise */
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
|
|
|
return ZFCP_ERP_FAILED;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int zfcp_erp_port_strategy(struct zfcp_erp_action *erp_action)
{
	struct zfcp_port *port = erp_action->port;
	int p_status = atomic_read(&port->status);

	if ((p_status & ZFCP_STATUS_COMMON_NOESC) &&
	    !(p_status & ZFCP_STATUS_COMMON_OPEN))
		goto close_init_done;

	switch (erp_action->step) {
	case ZFCP_ERP_STEP_UNINITIALIZED:
		if (p_status & ZFCP_STATUS_COMMON_OPEN)
			return zfcp_erp_port_strategy_close(erp_action);
		break;

	case ZFCP_ERP_STEP_PORT_CLOSING:
		if (p_status & ZFCP_STATUS_COMMON_OPEN)
			return ZFCP_ERP_FAILED;
		break;
	}

close_init_done:
	if (erp_action->status & ZFCP_STATUS_ERP_CLOSE_ONLY)
		return ZFCP_ERP_EXIT;

	return zfcp_erp_port_strategy_open_common(erp_action);
}

static void zfcp_erp_lun_strategy_clearstati(struct scsi_device *sdev)
{
	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);

	atomic_andnot(ZFCP_STATUS_COMMON_ACCESS_DENIED,
		      &zfcp_sdev->status);
}

static int zfcp_erp_lun_strategy_close(struct zfcp_erp_action *erp_action)
{
	int retval = zfcp_fsf_close_lun(erp_action);
	if (retval == -ENOMEM)
		return ZFCP_ERP_NOMEM;
	erp_action->step = ZFCP_ERP_STEP_LUN_CLOSING;
	if (retval)
		return ZFCP_ERP_FAILED;
	return ZFCP_ERP_CONTINUES;
}

static int zfcp_erp_lun_strategy_open(struct zfcp_erp_action *erp_action)
{
	int retval = zfcp_fsf_open_lun(erp_action);
	if (retval == -ENOMEM)
		return ZFCP_ERP_NOMEM;
	erp_action->step = ZFCP_ERP_STEP_LUN_OPENING;
	if (retval)
		return ZFCP_ERP_FAILED;
	return ZFCP_ERP_CONTINUES;
}

static int zfcp_erp_lun_strategy(struct zfcp_erp_action *erp_action)
{
	struct scsi_device *sdev = erp_action->sdev;
	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);

	switch (erp_action->step) {
	case ZFCP_ERP_STEP_UNINITIALIZED:
		zfcp_erp_lun_strategy_clearstati(sdev);
		if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_OPEN)
			return zfcp_erp_lun_strategy_close(erp_action);
		/* already closed, fall through */
	case ZFCP_ERP_STEP_LUN_CLOSING:
		if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_OPEN)
			return ZFCP_ERP_FAILED;
		if (erp_action->status & ZFCP_STATUS_ERP_CLOSE_ONLY)
			return ZFCP_ERP_EXIT;
		return zfcp_erp_lun_strategy_open(erp_action);

	case ZFCP_ERP_STEP_LUN_OPENING:
		if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_OPEN)
			return ZFCP_ERP_SUCCEEDED;
	}
	return ZFCP_ERP_FAILED;
}

static int zfcp_erp_strategy_check_lun(struct scsi_device *sdev, int result)
{
	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);

	switch (result) {
	case ZFCP_ERP_SUCCEEDED:
		atomic_set(&zfcp_sdev->erp_counter, 0);
		zfcp_erp_lun_unblock(sdev);
		break;
	case ZFCP_ERP_FAILED:
		atomic_inc(&zfcp_sdev->erp_counter);
		if (atomic_read(&zfcp_sdev->erp_counter) > ZFCP_MAX_ERPS) {
			dev_err(&zfcp_sdev->port->adapter->ccw_device->dev,
				"ERP failed for LUN 0x%016Lx on "
				"port 0x%016Lx\n",
				(unsigned long long)zfcp_scsi_dev_lun(sdev),
				(unsigned long long)zfcp_sdev->port->wwpn);
			zfcp_erp_set_lun_status(sdev,
						ZFCP_STATUS_COMMON_ERP_FAILED);
		}
		break;
	}

	if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
		zfcp_erp_lun_block(sdev, 0);
		result = ZFCP_ERP_EXIT;
	}
	return result;
}

static int zfcp_erp_strategy_check_port(struct zfcp_port *port, int result)
{
	switch (result) {
	case ZFCP_ERP_SUCCEEDED:
		atomic_set(&port->erp_counter, 0);
		zfcp_erp_port_unblock(port);
		break;

	case ZFCP_ERP_FAILED:
		if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_NOESC) {
			zfcp_erp_port_block(port, 0);
			result = ZFCP_ERP_EXIT;
		}
		atomic_inc(&port->erp_counter);
		if (atomic_read(&port->erp_counter) > ZFCP_MAX_ERPS) {
			dev_err(&port->adapter->ccw_device->dev,
				"ERP failed for remote port 0x%016Lx\n",
				(unsigned long long)port->wwpn);
			zfcp_erp_set_port_status(port,
						 ZFCP_STATUS_COMMON_ERP_FAILED);
		}
		break;
	}

	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
		zfcp_erp_port_block(port, 0);
		result = ZFCP_ERP_EXIT;
	}
	return result;
}

static int zfcp_erp_strategy_check_adapter(struct zfcp_adapter *adapter,
					   int result)
{
	switch (result) {
	case ZFCP_ERP_SUCCEEDED:
		atomic_set(&adapter->erp_counter, 0);
		zfcp_erp_adapter_unblock(adapter);
		break;

	case ZFCP_ERP_FAILED:
		atomic_inc(&adapter->erp_counter);
		if (atomic_read(&adapter->erp_counter) > ZFCP_MAX_ERPS) {
			dev_err(&adapter->ccw_device->dev,
				"ERP cannot recover an error "
				"on the FCP device\n");
			zfcp_erp_set_adapter_status(adapter,
					    ZFCP_STATUS_COMMON_ERP_FAILED);
		}
		break;
	}

	if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
		zfcp_erp_adapter_block(adapter, 0);
		result = ZFCP_ERP_EXIT;
	}
	return result;
}

static int zfcp_erp_strategy_check_target(struct zfcp_erp_action *erp_action,
					  int result)
{
	struct zfcp_adapter *adapter = erp_action->adapter;
	struct zfcp_port *port = erp_action->port;
	struct scsi_device *sdev = erp_action->sdev;

	switch (erp_action->action) {

	case ZFCP_ERP_ACTION_REOPEN_LUN:
		result = zfcp_erp_strategy_check_lun(sdev, result);
		break;

	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
	case ZFCP_ERP_ACTION_REOPEN_PORT:
		result = zfcp_erp_strategy_check_port(port, result);
		break;

	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
		result = zfcp_erp_strategy_check_adapter(adapter, result);
		break;
	}
	return result;
}

static int zfcp_erp_strat_change_det(atomic_t *target_status, u32 erp_status)
{
	int status = atomic_read(target_status);

	if ((status & ZFCP_STATUS_COMMON_RUNNING) &&
	    (erp_status & ZFCP_STATUS_ERP_CLOSE_ONLY))
		return 1; /* take it online */

	if (!(status & ZFCP_STATUS_COMMON_RUNNING) &&
	    !(erp_status & ZFCP_STATUS_ERP_CLOSE_ONLY))
		return 1; /* take it offline */

	return 0;
}

static int zfcp_erp_strategy_statechange(struct zfcp_erp_action *act, int ret)
{
	int action = act->action;
	struct zfcp_adapter *adapter = act->adapter;
	struct zfcp_port *port = act->port;
	struct scsi_device *sdev = act->sdev;
	struct zfcp_scsi_dev *zfcp_sdev;
	u32 erp_status = act->status;

	switch (action) {
	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
		if (zfcp_erp_strat_change_det(&adapter->status, erp_status)) {
			_zfcp_erp_adapter_reopen(adapter,
						 ZFCP_STATUS_COMMON_ERP_FAILED,
						 "ersscg1");
			return ZFCP_ERP_EXIT;
		}
		break;

	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
	case ZFCP_ERP_ACTION_REOPEN_PORT:
		if (zfcp_erp_strat_change_det(&port->status, erp_status)) {
			_zfcp_erp_port_reopen(port,
					      ZFCP_STATUS_COMMON_ERP_FAILED,
					      "ersscg2");
			return ZFCP_ERP_EXIT;
		}
		break;

	case ZFCP_ERP_ACTION_REOPEN_LUN:
		zfcp_sdev = sdev_to_zfcp(sdev);
		if (zfcp_erp_strat_change_det(&zfcp_sdev->status, erp_status)) {
			_zfcp_erp_lun_reopen(sdev,
					     ZFCP_STATUS_COMMON_ERP_FAILED,
					     "ersscg3", 0);
			return ZFCP_ERP_EXIT;
		}
		break;
	}
	return ret;
}

static void zfcp_erp_action_dequeue(struct zfcp_erp_action *erp_action)
{
	struct zfcp_adapter *adapter = erp_action->adapter;
	struct zfcp_scsi_dev *zfcp_sdev;

	adapter->erp_total_count--;
	if (erp_action->status & ZFCP_STATUS_ERP_LOWMEM) {
		adapter->erp_low_mem_count--;
		erp_action->status &= ~ZFCP_STATUS_ERP_LOWMEM;
	}

	list_del(&erp_action->list);
	zfcp_dbf_rec_run("eractd1", erp_action);

	switch (erp_action->action) {
	case ZFCP_ERP_ACTION_REOPEN_LUN:
		zfcp_sdev = sdev_to_zfcp(erp_action->sdev);
		atomic_andnot(ZFCP_STATUS_COMMON_ERP_INUSE,
			      &zfcp_sdev->status);
		break;

	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
	case ZFCP_ERP_ACTION_REOPEN_PORT:
		atomic_andnot(ZFCP_STATUS_COMMON_ERP_INUSE,
			      &erp_action->port->status);
		break;

	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
		atomic_andnot(ZFCP_STATUS_COMMON_ERP_INUSE,
			      &erp_action->adapter->status);
		break;
	}
}

scsi: zfcp: fix rport unblock race with LUN recovery
It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
window when zfcp detected an unavailable rport but
fc_remote_port_delete(), which is asynchronous via
zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
However, for the case when the rport becomes available again, we should
prevent unblocking the rport too early. In contrast to other FCP LLDDs,
zfcp has to open each LUN with the FCP channel hardware before it can
send I/O to a LUN. So if a port already has LUNs attached and we
unblock the rport just after port recovery, recoveries of LUNs behind
this port can still be pending which in turn force
zfcp_scsi_queuecommand() to unnecessarily finish requests with
DID_IMM_RETRY.
This also opens a time window with unblocked rport (until the followup
LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during
this time window fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents a clean and
timely path failover. This should not happen if the path issue can be
recovered on FC transport layer such as path issues involving RSCNs.
Fix this by only calling zfcp_scsi_schedule_rport_register(), to
asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
children of the rport have finished and no new recoveries of equal or
higher order were triggered meanwhile. Finished intentionally includes
any recovery result no matter if successful or failed (still unblock
rport so other successful LUNs work). For simplicity, we check after
each finished LUN recovery if there is another LUN recovery pending on
the same port and then do nothing. We handle the special case of a
successful recovery of a port without LUN children the same way without
changing this case's semantics.
For debugging we introduce 2 new trace records written if the rport
unblock attempt was aborted due to still unfinished or freshly triggered
recovery. The records are only written above the default trace level.
Benjamin noticed the important special case of new recovery that can be
triggered between having given up the erp_lock and before calling
zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the
following sequence:
ERP thread                 rport_work      other context
-------------------------  --------------  --------------------------------
port is unblocked, rport still blocked,
 due to pending/running ERP action,
 so ((port->status & ...UNBLOCK) != 0)
 and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
                                           zfcp_erp_port_reopen()
                                           lock ERP
                                           zfcp_erp_port_block()
                                           port->status clear ...UNBLOCK
                                           unlock ERP
                                           zfcp_scsi_schedule_rport_block()
                                           port->rport_task = RPORT_DEL
                                           queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task != RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_block()
                           if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task == RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_register()
                           (port->rport == NULL)
                           rport = fc_remote_port_add()
                           port->rport = rport;
Now the rport was erroneously unblocked while the zfcp_port is blocked.
This is another situation we want to avoid due to scsi_eh
potential. This state would at least remain until the new recovery from
the other context finished successfully, or potentially forever if it
failed. In order to close this race, we take the erp_lock inside
zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
LUN. With that, the possible corresponding rport state sequences would
be: (unblock[ERP thread],block[other context]) if the ERP thread gets
erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
(block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
after the other context has already cleared ...UNBLOCK from port->status.
Since checking fields of struct erp_action is unsafe because they could
have been overwritten (re-used for new recovery) meanwhile, we only
check status of zfcp_port and LUN since these are only changed under
erp_lock elsewhere. Regarding the check of the proper status flags (port
or port_forced are similar to the shown adapter recovery):
[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
 zfcp_erp_adapter_block()
  * clear UNBLOCK ---------------------------------------+
 zfcp_scsi_schedule_rports_block()                       |
 write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
 zfcp_erp_action_enqueue()                            |  |
  zfcp_erp_setup_act()                                |  |
   * set ERP_INUSE -----------------------------------|--|--+
 write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
.context-switch.                                         |  |
zfcp_erp_thread()                                        |  |
 zfcp_erp_strategy()                                     |  |
  write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
  ...                                                 |  |  |
  zfcp_erp_strategy_check_target()                    |  |  |
   zfcp_erp_strategy_check_adapter()                  |  |  |
    zfcp_erp_adapter_unblock()                        |  |  |
     * set UNBLOCK -----------------------------------|--+  |
  zfcp_erp_action_dequeue()                           |     |
   * clear ERP_INUSE ---------------------------------|-----+
  ...                                                 |
  write_unlock_irqrestore(&adapter->erp_lock, flags);-+
Hence, we should check for both UNBLOCK and ERP_INUSE because they are
interleaved. Also we need to explicitly check ERP_FAILED for the link
down case which currently does not clear the UNBLOCK flag in
zfcp_fsf_link_down_info_eval().
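
The flag reasoning above can be condensed into a small user-space model. This is only a sketch under stated assumptions: the names `ST_UNBLOCKED`, `ST_ERP_INUSE`, `ST_ERP_FAILED` and the helper functions are hypothetical stand-ins for the driver's status bits, their bit values are arbitrary, and the locking (erp_lock, host_lock) that the real check depends on is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; values are illustrative, not the driver's. */
enum {
	ST_UNBLOCKED  = 1u << 0,
	ST_ERP_INUSE  = 1u << 1,
	ST_ERP_FAILED = 1u << 2,
};

/* Port gate: must be unblocked, with no recovery queued or running,
 * and recovery must not have been given up (link-down case).
 */
static bool port_allows_rport_unblock(uint32_t port_status)
{
	if (!(port_status & ST_UNBLOCKED))
		return false;
	if (port_status & (ST_ERP_INUSE | ST_ERP_FAILED))
		return false;
	return true;
}

/* LUN gate: a LUN whose recovery failed counts as finished; any other
 * LUN must be unblocked and must not have a recovery in use.
 */
static bool lun_allows_rport_unblock(uint32_t lun_status)
{
	if (lun_status & ST_ERP_FAILED)
		return true;
	return (lun_status & ST_UNBLOCKED) && !(lun_status & ST_ERP_INUSE);
}

/* Combined decision over a port and the LUNs attached to it. */
static bool may_unblock_rport(uint32_t port_status,
			      const uint32_t *lun_status, size_t n_luns)
{
	size_t i;

	if (!port_allows_rport_unblock(port_status))
		return false;
	for (i = 0; i < n_luns; i++)
		if (!lun_allows_rport_unblock(lun_status[i]))
			return false;
	return true;
}
```

In this model a port without LUN children is decided by the port gate alone, which mirrors the special case described earlier in the message.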
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8830271c4819 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-10 00:16:33 +08:00
/**
 * zfcp_erp_try_rport_unblock - unblock rport if no more/new recovery
 * @port: zfcp_port whose fc_rport we should try to unblock
 */
static void zfcp_erp_try_rport_unblock(struct zfcp_port *port)
{
	unsigned long flags;
	struct zfcp_adapter *adapter = port->adapter;
	int port_status;
	struct Scsi_Host *shost = adapter->scsi_host;
	struct scsi_device *sdev;

	write_lock_irqsave(&adapter->erp_lock, flags);
	port_status = atomic_read(&port->status);
	if ((port_status & ZFCP_STATUS_COMMON_UNBLOCKED) == 0 ||
	    (port_status & (ZFCP_STATUS_COMMON_ERP_INUSE |
			    ZFCP_STATUS_COMMON_ERP_FAILED)) != 0) {
		/* new ERP of severity >= port triggered elsewhere meanwhile or
		 * local link down (adapter erp_failed but not clear unblock)
		 */
		zfcp_dbf_rec_run_lvl(4, "ertru_p", &port->erp_action);
		write_unlock_irqrestore(&adapter->erp_lock, flags);
		return;
	}
	spin_lock(shost->host_lock);
	__shost_for_each_device(sdev, shost) {
		struct zfcp_scsi_dev *zsdev = sdev_to_zfcp(sdev);
		int lun_status;

		if (zsdev->port != port)
			continue;
		/* LUN under port of interest */
		lun_status = atomic_read(&zsdev->status);
		if ((lun_status & ZFCP_STATUS_COMMON_ERP_FAILED) != 0)
			continue; /* unblock rport despite failed LUNs */
		/* LUN recovery not given up yet [maybe follow-up pending] */
		if ((lun_status & ZFCP_STATUS_COMMON_UNBLOCKED) == 0 ||
		    (lun_status & ZFCP_STATUS_COMMON_ERP_INUSE) != 0) {
			/* LUN blocked:
			 * not yet unblocked [LUN recovery pending]
			 * or meanwhile blocked [new LUN recovery triggered]
			 */
			zfcp_dbf_rec_run_lvl(4, "ertru_l", &zsdev->erp_action);
			spin_unlock(shost->host_lock);
			write_unlock_irqrestore(&adapter->erp_lock, flags);
			return;
		}
	}
	/* now port has no child or all children have completed recovery,
	 * and no ERP of severity >= port was meanwhile triggered elsewhere
	 */
	zfcp_scsi_schedule_rport_register(port);
	spin_unlock(shost->host_lock);
	write_unlock_irqrestore(&adapter->erp_lock, flags);
}

static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result)
{
	struct zfcp_adapter *adapter = act->adapter;
	struct zfcp_port *port = act->port;
	struct scsi_device *sdev = act->sdev;

	switch (act->action) {
	case ZFCP_ERP_ACTION_REOPEN_LUN:
		if (!(act->status & ZFCP_STATUS_ERP_NO_REF))
			scsi_device_put(sdev);
		zfcp_erp_try_rport_unblock(port);
		break;

	case ZFCP_ERP_ACTION_REOPEN_PORT:
		/* This switch case might also happen after a forced reopen
		 * was successfully done and thus overwritten with a new
		 * non-forced reopen at `ersfs_2'. In this case, we must not
		 * do the clean-up of the non-forced version.
		 */
		if (act->step != ZFCP_ERP_STEP_UNINITIALIZED)
			if (result == ZFCP_ERP_SUCCEEDED)
scsi: zfcp: fix rport unblock race with LUN recovery
It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
window when zfcp detected an unavailable rport but
fc_remote_port_delete(), which is asynchronous via
zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
However, for the case when the rport becomes available again, we should
prevent unblocking the rport too early. In contrast to other FCP LLDDs,
zfcp has to open each LUN with the FCP channel hardware before it can
send I/O to a LUN. So if a port already has LUNs attached and we
unblock the rport just after port recovery, recoveries of LUNs behind
this port can still be pending which in turn force
zfcp_scsi_queuecommand() to unnecessarily finish requests with
DID_IMM_RETRY.
This also opens a time window with unblocked rport (until the followup
LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during
this time window fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents a clean and
timely path failover. This should not happen if the path issue can be
recovered on FC transport layer such as path issues involving RSCNs.
Fix this by only calling zfcp_scsi_schedule_rport_register(), to
asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
children of the rport have finished and no new recoveries of equal or
higher order were triggered meanwhile. Finished intentionally includes
any recovery result no matter if successful or failed (still unblock
rport so other successful LUNs work). For simplicity, we check after
each finished LUN recovery if there is another LUN recovery pending on
the same port and then do nothing. We handle the special case of a
successful recovery of a port without LUN children the same way without
changing this case's semantics.
For debugging we introduce 2 new trace records written if the rport
unblock attempt was aborted due to still unfinished or freshly triggered
recovery. The records are only written above the default trace level.
Benjamin noticed the important special case of new recovery that can be
triggered between having given up the erp_lock and before calling
zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the
following sequence:
ERP thread rport_work other context
------------------------- -------------- --------------------------------
port is unblocked, rport still blocked,
due to pending/running ERP action,
so ((port->status & ...UNBLOCK) != 0)
and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
zfcp_erp_port_reopen()
lock ERP
zfcp_erp_port_block()
port->status clear ...UNBLOCK
unlock ERP
zfcp_scsi_schedule_rport_block()
port->rport_task = RPORT_DEL
queue_work(rport_work)
zfcp_scsi_rport_work()
(port->rport_task != RPORT_ADD)
port->rport_task = RPORT_NONE
zfcp_scsi_rport_block()
if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
zfcp_scsi_rport_work()
(port->rport_task == RPORT_ADD)
port->rport_task = RPORT_NONE
zfcp_scsi_rport_register()
(port->rport == NULL)
rport = fc_remote_port_add()
port->rport = rport;
Now the rport was erroneously unblocked while the zfcp_port is blocked.
This is another situation we want to avoid due to scsi_eh
potential. This state would at least remain until the new recovery from
the other context finished successfully, or potentially forever if it
failed. In order to close this race, we take the erp_lock inside
zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
LUN. With that, the possible corresponding rport state sequences would
be: (unblock[ERP thread],block[other context]) if the ERP thread gets
erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
(block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
after the other context has already cleared ...UNBLOCK from port->status.
Since checking fields of struct erp_action is unsafe because they could
have been overwritten (re-used for new recovery) meanwhile, we only
check status of zfcp_port and LUN since these are only changed under
erp_lock elsewhere. Regarding the check of the proper status flags (port
or port_forced are similar to the shown adapter recovery):
[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
 zfcp_erp_adapter_block()
  * clear UNBLOCK ---------------------------------------+
 zfcp_scsi_schedule_rports_block()                       |
 write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
 zfcp_erp_action_enqueue()                            |  |
  zfcp_erp_setup_act()                                |  |
   * set ERP_INUSE -----------------------------------|--|--+
 write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
.context-switch.                                         |  |
zfcp_erp_thread()                                        |  |
 zfcp_erp_strategy()                                     |  |
  write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
  ...                                                 |  |  |
  zfcp_erp_strategy_check_target()                    |  |  |
   zfcp_erp_strategy_check_adapter()                  |  |  |
    zfcp_erp_adapter_unblock()                        |  |  |
     * set UNBLOCK -----------------------------------|--+  |
  zfcp_erp_action_dequeue()                           |     |
   * clear ERP_INUSE ---------------------------------|-----+
  ...                                                 |
  write_unlock_irqrestore(&adapter->erp_lock, flags);-+
Hence, we should check for both UNBLOCK and ERP_INUSE because they are
interleaved. Also we need to explicitly check ERP_FAILED for the link
down case which currently does not clear the UNBLOCK flag in
zfcp_fsf_link_down_info_eval().
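The check described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver code: the `model_*` names, the `ST_*` bit values, the fixed-size LUN array and the `atomic_flag` spin lock standing in for erp_lock are all hypothetical. It shows the port needing UNBLOCK set with neither ERP_INUSE nor ERP_FAILED, failed LUNs counting as finished, and both the check and the concurrent block running under the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for this sketch only; the driver has its own. */
#define ST_UNBLOCK     0x1u
#define ST_ERP_INUSE   0x2u
#define ST_ERP_FAILED  0x4u
#define MODEL_MAX_LUNS 4

struct model_port {
	atomic_flag erp_lock;                    /* stands in for erp_lock */
	unsigned int status;                     /* changed only under erp_lock */
	unsigned int lun_status[MODEL_MAX_LUNS]; /* children of this port */
	size_t nluns;
	bool rport_add_scheduled;                /* models queuing RPORT_ADD */
};

static void model_lock(struct model_port *p)
{
	while (atomic_flag_test_and_set(&p->erp_lock))
		; /* spin until the lock is free */
}

static void model_unlock(struct model_port *p)
{
	atomic_flag_clear(&p->erp_lock);
}

static void model_port_init(struct model_port *p, unsigned int status)
{
	memset(p, 0, sizeof(*p));
	atomic_flag_clear(&p->erp_lock);
	p->status = status;
}

static void model_add_lun(struct model_port *p, unsigned int status)
{
	assert(p->nluns < MODEL_MAX_LUNS);
	p->lun_status[p->nluns++] = status;
}

/* "Other context": a new port recovery blocks the port under the lock. */
static void model_port_block(struct model_port *p)
{
	model_lock(p);
	p->status &= ~ST_UNBLOCK;
	model_unlock(p);
}

/* "ERP thread": decide under the lock whether the rport may be unblocked. */
static bool model_try_rport_unblock(struct model_port *p)
{
	bool ok;
	size_t i;

	model_lock(p);
	ok = (p->status & ST_UNBLOCK) &&
	     !(p->status & (ST_ERP_INUSE | ST_ERP_FAILED));
	for (i = 0; ok && i < p->nluns; i++) {
		unsigned int s = p->lun_status[i];

		if (s & ST_ERP_FAILED)
			continue; /* finished, even though it failed */
		if (!(s & ST_UNBLOCK) || (s & ST_ERP_INUSE))
			ok = false; /* LUN recovery pending or newly triggered */
	}
	if (ok)
		p->rport_add_scheduled = true;
	model_unlock(p);
	return ok;
}
```

Because both functions take the same lock, only the two orderings listed above remain possible in this model: the unblock is decided before the block, or the block is seen and the unblock attempt becomes a no-op.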
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8830271c4819 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: a2fa0aede07c ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e11d ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248cb ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-10 00:16:33 +08:00
|
|
|
zfcp_erp_try_rport_unblock(port);
|
2010-07-08 15:53:05 +08:00
|
|
|
/* fall through */
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
|
2010-02-17 18:18:56 +08:00
|
|
|
put_device(&port->dev);
|
2008-07-02 16:56:40 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
|
2009-03-02 20:09:08 +08:00
|
|
|
if (result == ZFCP_ERP_SUCCEEDED) {
|
2008-12-25 20:38:50 +08:00
|
|
|
register_service_level(&adapter->service_level);
|
2012-09-04 21:23:35 +08:00
|
|
|
zfcp_fc_conditional_port_scan(adapter);
|
2011-02-23 02:54:48 +08:00
|
|
|
queue_work(adapter->work_queue, &adapter->ns_up_work);
|
2009-03-02 20:09:08 +08:00
|
|
|
} else
|
|
|
|
unregister_service_level(&adapter->service_level);
|
2011-02-23 02:54:48 +08:00
|
|
|
|
2009-11-24 23:53:59 +08:00
|
|
|
kref_put(&adapter->ref, zfcp_adapter_release);
|
2008-07-02 16:56:40 +08:00
|
|
|
break;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_strategy_do_action(struct zfcp_erp_action *erp_action)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
switch (erp_action->action) {
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
|
|
|
|
return zfcp_erp_adapter_strategy(erp_action);
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
|
|
|
|
return zfcp_erp_port_forced_strategy(erp_action);
|
|
|
|
case ZFCP_ERP_ACTION_REOPEN_PORT:
|
|
|
|
return zfcp_erp_port_strategy(erp_action);
|
2010-09-08 20:39:55 +08:00
|
|
|
case ZFCP_ERP_ACTION_REOPEN_LUN:
|
|
|
|
return zfcp_erp_lun_strategy(erp_action);
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
|
|
|
return ZFCP_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_strategy(struct zfcp_erp_action *erp_action)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
int retval;
|
2008-07-02 16:56:40 +08:00
|
|
|
unsigned long flags;
|
2009-11-24 23:53:58 +08:00
|
|
|
struct zfcp_adapter *adapter = erp_action->adapter;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-11-24 23:53:59 +08:00
|
|
|
kref_get(&adapter->ref);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-11-24 23:53:59 +08:00
|
|
|
write_lock_irqsave(&adapter->erp_lock, flags);
|
2008-07-02 16:56:40 +08:00
|
|
|
zfcp_erp_strategy_check_fsfreq(erp_action);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (erp_action->status & ZFCP_STATUS_ERP_DISMISSED) {
|
|
|
|
zfcp_erp_action_dequeue(erp_action);
|
|
|
|
retval = ZFCP_ERP_DISMISSED;
|
|
|
|
goto unlock;
|
2005-06-13 19:15:15 +08:00
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2010-07-08 15:53:09 +08:00
|
|
|
if (erp_action->status & ZFCP_STATUS_ERP_TIMEDOUT) {
|
|
|
|
retval = ZFCP_ERP_FAILED;
|
|
|
|
goto check_target;
|
|
|
|
}
|
|
|
|
|
2006-02-11 08:41:50 +08:00
|
|
|
zfcp_erp_action_to_running(erp_action);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/* no lock to allow for blocking operations */
|
2009-11-24 23:53:58 +08:00
|
|
|
write_unlock_irqrestore(&adapter->erp_lock, flags);
|
2008-07-02 16:56:40 +08:00
|
|
|
retval = zfcp_erp_strategy_do_action(erp_action);
|
2009-11-24 23:53:58 +08:00
|
|
|
write_lock_irqsave(&adapter->erp_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (erp_action->status & ZFCP_STATUS_ERP_DISMISSED)
|
|
|
|
retval = ZFCP_ERP_CONTINUES;
|
2008-06-11 00:21:00 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
switch (retval) {
|
|
|
|
case ZFCP_ERP_NOMEM:
|
|
|
|
if (!(erp_action->status & ZFCP_STATUS_ERP_LOWMEM)) {
|
|
|
|
++adapter->erp_low_mem_count;
|
|
|
|
erp_action->status |= ZFCP_STATUS_ERP_LOWMEM;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2008-07-02 16:56:40 +08:00
|
|
|
if (adapter->erp_total_count == adapter->erp_low_mem_count)
|
2010-12-02 22:16:16 +08:00
|
|
|
_zfcp_erp_adapter_reopen(adapter, 0, "erstgy1");
|
2008-07-02 16:56:40 +08:00
|
|
|
else {
|
|
|
|
zfcp_erp_strategy_memwait(erp_action);
|
|
|
|
retval = ZFCP_ERP_CONTINUES;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2008-07-02 16:56:40 +08:00
|
|
|
goto unlock;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
case ZFCP_ERP_CONTINUES:
|
|
|
|
if (erp_action->status & ZFCP_STATUS_ERP_LOWMEM) {
|
|
|
|
--adapter->erp_low_mem_count;
|
|
|
|
erp_action->status &= ~ZFCP_STATUS_ERP_LOWMEM;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2008-07-02 16:56:40 +08:00
|
|
|
goto unlock;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2010-07-08 15:53:09 +08:00
|
|
|
check_target:
|
2008-07-02 16:56:40 +08:00
|
|
|
retval = zfcp_erp_strategy_check_target(erp_action, retval);
|
|
|
|
zfcp_erp_action_dequeue(erp_action);
|
|
|
|
retval = zfcp_erp_strategy_statechange(erp_action, retval);
|
|
|
|
if (retval == ZFCP_ERP_EXIT)
|
|
|
|
goto unlock;
|
2009-07-13 21:06:09 +08:00
|
|
|
if (retval == ZFCP_ERP_SUCCEEDED)
|
|
|
|
zfcp_erp_strategy_followup_success(erp_action);
|
|
|
|
if (retval == ZFCP_ERP_FAILED)
|
|
|
|
zfcp_erp_strategy_followup_failed(erp_action);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
unlock:
|
2009-11-24 23:53:58 +08:00
|
|
|
write_unlock_irqrestore(&adapter->erp_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (retval != ZFCP_ERP_CONTINUES)
|
|
|
|
zfcp_erp_action_cleanup(erp_action, retval);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-11-24 23:53:59 +08:00
|
|
|
kref_put(&adapter->ref, zfcp_adapter_release);
|
2005-04-17 06:20:36 +08:00
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
static int zfcp_erp_thread(void *data)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2008-07-02 16:56:40 +08:00
|
|
|
struct zfcp_adapter *adapter = (struct zfcp_adapter *) data;
|
|
|
|
struct list_head *next;
|
|
|
|
struct zfcp_erp_action *act;
|
|
|
|
unsigned long flags;
|
2009-04-17 21:08:06 +08:00
|
|
|
|
2009-08-18 21:43:25 +08:00
|
|
|
for (;;) {
|
|
|
|
wait_event_interruptible(adapter->erp_ready_wq,
|
|
|
|
!list_empty(&adapter->erp_ready_head) ||
|
|
|
|
kthread_should_stop());
|
2009-04-17 21:08:06 +08:00
|
|
|
|
2009-08-18 21:43:25 +08:00
|
|
|
if (kthread_should_stop())
|
|
|
|
break;
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
write_lock_irqsave(&adapter->erp_lock, flags);
|
|
|
|
next = adapter->erp_ready_head.next;
|
|
|
|
write_unlock_irqrestore(&adapter->erp_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
if (next != &adapter->erp_ready_head) {
|
|
|
|
act = list_entry(next, struct zfcp_erp_action, list);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/* there is more to come after dismission, no notify */
|
|
|
|
if (zfcp_erp_strategy(act) != ZFCP_ERP_DISMISSED)
|
|
|
|
zfcp_erp_wakeup(adapter);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
|
|
|
* zfcp_erp_thread_setup - Start ERP thread for adapter
|
|
|
|
* @adapter: Adapter to start the ERP thread for
|
|
|
|
*
|
|
|
|
* Returns 0 on success or error code from kthread_run()
|
|
|
|
*/
|
|
|
|
int zfcp_erp_thread_setup(struct zfcp_adapter *adapter)
|
|
|
|
{
|
2009-08-18 21:43:25 +08:00
|
|
|
struct task_struct *thread;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2009-08-18 21:43:25 +08:00
|
|
|
thread = kthread_run(zfcp_erp_thread, adapter, "zfcperp%s",
|
|
|
|
dev_name(&adapter->ccw_device->dev));
|
|
|
|
if (IS_ERR(thread)) {
|
2008-07-02 16:56:40 +08:00
|
|
|
dev_err(&adapter->ccw_device->dev,
|
2008-10-01 18:42:15 +08:00
|
|
|
"Creating an ERP thread for the FCP device failed.\n");
|
2009-08-18 21:43:25 +08:00
|
|
|
return PTR_ERR(thread);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
2009-08-18 21:43:25 +08:00
|
|
|
|
|
|
|
adapter->erp_thread = thread;
|
2008-07-02 16:56:40 +08:00
|
|
|
return 0;
|
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
|
|
|
* zfcp_erp_thread_kill - Stop ERP thread.
|
|
|
|
* @adapter: Adapter where the ERP thread should be stopped.
|
|
|
|
*
|
|
|
|
* The caller of this routine ensures that the specified adapter has
|
|
|
|
* been shut down and that this operation has been completed. Thus,
|
|
|
|
* there are no pending erp_actions which would need to be handled
|
|
|
|
* here.
|
|
|
|
*/
|
|
|
|
void zfcp_erp_thread_kill(struct zfcp_adapter *adapter)
|
|
|
|
{
|
2009-08-18 21:43:25 +08:00
|
|
|
kthread_stop(adapter->erp_thread);
|
|
|
|
adapter->erp_thread = NULL;
|
2009-08-18 21:43:27 +08:00
|
|
|
WARN_ON(!list_empty(&adapter->erp_ready_head));
|
|
|
|
WARN_ON(!list_empty(&adapter->erp_running_head));
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_wait - wait for completion of error recovery on an adapter
|
|
|
|
* @adapter: adapter for which to wait for completion of its error recovery
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_wait(struct zfcp_adapter *adapter)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2010-09-08 20:40:01 +08:00
|
|
|
wait_event(adapter->erp_done_wqh,
|
|
|
|
!(atomic_read(&adapter->status) &
|
|
|
|
ZFCP_STATUS_ADAPTER_ERP_PENDING));
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_set_adapter_status - set adapter status bits
|
|
|
|
* @adapter: adapter to change the status
|
|
|
|
* @mask: status bits to change
|
|
|
|
*
|
|
|
|
* Changes in common status bits are propagated to attached ports and LUNs.
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_set_adapter_status(struct zfcp_adapter *adapter, u32 mask)
|
2008-07-02 16:56:40 +08:00
|
|
|
{
|
2010-09-08 20:40:01 +08:00
|
|
|
struct zfcp_port *port;
|
|
|
|
struct scsi_device *sdev;
|
|
|
|
unsigned long flags;
|
|
|
|
u32 common_mask = mask & ZFCP_COMMON_FLAGS;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(mask, &adapter->status);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2010-09-08 20:40:01 +08:00
|
|
|
if (!common_mask)
|
|
|
|
return;
|
|
|
|
|
|
|
|
read_lock_irqsave(&adapter->port_list_lock, flags);
|
|
|
|
list_for_each_entry(port, &adapter->port_list, list)
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(common_mask, &port->status);
|
2010-09-08 20:40:01 +08:00
|
|
|
read_unlock_irqrestore(&adapter->port_list_lock, flags);
|
|
|
|
|
[SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
CPU: 1 Not tainted 3.9.3+ #69
Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
<snip>
Call Trace:
([<00000000001165de>] show_trace+0x106/0x154)
[<00000000001166a0>] show_stack+0x74/0xf4
[<00000000006ff646>] dump_stack+0xc6/0xd4
[<000000000017f3a0>] __might_sleep+0x128/0x148
[<000000000015ece8>] flush_work+0x54/0x1f8
[<00000000001630de>] __cancel_work_timer+0xc6/0x128
[<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
[<0000000000161816>] execute_in_process_context+0x96/0xa8
[<00000000004d33d8>] device_release+0x60/0xc0
[<000000000048af48>] kobject_release+0xa8/0x1c4
[<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
[<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
[<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
[<000000000016b75a>] kthread+0xf2/0xfc
[<000000000070c9de>] kernel_thread_starter+0x6/0xc
[<000000000070c9d8>] kernel_thread_starter+0x0/0xc
Apparently, the ref_count for some scsi_device drops down to zero,
triggering device removal through execute_in_process_context(), while
the lldd error recovery thread iterates through a scsi device list.
Unfortunately, execute_in_process_context() decides to immediately
execute that device removal function, instead of scheduling asynchronous
execution, since it detects process context and thinks it is safe to do
so. But almost all calls to shost_for_each_device() in our lldd are
inside spin_lock_irq, even in thread context. Obviously, schedule()
inside spin_lock_irq sections is a bad idea.
Change the lldd to use the proper iterator function,
__shost_for_each_device(), in combination with required locking.
Occurrences that need to be changed include all calls in zfcp_erp.c,
since those might be executed in zfcp error recovery thread context
with a lock held.
Other occurrences of shost_for_each_device() in zfcp_fsf.c do not
need to be changed (no process context, no surrounding locking).
The problem was introduced in Linux 2.6.37 by commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org #2.6.37+
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
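The difference between the two iterators can be illustrated with a small userspace model. This is a sketch under stated assumptions, not SCSI midlayer code: all `model_*` names are hypothetical, `lock_held` is a plain flag standing in for host_lock, and `external_put` simulates another context dropping its reference while the walk is in progress. The reference-taking walk may end up dropping the last reference itself and so run the release path with the lock held, which is the situation reported in the trace above. The plain walk takes no references and relies on the lock instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	int refcount;        /* references currently held on the device */
	bool external_put;   /* simulate another context dropping its ref */
	bool released;       /* release path has run */
	unsigned int status;
};

struct model_host {
	bool lock_held;        /* stands in for host_lock being held */
	int sleeps_under_lock; /* release calls that ran while lock_held */
	struct model_dev *devs;
	size_t ndevs;
};

static void model_put(struct model_host *h, struct model_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		d->released = true; /* the real release path may sleep */
		if (h->lock_held)
			h->sleeps_under_lock++;
	}
}

/* Reference-taking walk, in the spirit of shost_for_each_device(). */
static void model_walk_refcounted(struct model_host *h, unsigned int mask)
{
	for (size_t i = 0; i < h->ndevs; i++) {
		struct model_dev *d = &h->devs[i];

		if (d->released)
			continue;
		d->refcount++;           /* iterator takes its own reference */
		if (d->external_put) {   /* concurrent put elsewhere */
			d->external_put = false;
			model_put(h, d);
		}
		d->status |= mask;
		model_put(h, d);         /* may turn out to be the last ref */
	}
}

/* Plain walk, in the spirit of __shost_for_each_device(). */
static void model_walk_plain(struct model_host *h, unsigned int mask)
{
	assert(h->lock_held); /* only valid with the lock held */
	for (size_t i = 0; i < h->ndevs; i++)
		if (!h->devs[i].released)
			h->devs[i].status |= mask;
}
```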
2013-08-22 23:45:37 +08:00
|
|
|
spin_lock_irqsave(adapter->scsi_host->host_lock, flags);
|
|
|
|
__shost_for_each_device(sdev, adapter->scsi_host)
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(common_mask, &sdev_to_zfcp(sdev)->status);
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_unlock_irqrestore(adapter->scsi_host->host_lock, flags);
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_clear_adapter_status - clear adapter status bits
|
2008-07-02 16:56:40 +08:00
|
|
|
* @adapter: adapter to change the status
|
|
|
|
* @mask: status bits to change
|
|
|
|
*
|
2010-09-08 20:39:55 +08:00
|
|
|
* Changes in common status bits are propagated to attached ports and LUNs.
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_clear_adapter_status(struct zfcp_adapter *adapter, u32 mask)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
struct zfcp_port *port;
|
2010-09-08 20:40:01 +08:00
|
|
|
struct scsi_device *sdev;
|
2009-11-24 23:53:58 +08:00
|
|
|
unsigned long flags;
|
2008-07-02 16:56:40 +08:00
|
|
|
u32 common_mask = mask & ZFCP_COMMON_FLAGS;
|
2010-09-08 20:40:01 +08:00
|
|
|
u32 clear_counter = mask & ZFCP_STATUS_COMMON_ERP_FAILED;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(mask, &adapter->status);
|
2010-09-08 20:40:01 +08:00
|
|
|
|
|
|
|
if (!common_mask)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (clear_counter)
|
|
|
|
atomic_set(&adapter->erp_counter, 0);
|
|
|
|
|
|
|
|
read_lock_irqsave(&adapter->port_list_lock, flags);
|
|
|
|
list_for_each_entry(port, &adapter->port_list, list) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(common_mask, &port->status);
|
2010-09-08 20:40:01 +08:00
|
|
|
if (clear_counter)
|
|
|
|
atomic_set(&port->erp_counter, 0);
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
2010-09-08 20:40:01 +08:00
|
|
|
read_unlock_irqrestore(&adapter->port_list_lock, flags);
|
2008-07-02 16:56:40 +08:00
|
|
|
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_lock_irqsave(adapter->scsi_host->host_lock, flags);
|
|
|
|
__shost_for_each_device(sdev, adapter->scsi_host) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(common_mask, &sdev_to_zfcp(sdev)->status);
|
2010-09-08 20:40:01 +08:00
|
|
|
if (clear_counter)
|
|
|
|
atomic_set(&sdev_to_zfcp(sdev)->erp_counter, 0);
|
2009-11-24 23:53:58 +08:00
|
|
|
}
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_unlock_irqrestore(adapter->scsi_host->host_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_set_port_status - set port status bits
|
|
|
|
* @port: port to change the status
|
2008-07-02 16:56:40 +08:00
|
|
|
* @mask: status bits to change
|
|
|
|
*
|
2010-09-08 20:39:55 +08:00
|
|
|
* Changes in common status bits are propagated to attached LUNs.
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_set_port_status(struct zfcp_port *port, u32 mask)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2010-09-08 20:39:55 +08:00
|
|
|
struct scsi_device *sdev;
|
2008-07-02 16:56:40 +08:00
|
|
|
u32 common_mask = mask & ZFCP_COMMON_FLAGS;
|
2013-08-22 23:45:37 +08:00
|
|
|
unsigned long flags;
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(mask, &port->status);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2010-09-08 20:40:01 +08:00
|
|
|
if (!common_mask)
|
|
|
|
return;
|
|
|
|
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_lock_irqsave(port->adapter->scsi_host->host_lock, flags);
|
|
|
|
__shost_for_each_device(sdev, port->adapter->scsi_host)
|
2010-09-08 20:40:01 +08:00
|
|
|
if (sdev_to_zfcp(sdev)->port == port)
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(common_mask,
|
2010-09-08 20:40:01 +08:00
|
|
|
&sdev_to_zfcp(sdev)->status);
|
[SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
CPU: 1 Not tainted 3.9.3+ #69
Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
<snip>
Call Trace:
([<00000000001165de>] show_trace+0x106/0x154)
[<00000000001166a0>] show_stack+0x74/0xf4
[<00000000006ff646>] dump_stack+0xc6/0xd4
[<000000000017f3a0>] __might_sleep+0x128/0x148
[<000000000015ece8>] flush_work+0x54/0x1f8
[<00000000001630de>] __cancel_work_timer+0xc6/0x128
[<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
[<0000000000161816>] execute_in_process_context+0x96/0xa8
[<00000000004d33d8>] device_release+0x60/0xc0
[<000000000048af48>] kobject_release+0xa8/0x1c4
[<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
[<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
[<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
[<000000000016b75a>] kthread+0xf2/0xfc
[<000000000070c9de>] kernel_thread_starter+0x6/0xc
[<000000000070c9d8>] kernel_thread_starter+0x0/0xc
Apparently, the ref_count for some scsi_device drops down to zero,
triggering device removal through execute_in_process_context(), while
the lldd error recovery thread iterates through a scsi device list.
Unfortunately, execute_in_process_context() decides to immediately
execute that device removal function, instead of scheduling asynchronous
execution, since it detects process context and thinks it is safe to do
so. But almost all calls to shost_for_each_device() in our lldd are
inside spin_lock_irq, even in thread context. Obviously, schedule()
inside spin_lock_irq sections is a bad idea.
Change the lldd to use the proper iterator function,
__shost_for_each_device(), in combination with required locking.
Occurrences that need to be changed include all calls in zfcp_erp.c,
since those might be executed in zfcp error recovery thread context
with a lock held.
Other occurrences of shost_for_each_device() in zfcp_fsf.c do not
need to be changed (no process context, no surrounding locking).
The problem was introduced in Linux 2.6.37 by commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org #2.6.37+
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-22 23:45:37 +08:00
|
|
|
spin_unlock_irqrestore(port->adapter->scsi_host->host_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_clear_port_status - clear port status bits
|
|
|
|
* @port: port to change the status
|
2008-07-02 16:56:40 +08:00
|
|
|
* @mask: status bits to change
|
2010-09-08 20:40:01 +08:00
|
|
|
*
|
|
|
|
* Changes in common status bits are propagated to attached LUNs.
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_clear_port_status(struct zfcp_port *port, u32 mask)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2010-09-08 20:40:01 +08:00
|
|
|
struct scsi_device *sdev;
|
|
|
|
u32 common_mask = mask & ZFCP_COMMON_FLAGS;
|
|
|
|
u32 clear_counter = mask & ZFCP_STATUS_COMMON_ERP_FAILED;
|
2013-08-22 23:45:37 +08:00
|
|
|
unsigned long flags;
|
2010-09-08 20:40:01 +08:00
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(mask, &port->status);
|
2010-09-08 20:40:01 +08:00
|
|
|
|
|
|
|
if (!common_mask)
|
|
|
|
return;
|
2010-09-08 20:39:55 +08:00
|
|
|
|
2010-09-08 20:40:01 +08:00
|
|
|
if (clear_counter)
|
|
|
|
atomic_set(&port->erp_counter, 0);
|
|
|
|
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_lock_irqsave(port->adapter->scsi_host->host_lock, flags);
|
|
|
|
__shost_for_each_device(sdev, port->adapter->scsi_host)
|
2010-09-08 20:40:01 +08:00
|
|
|
if (sdev_to_zfcp(sdev)->port == port) {
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(common_mask,
|
2010-09-08 20:40:01 +08:00
|
|
|
&sdev_to_zfcp(sdev)->status);
|
|
|
|
if (clear_counter)
|
|
|
|
atomic_set(&sdev_to_zfcp(sdev)->erp_counter, 0);
|
2008-07-02 16:56:40 +08:00
|
|
|
}
|
2013-08-22 23:45:37 +08:00
|
|
|
spin_unlock_irqrestore(port->adapter->scsi_host->host_lock, flags);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
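The mask handling in zfcp_erp_clear_port_status() can be illustrated with a minimal userspace sketch. It is not the driver code: the `demo_*` names and the two flag values are hypothetical, a plain array stands in for the `__shost_for_each_device()` walk, and the `host_lock` that the driver holds around that walk is omitted. It only shows that the full mask is cleared on the port, that only the common bits are propagated to LUNs of that port, and that the ERP counters are reset when the ERP-failed bit is among the cleared bits.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen for illustration only. */
#define DEMO_COMMON_FLAGS      0xfff00000u
#define DEMO_COMMON_ERP_FAILED 0x08000000u

/* Stand-in for a LUN attached to a port (hypothetical layout). */
struct demo_lun {
	atomic_uint status;
	atomic_int erp_counter;
	int port_id;
};

/* Stand-in for a port (hypothetical layout). */
struct demo_port {
	atomic_uint status;
	atomic_int erp_counter;
	int id;
};

/*
 * Clear 'mask' on the port, then clear only the common bits on every
 * LUN that belongs to this port. The real driver does the LUN walk
 * under the SCSI host lock; this sketch has no locking at all.
 */
static void demo_clear_port_status(struct demo_port *port,
				   struct demo_lun *luns, size_t n,
				   uint32_t mask)
{
	uint32_t common_mask = mask & DEMO_COMMON_FLAGS;
	uint32_t clear_counter = mask & DEMO_COMMON_ERP_FAILED;
	size_t i;

	atomic_fetch_and(&port->status, ~mask);

	/* Nothing to propagate if no common bit was requested. */
	if (!common_mask)
		return;

	if (clear_counter)
		atomic_store(&port->erp_counter, 0);

	for (i = 0; i < n; i++) {
		if (luns[i].port_id != port->id)
			continue;
		atomic_fetch_and(&luns[i].status, ~common_mask);
		if (clear_counter)
			atomic_store(&luns[i].erp_counter, 0);
	}
}
```

A mask with no common bits returns before the LUN walk, so neither the LUNs nor any ERP counter are touched in that case.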
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_set_lun_status - set lun status bits
|
|
|
|
* @sdev: SCSI device / lun to set the status bits
|
|
|
|
* @mask: status bits to change
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_set_lun_status(struct scsi_device *sdev, u32 mask)
|
2005-06-13 19:23:57 +08:00
|
|
|
{
|
2010-09-08 20:40:01 +08:00
|
|
|
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
|
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_or(mask, &zfcp_sdev->status);
|
2005-06-13 19:23:57 +08:00
|
|
|
}
|
|
|
|
|
2008-07-02 16:56:40 +08:00
|
|
|
/**
|
2010-09-08 20:40:01 +08:00
|
|
|
* zfcp_erp_clear_lun_status - clear lun status bits
|
|
|
|
* @sdev: SCSI device / lun to clear the status bits
|
|
|
|
* @mask: status bits to change
|
2008-07-02 16:56:40 +08:00
|
|
|
*/
|
2010-09-08 20:40:01 +08:00
|
|
|
void zfcp_erp_clear_lun_status(struct scsi_device *sdev, u32 mask)
|
2005-06-13 19:23:57 +08:00
|
|
|
{
|
2010-09-08 20:40:01 +08:00
|
|
|
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
|
|
|
|
|
2015-04-24 07:12:32 +08:00
|
|
|
atomic_andnot(mask, &zfcp_sdev->status);
|
2010-09-08 20:40:01 +08:00
|
|
|
|
|
|
|
if (mask & ZFCP_STATUS_COMMON_ERP_FAILED)
|
|
|
|
atomic_set(&zfcp_sdev->erp_counter, 0);
|
2005-06-13 19:23:57 +08:00
|
|
|
}
|
2010-09-08 20:40:01 +08:00
|
|
|
|