linux-sg2042

Steffen Maier 6f2ce1c6af scsi: zfcp: fix rport unblock race with LUN recovery

It is unavoidable that zfcp_scsi_queuecommand() has to finish requests with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time window when zfcp detected an unavailable rport but fc_remote_port_delete(), which is asynchronous via zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.

However, for the case when the rport becomes available again, we should prevent unblocking the rport too early. In contrast to other FCP LLDDs, zfcp has to open each LUN with the FCP channel hardware before it can send I/O to a LUN. So if a port already has LUNs attached and we unblock the rport just after port recovery, recoveries of LUNs behind this port can still be pending which in turn force zfcp_scsi_queuecommand() to unnecessarily finish requests with DID_IMM_RETRY.

This also opens a time window with unblocked rport (until the followup LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during this time window fc_timed_out() cannot work as desired and such command would indeed time out and trigger scsi_eh. This prevents a clean and timely path failover. This should not happen if the path issue can be recovered on FC transport layer such as path issues involving RSCNs.

Fix this by only calling zfcp_scsi_schedule_rport_register(), to asynchronously trigger fc_remote_port_add(), after all LUN recoveries as children of the rport have finished and no new recoveries of equal or higher order were triggered meanwhile. Finished intentionally includes any recovery result no matter if successful or failed (still unblock rport so other successful LUNs work). For simplicity, we check after each finished LUN recovery if there is another LUN recovery pending on the same port and then do nothing. We handle the special case of a successful recovery of a port without LUN children the same way without changing this case's semantics.
For debugging we introduce 2 new trace records written if the rport unblock attempt was aborted due to still unfinished or freshly triggered recovery. The records are only written above the default trace level.

Benjamin noticed the important special case of new recovery that can be triggered between having given up the erp_lock and before calling zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the following sequence:

ERP thread                 rport_work      other context
-------------------------  --------------  --------------------------------
port is unblocked, rport still blocked,
 due to pending/running ERP action,
 so ((port->status & ...UNBLOCK) != 0)
 and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
                                           zfcp_erp_port_reopen()
                                           lock ERP
                                           zfcp_erp_port_block()
                                           port->status clear ...UNBLOCK
                                           unlock ERP
                                           zfcp_scsi_schedule_rport_block()
                                           port->rport_task = RPORT_DEL
                                           queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task != RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_block()
                           if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task == RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_register()
                           (port->rport == NULL)
                           rport = fc_remote_port_add()
                           port->rport = rport;

Now the rport was erroneously unblocked while the zfcp_port is blocked. This is another situation we want to avoid due to scsi_eh potential. This state would at least remain until the new recovery from the other context finished successfully, or potentially forever if it failed. In order to close this race, we take the erp_lock inside zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or LUN.
With that, the possible corresponding rport state sequences would be: (unblock[ERP thread],block[other context]) if the ERP thread gets erp_lock first and still sees ((port->status & ...UNBLOCK) != 0), (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock after the other context has already cleared ...UNBLOCK from port->status.

Since checking fields of struct erp_action is unsafe because they could have been overwritten (re-used for new recovery) meanwhile, we only check status of zfcp_port and LUN since these are only changed under erp_lock elsewhere.

Regarding the check of the proper status flags (port or port_forced are similar to the shown adapter recovery):

[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
 zfcp_erp_adapter_block()
  * clear UNBLOCK ---------------------------------------+
 zfcp_scsi_schedule_rports_block()                       |
 write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
 zfcp_erp_action_enqueue()                            |  |
  zfcp_erp_setup_act()                                |  |
   * set ERP_INUSE -----------------------------------|--|--+
 write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
.context-switch.                                         |  |
zfcp_erp_thread()                                        |  |
 zfcp_erp_strategy()                                     |  |
  write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
  ...                                                 |  |  |
  zfcp_erp_strategy_check_target()                    |  |  |
   zfcp_erp_strategy_check_adapter()                  |  |  |
    zfcp_erp_adapter_unblock()                        |  |  |
     * set UNBLOCK -----------------------------------|--+  |
  zfcp_erp_action_dequeue()                           |     |
   * clear ERP_INUSE ---------------------------------|-----+
  ...                                                 |
  write_unlock_irqrestore(&adapter->erp_lock, flags);-+

Hence, we should check for both UNBLOCK and ERP_INUSE because they are interleaved. Also we need to explicitly check ERP_FAILED for the link down case which currently does not clear the UNBLOCK flag in zfcp_fsf_link_down_info_eval().
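
The status check described above can be illustrated with a small userspace model. This is only a sketch, not the driver code: the flag values, `struct model_port`, the `lun_recovery_pending` field, and `model_try_rport_unblock()` are hypothetical names introduced for illustration, and a pthread mutex stands in for `erp_lock`. It shows the decision only: the status is read while holding the lock, and the rport may be unblocked only if UNBLOCK is set, neither ERP_INUSE nor ERP_FAILED is set, and no LUN recovery on the port is still pending.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; not the zfcp definitions. */
#define MODEL_UNBLOCKED  0x1u
#define MODEL_ERP_INUSE  0x2u
#define MODEL_ERP_FAILED 0x4u

/* Hypothetical stand-in for the port state that erp_lock protects. */
struct model_port {
	pthread_mutex_t erp_lock;   /* plays the role of adapter->erp_lock */
	unsigned int status;        /* combination of the MODEL_* flags */
	bool lun_recovery_pending;  /* simplification: any child LUN still recovering */
};

/*
 * Return true if the rport may be unblocked now.
 * The status is sampled under the lock, so a concurrent context that
 * clears MODEL_UNBLOCKED under the same lock is either seen completely
 * or runs entirely after this check.
 */
bool model_try_rport_unblock(struct model_port *port)
{
	bool may_unblock;

	assert(port != NULL);

	pthread_mutex_lock(&port->erp_lock);
	may_unblock =
		(port->status & MODEL_UNBLOCKED) != 0 &&
		(port->status & (MODEL_ERP_INUSE | MODEL_ERP_FAILED)) == 0 &&
		!port->lun_recovery_pending;
	pthread_mutex_unlock(&port->erp_lock);

	return may_unblock;
}
```

In this model a caller would schedule the rport registration only when the function returns true and otherwise do nothing, leaving the unblock to whichever recovery finishes last.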
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: `8830271c48` ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: `a2fa0aede0` ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: `5f852be9e1` ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: `338151e066` ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: `3859f6a248` ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Cc: <stable@vger.kernel.org> #2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-12-14 15:17:20 -05:00
..
accessibility	…
acpi	ACPI material for v4.10-rc1	2016-12-13 11:06:21 -08:00
amba	…
android	…
ata	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata	2016-12-13 15:30:50 -08:00
atm	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-12-06 21:33:19 -05:00
auxdisplay	auxdisplay: ht16k33: select framebuffer helper modules	2016-11-30 13:04:31 +01:00
base	Driver core patches for 4.10-rc1	2016-12-13 11:42:18 -08:00
bcma	bcma: add Dell Inspiron 3148	2016-11-29 17:35:14 +02:00
block	SCSI misc on 20161213	2016-12-14 10:49:33 -08:00
bluetooth	Bluetooth: btmrvl: drop duplicate header slab.h	2016-12-08 07:44:56 +01:00
bus	…
cdrom	…
char	xen: features and fixes for 4.10 rc0	2016-12-13 16:07:55 -08:00
clk	clk: bcm: Fix 'maybe-uninitialized' warning in bcm2835_clock_choose_div_and_prate()	2016-12-12 11:25:40 -08:00
clocksource	…
connector	…
cpufreq	Power management material for v4.10-rc1	2016-12-13 10:41:53 -08:00
cpuidle	cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state()	2016-12-06 02:25:03 +01:00
crypto	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-12-12 14:27:49 -08:00
dax	device-dax: fix private mapping restriction, permit read-only	2016-12-06 17:42:37 -08:00
dca	…
devfreq	devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks	2016-12-08 01:46:07 +01:00
dio	…
dma	remoteproc updates for v4.10	2016-12-13 08:49:12 -08:00
dma-buf	…
edac	EDAC, amd64: Fix improper return value	2016-12-04 10:51:42 +01:00
eisa	…
extcon	…
firewire	…
firmware	arm64 updates for 4.10:	2016-12-13 16:39:21 -08:00
fmc	…
fpga	fpga: Clarify how write_init works streaming modes	2016-11-29 15:51:49 -06:00
gpio	Bulk GPIO changes for the v4.10 kernel cycle:	2016-12-13 07:54:57 -08:00
gpu	Main pull request for drm for 4.10 kernel	2016-12-13 09:35:09 -08:00
hid	HID: hid-sensor-hub: clear memory to avoid random data	2016-11-23 17:54:58 +01:00
hsi	…
hv	uio-hv-generic: new userspace i/o driver for VMBus	2016-12-06 11:52:49 +01:00
hwmon	hwmon: (g762) Fix overflows and crash seen when writing limit attributes	2016-12-12 11:33:44 -08:00
hwspinlock	…
hwtracing	coresight: perf: Add a missing call to etm_free_aux	2016-11-29 20:05:32 +01:00
i2c	Revert "i2c: octeon: thunderx: Limit register access retries"	2016-11-29 20:04:21 +01:00
ide	…
idle	Power management material for v4.10-rc1	2016-12-13 10:41:53 -08:00
iio	iio: magnetometer: separate the values of attributes based on their usage type for HID compass sensor	2016-11-24 20:41:30 +00:00
infiniband	…
input	xen: features and fixes for 4.10 rc0	2016-12-13 16:07:55 -08:00
iommu	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-12-12 19:25:04 -08:00
ipack	…
irqchip	arm64 updates for 4.10:	2016-12-13 16:39:21 -08:00
isdn	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-12-10 16:21:55 -05:00
leds	leds: pca955x: Add ACPI support	2016-12-02 09:31:50 +01:00
lguest	…
lightnvm	Char/Misc driver patches for 4.10-rc1	2016-12-13 12:11:01 -08:00
macintosh	…
mailbox	…
mcb	…
md	. various fixes and improvements to request-based DM and DM multipath	2016-12-14 11:01:00 -08:00
media	USB/PHY patches for 4.10-rc1	2016-12-13 11:10:36 -08:00
memory	…
memstick	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block	2016-12-13 10:19:16 -08:00
message	SCSI misc on 20161213	2016-12-14 10:49:33 -08:00
mfd	Staging/IIO patches for 4.10-rc1	2016-12-13 11:35:00 -08:00
misc	Char/Misc driver patches for 4.10-rc1	2016-12-13 12:11:01 -08:00
mmc	MMC core:	2016-12-14 10:55:56 -08:00
mtd	…
net	scsi: cxgb4i: libcxgbi: cxgb4: add T6 iSCSI completion feature	2016-12-14 15:09:13 -05:00
nfc	…
ntb	…
nubus	…
nvdimm	These are the documentation changes for 4.10.	2016-12-12 21:58:13 -08:00
nvme	Just one simple change from Andrzej to drop the pointless return value	2016-12-14 10:31:25 -08:00
nvmem	…
of	Char/Misc driver patches for 4.10-rc1	2016-12-13 12:11:01 -08:00
oprofile	oprofile/nmi timer: Convert to hotplug state machine	2016-12-02 00:52:34 +01:00
parisc	…
parport	…
pci	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux	2016-12-13 16:33:33 -08:00
pcmcia	drivers/pcmcia/m32r_pcc.c: check return from add_pcc_socket	2016-12-12 18:55:06 -08:00
perf	…
phy	SCSI misc on 20161213	2016-12-14 10:49:33 -08:00
pinctrl	Bulk pin control changes for the v4.10 kernel cycle:	2016-12-13 07:59:10 -08:00
platform	Char/Misc driver patches for 4.10-rc1	2016-12-13 12:11:01 -08:00
pnp	…
power	…
powercap	powercap / RAPL: Add Knights Mill CPUID	2016-11-30 23:41:33 +01:00
pps	…
ps3	…
ptp	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-12-12 19:56:15 -08:00
pwm	pwm: Fix device reference leak	2016-11-29 16:43:24 +01:00
rapidio	…
ras	…
regulator	Merge remote-tracking branches 'regulator/topic/tps65086' and 'regulator/topic/twl' into regulator-next	2016-12-12 12:17:31 +00:00
remoteproc	remoteproc: qcom_adsp_pil: select qcom_scm	2016-12-09 16:16:56 -08:00
reset	…
rpmsg	rpmsg updates for v4.10	2016-12-13 08:52:45 -08:00
rtc	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-12-12 19:56:15 -08:00
s390	scsi: zfcp: fix rport unblock race with LUN recovery	2016-12-14 15:17:20 -05:00
sbus	…
scsi	scsi: libcxgbi: return error if interface is not up	2016-12-14 15:11:53 -05:00
sfi	…
sh	lib: radix-tree: check accounting of existing slot replacement users	2016-12-12 18:55:08 -08:00
sn	…
soc	This is a fairly quiet release. We don't have any patches to the core	2016-12-13 08:54:27 -08:00
spi	Merge remote-tracking branches 'spi/topic/spidev', 'spi/topic/sunxi', 'spi/topic/ti-qspi', 'spi/topic/topcliff-pch' and 'spi/topic/xlp' into spi-next	2016-12-12 15:54:20 +00:00
spmi	…
ssb	…
staging	Staging/IIO patches for 4.10-rc1	2016-12-13 11:35:00 -08:00
target	SCSI misc on 20161213	2016-12-14 10:49:33 -08:00
tc	…
thermal	Power management material for v4.10-rc1	2016-12-13 10:41:53 -08:00
thunderbolt	Char/Misc driver patches for 4.10-rc1	2016-12-13 12:11:01 -08:00
tty	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2016-12-13 12:59:57 -08:00
uio	uio-hv-generic: store physical addresses instead of virtual	2016-12-10 14:57:58 +01:00
usb	Just one simple change from Andrzej to drop the pointless return value	2016-12-14 10:31:25 -08:00
uwb	…
vfio	vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages	2016-12-06 12:35:53 -07:00
vhost	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-12-12 10:48:02 -08:00
video	xen: features and fixes for 4.10 rc0	2016-12-13 16:07:55 -08:00
virt	…
virtio	…
vlynq	…
vme	…
w1	…
watchdog	Char/Misc driver patches for 4.10-rc1	2016-12-13 12:11:01 -08:00
xen	xen: features and fixes for 4.10 rc0	2016-12-13 16:07:55 -08:00
zorro	…
Kconfig	…
Makefile	…