Commit Graph

60 Commits

Author SHA1 Message Date
Karan Tilak Kumar 3256b46823 scsi: fnic: Validate io_req before others
We need to check for a valid io_req before we check other data. Also,
remove redundant checks.

Link: https://lore.kernel.org/r/20201121023337.19295-1-kartilak@cisco.com
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-23 22:38:40 -05:00
Karan Tilak Kumar 74ae6d6a68 scsi: fnic: Set scsi_set_resid() only for underflow
Set scsi_set_resid() only if FCPIO_ICMND_CMPL_RESID_UNDER is set.
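
A minimal user-space sketch of the check described above. The flag value and the sketch_* names are assumptions made for illustration; the driver's own headers define the real flag and scsi_set_resid() is the real midlayer helper.

```c
#include <stdint.h>

/* Assumed flag value for illustration; the driver defines the real one. */
#define FCPIO_ICMND_CMPL_RESID_UNDER 0x08

/* Hypothetical stand-in for the residual field of a SCSI command. */
struct sketch_cmd {
    uint32_t resid;
};

/* Record the residual only when the completion reports an underflow. */
static void sketch_set_resid(struct sketch_cmd *cmd, uint8_t flags,
                             uint32_t residual)
{
    if (flags & FCPIO_ICMND_CMPL_RESID_UNDER)
        cmd->resid = residual;
}
```

Without the flag the residual stays untouched, so a stale value in the completion descriptor is not propagated to the midlayer.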

Link: https://lore.kernel.org/r/20201121015134.18872-1-kartilak@cisco.com
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-23 22:37:50 -05:00
Karan Tilak Kumar f9e2beb990 scsi: fnic: Avoid looping in TRANS ETH on unload
Avoid looping in fnic_scsi_abort_io() before sending fw reset when fnic is
in TRANS ETH state and when we have not received any link events.

Link: https://lore.kernel.org/r/20201121012145.18522-1-kartilak@cisco.com
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-23 22:32:50 -05:00
Hannes Reinecke 712582e60f scsi: fnic: Do not call 'scsi_done()' for unhandled commands
The fnic driver assigns an ioreq structure to each command and severs this
assignment once scsi_done() has been called and the command has been
completed.

When traversing commands to terminate outstanding I/O we should not call
scsi_done() on commands which do not have a corresponding ioreq structure;
these commands have either never entered the driver or have already been
completed.
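
The rule can be sketched in user space as follows. All sketch_* types and names are hypothetical; incrementing done_calls merely stands in for calling scsi_done().

```c
#include <stddef.h>

struct sketch_io_req { int tag; };

struct sketch_cmd {
    /* NULL: the command never entered the driver or was already completed. */
    struct sketch_io_req *io_req;
    int done_calls;
};

/* Terminate outstanding commands; return how many were completed. */
static int sketch_cleanup_io(struct sketch_cmd *cmds, size_t n)
{
    int completed = 0;

    for (size_t i = 0; i < n; i++) {
        if (cmds[i].io_req == NULL)
            continue;              /* nothing outstanding, do not complete */
        cmds[i].io_req = NULL;     /* sever the association */
        cmds[i].done_calls++;      /* stands in for scsi_done() */
        completed++;
    }
    return completed;
}
```

Running the cleanup twice over the same array completes each command at most once, because the second pass finds no ioreq attached.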

[mkp: fixed unused label warning]

Link: https://lore.kernel.org/r/20200515112647.49260-1-hare@suse.de
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Satish Kharat <satishkh@cisco.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-02 21:09:26 -04:00
Miaohe Lin 51d263cbdd scsi: fnic: Use eth_broadcast_addr() to assign broadcast address
Use eth_broadcast_addr() to assign broadcast address instead of memset().
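
For readers outside the kernel tree, a user-space stand-in for the helper might look like this. The sketch_* names are hypothetical; the kernel helper itself fills the six-octet address with 0xff.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6 /* length of an Ethernet MAC address in octets */

/* User-space stand-in for the kernel helper: set every octet to 0xff. */
static void sketch_eth_broadcast_addr(uint8_t *addr)
{
    memset(addr, 0xff, SKETCH_ETH_ALEN);
}
```

The named helper states the intent at the call site, where an open-coded memset() leaves the reader to infer it from the fill value.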

Link: https://lore.kernel.org/r/1595233498-13628-1-git-send-email-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-24 22:09:56 -04:00
Jason Yan b91857a5ca scsi: fnic: Use true, false for fnic->internal_reset_inprogress
Fix the following coccicheck warning:

drivers/scsi/fnic/fnic_scsi.c:2627:5-36: WARNING: Comparison of 0/1 to
bool variable

Link: https://lore.kernel.org/r/20200430121718.14970-1-yanaijie@huawei.com
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-07 22:03:07 -04:00
Hannes Reinecke 0e2209629f scsi: fnic: do not queue commands during fwreset
When a link goes down, the driver calls fnic_cleanup_io(), which
traverses all commands and calls 'done' for each command it finds.
While the traversal is handled under the host_lock, 'done' is called
after the host_lock has been dropped.

As fnic_queuecommand_lck() is being called with the host_lock held, it
might well be that it will pick the command being selected for abortion
from the above routine and enqueue it for sending, but then 'done' is being
called on that very command from the above routine.

Which of course confuses the hell out of the scsi midlayer.

So fix this by not queueing commands when fnic_cleanup_io is active.
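
A minimal sketch of the gate, assuming a boolean that is set while cleanup runs. The struct, the field names and the busy return code are all hypothetical and only illustrate the idea.

```c
#include <stdbool.h>

#define SKETCH_HOST_BUSY 1 /* hypothetical "retry later" return code */

struct sketch_fnic {
    bool cleanup_active; /* set while the cleanup traversal is running */
    int queued;          /* number of commands accepted for sending */
};

/* Refuse new commands while cleanup is active; accept them otherwise. */
static int sketch_queuecommand(struct sketch_fnic *fnic)
{
    if (fnic->cleanup_active)
        return SKETCH_HOST_BUSY;
    fnic->queued++;
    return 0;
}
```

Returning a busy status leaves the command with the midlayer, which can resubmit it once the cleanup has finished.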

Link: https://lore.kernel.org/r/20200116102053.62755-1-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20 23:58:14 -05:00
Pan Bian ec990306f7 scsi: fnic: fix use after free
The memory chunk io_req is released by mempool_free. Accessing
io_req->start_time will result in a use after free bug. The variable
start_time is a backup of the timestamp. So, use start_time here to
avoid use after free.
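
The pattern, sketched in user space with malloc()/free() in place of the mempool. The sketch_* names are hypothetical.

```c
#include <stdlib.h>

struct sketch_io_req {
    unsigned long start_time;
};

/*
 * Release the request and return how long it was outstanding.
 * Only the local copy of the timestamp is read after free().
 */
static unsigned long sketch_complete(struct sketch_io_req *io_req,
                                     unsigned long now)
{
    unsigned long start_time = io_req->start_time; /* back up first */

    free(io_req);
    return now - start_time; /* io_req must not be dereferenced here */
}
```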

Link: https://lore.kernel.org/r/1572881182-37664-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06 00:04:02 -05:00
Satish Kharat e8bfe3e7ff scsi: fnic: Warn when calling done for IO not issued to fw
Print a warning when scsi done is called for an IO that has not yet been
issued to the fw. Also add sc and tag to the debug print when the IO is
cleaned up.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-01-22 21:18:34 -05:00
Satish Kharat 3567dca1ba scsi: fnic: fnic stats for max CQs processed and ISR time
This change is to add fnic stats for the max number of CQs (corresponding
to copy WQ) processed in a given interrupt, max time taken by the ISR.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-01-22 21:18:34 -05:00
Satish Kharat 68f03bd1ee scsi: fnic: use fnic_lock to guard fnic->state_flags
Need to use fnic_lock as well as host lock in that order to set state
flags.

[mkp: typos]
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-01-22 21:18:34 -05:00
Christoph Hellwig 511c49fe18 fnic: fix fnic_scsi_host_{start,end}_tag
The way these functions abuse ->special to try to store the dummy
request looks completely broken, given that it actually stores the
original scsi command.

Instead switch to ->host_scribble and store the actual dummy command.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-10 08:03:44 -07:00
Jens Axboe 4d5b4ac1ea scsi: fnic: replace gross legacy tag hack with blk-mq hack
Would be nice to fix up the SCSI midlayer instead, but this will do for
now.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Satish Kharat <satishkh@cisco.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-17 22:06:04 -04:00
Christoph Hellwig 7f9b0f774f scsi: fnic: switch to generic DMA API
Switch from the legacy PCI DMA API to the generic DMA API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-17 21:58:52 -04:00
Nicolas Iooss f280c77dc9 scsi: fnic: add a space after %p in printf format
fnic_fcpio_icmnd_cmpl_handler() displays the value of sc with:

    FNIC_SCSI_DBG(KERN_INFO...
        "... sc = 0x%p"
        "scsi_status ..."
        ...

As the literal strings get merged, the function uses %ps instead of the
intended raw %p format. Fix this by inserting a space.
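
The literal merging is plain C behaviour and can be checked in user space. The function names below are hypothetical; the strings are the fragments quoted above.

```c
/* Adjacent string literals are concatenated before printf ever sees them. */
static const char *sketch_merged_format(void)
{
    return "sc = 0x%p" "scsi_status"; /* becomes "...%pscsi_status" */
}

/* With the trailing space, %p is followed by ' ' rather than 's'. */
static const char *sketch_fixed_format(void)
{
    return "sc = 0x%p " "scsi_status";
}
```

In the first string the conversion is immediately followed by the letter s, which the kernel's printk treats as the %ps symbol-name extension.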

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-11 21:43:00 -05:00
Hannes Reinecke 7c3a50bb9b scsi: fnic: do not call host reset from command abort
Command abort already returns FAILED, which will then be escalated to a
host reset. So no need to call host_reset directly.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-08-25 17:21:10 -04:00
Satish Kharat 4a1108d6ca scsi: fnic: changing queue command to return result DID_IMM_RETRY when rport is init
Currently the queue command returns DID_NO_CONNECT anytime the rport is
not in RPORT_ST_READY state. Changing it to return DID_NO_CONNECT only
when the rport is in RPORT_ST_DELETE state. When the rport is in one of
the init states, return DID_IMM_RETRY instead.
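
A sketch of the resulting decision. The enum is a hypothetical reduction of the rport states, and the numeric host codes are assumed values chosen to resemble the kernel's host byte codes.

```c
/* Hypothetical reduction of the remote port states to three cases. */
enum sketch_rport_state {
    SKETCH_ST_INIT,   /* any of the init states */
    SKETCH_ST_READY,
    SKETCH_ST_DELETE
};

/* Assumed host byte codes, for illustration only. */
#define SKETCH_DID_OK         0x00
#define SKETCH_DID_NO_CONNECT 0x01
#define SKETCH_DID_IMM_RETRY  0x0c

/* Pick the host code for a command queued against a port in state st. */
static int sketch_queue_result(enum sketch_rport_state st)
{
    switch (st) {
    case SKETCH_ST_READY:
        return SKETCH_DID_OK;
    case SKETCH_ST_DELETE:
        return SKETCH_DID_NO_CONNECT; /* port is going away */
    default:
        return SKETCH_DID_IMM_RETRY;  /* port still coming up: retry */
    }
}
```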

Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-27 21:40:59 -04:00
Satish Kharat 1cdf8bc18f scsi: fnic: Zero io_cmpl_skip on fw reset completion
io_cmpl_skip keeps track of the number of completions to skip when stats
are reset. If a fw_reset happens immediately after a stats reset, the count
can get out of sync, so reset io_cmpl_skip when the fw reset has completed.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-06-27 21:40:59 -04:00
Satish Kharat 445d296086 scsi: fnic: Adding debug IO and Abort latency counter to fnic stats
The IO and Abort latency counter counts the time taken to complete the
IO and abort command into broad buckets. This is not intended for
performance measurement, just a debug statistic.  current_max_io_time
tries to keep track of the maximum time an IO has taken to complete if
it is > 30sec.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat 39fcbbc01b scsi: fnic: Adding Check Condition counter to misc fnicstats
Just a simple counter of number of check conditions encountered on that
host.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat b9202b4ae8 scsi: fnic: Avoid false out-of-order detection for aborted command
If SCSI-ML has already issued an abort on a command, i.e.
FNIC_IOREQ_ABTS_PENDING is set, and we get an IO completion, avoid this
being flagged as an out-of-order completion by setting the FNIC_IO_DONE
flag in fnic_fcpio_icmnd_cmpl_handler.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat 7ef539c88d scsi: fnic: Fix for "Number of Active IOs" in fnicstats becoming negative
Fixing the IO stats update (Active IOs and IO completion) to prevent
"Number of Active IOs" from becoming negative in the fnicstats output.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat ccc6d70460 scsi: fnic: minor cleanup in fnic_fcpio_itmf_cmpl_handler, removing else case
Getting rid of the else case to make the flow look a bit simpler.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:25 -04:00
Satish Kharat 9698b6f473 scsi: fnic: Avoid sending reset to firmware when another reset is in progress
This fix is to avoid calling fnic_fw_reset_handler through
fnic_host_reset when an fnic reset is already in progress.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-05 20:41:31 -05:00
Satish Kharat 6008e96b81 scsi: fnic: Correcting rport check location in fnic_queuecommand_lck
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-16 20:41:49 -05:00
Hannes Reinecke 31c0a631a4 scsi: libfc: Replace ->lport_reset callback with function call
The ->lport_reset callback only ever had one implementation,
which already is exported. So remove it and use the function
directly.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-08 17:29:54 -05:00
Satish Kharat 1b6ac5e3ff fnic: Using rport->dd_data to check rport online instead of rport_lookup.
When issuing I/O we check if the rport is online through the libfc
rport_lookup() function, which needs to be protected by a mutex lock that
cannot be acquired in I/O context. The change is to use the midlayer remote
port's dd_data, which is preserved until its devloss timeout and requires
no protection.  The scsi_cmnd error code is expected to be
in the left 16 bits of the result field. Changed to correct this.  Fnic
driver version changed from 1.6.0.20 to 1.6.0.21

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-11 16:57:09 -04:00
Satish Kharat a36f5dd07d fnic: Cleanup the I/O pending with fw and has timed out and is used to issue LUN reset
In case of LUN reset, the device reset command is issued with one of the
I/Os that has timed out on that LUN. The change is to also return this
I/O with error status set to DID_RESET. In case when the reset is issued
using the sg_reset tool (from sg3_utils) it is a new command and new_sc
is set to 1.  Fnic driver version changed from 1.6.0.19 to 1.6.0.20

[mkp: Fixed checkpatch warning]

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-11 16:57:09 -04:00
Satish Kharat 691a837c20 fnic: Fix to cleanup aborted IO to avoid device being offlined by mid-layer
If an I/O times out and the host issues an abort that succeeds, we need
to set the scsi status to DID_ABORT. Otherwise the mid-layer error
handler, which looks for this error code, will offline the device. Also,
if the original I/O is not found in fnic firmware, we consider the abort
successful.  The start_time assignment is moved because of the new goto.
Fnic driver version changed from 1.6.0.17a to 1.6.0.19, version 1.6.0.18
has been skipped

[mkp: Fixed checkpatch warning]

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-11 16:57:09 -04:00
Maurizio Lombardi 14cee5b4de fnic: move printk()s outside of the critical code section.
This patch moves a printk() outside of the code section where interrupts
are disabled. In some cases a flood of error messages may cause a kernel
panic.  It also removes one of the printk()s because the same error
message was printed twice.

[709686.317197] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 12
[709686.317200] CPU: 12 PID: 1963 Comm: systemd-journal Tainted: GF          O--------------   3.10.0-229.el7.x86_64 #1
[709686.317201] Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.6.030620151309 03/06/2015
[709686.317206]  ffffffff8182b2e8 00000000392722ba ffff88046fcc5c48 ffffffff81603f36
[709686.317209]  ffff88046fcc5cc8 ffffffff815fd7da 0000000000000010 ffff88046fcc5cd8
[709686.317211]  ffff88046fcc5c78 00000000392722ba ffff88046fcc5c88 000000000000000c
[709686.317212] Call Trace:
[709686.317221]  <NMI>  [<ffffffff81603f36>] dump_stack+0x19/0x1b
[709686.317223]  [<ffffffff815fd7da>] panic+0xd8/0x1e7
[709686.317227]  [<ffffffff8110a760>] ? watchdog_enable_all_cpus.part.2+0x40/0x40
[709686.317229]  [<ffffffff8110a822>] watchdog_overflow_callback+0xc2/0xd0
[709686.317233]  [<ffffffff8114c901>] __perf_event_overflow+0xa1/0x250
[709686.317235]  [<ffffffff8114d404>] perf_event_overflow+0x14/0x20
[709686.317239]  [<ffffffff810301fd>] intel_pmu_handle_irq+0x1fd/0x410
[709686.317242]  [<ffffffff811908d1>] ? unmap_kernel_range_noflush+0x11/0x20
[709686.317246]  [<ffffffff81373574>] ? ghes_copy_tofrom_phys+0x124/0x210
[709686.317249]  [<ffffffff8160cfcb>] perf_event_nmi_handler+0x2b/0x50
[709686.317251]  [<ffffffff8160c719>] nmi_handle.isra.0+0x69/0xb0
[709686.317252]  [<ffffffff8160c830>] do_nmi+0xd0/0x340
[709686.317256]  [<ffffffff8160bb71>] end_repeat_nmi+0x1e/0x2e
[709686.317260]  [<ffffffff812e24fd>] ? memcpy+0xd/0x110
[709686.317263]  [<ffffffff812e24fd>] ? memcpy+0xd/0x110
[709686.317265]  [<ffffffff812e24fd>] ? memcpy+0xd/0x110
[709686.317269]  <<EOE>>  [<ffffffff8132c297>] ? vgacon_scroll+0x2d7/0x330
[709686.317273]  [<ffffffff813a086c>] scrup+0xfc/0x110
[709686.317275]  [<ffffffff813a0920>] lf+0xa0/0xb0
[709686.317278]  [<ffffffff813a1b32>] vt_console_print+0x2d2/0x420
[709686.317283]  [<ffffffff8106f4a1>] call_console_drivers.constprop.15+0x91/0xf0
[709686.317287]  [<ffffffff8107069f>] console_unlock+0x3bf/0x400
[709686.317291]  [<ffffffff81070996>] vprintk_emit+0x2b6/0x530
[709686.317294]  [<ffffffff815fd961>] printk_emit+0x44/0x5b
[709686.317297]  [<ffffffff81070d98>] devkmsg_writev+0x158/0x1d0
[709686.317303]  [<ffffffff811c5ef9>] do_sync_readv_writev+0x79/0xd0
[709686.317307]  [<ffffffff811c73ee>] do_readv_writev+0xce/0x260
[709686.317310]  [<ffffffff811c8d18>] ? __sb_start_write+0x58/0x110
[709686.317314]  [<ffffffff811c7615>] vfs_writev+0x35/0x60
[709686.317318]  [<ffffffff811c776c>] SyS_writev+0x5c/0xd0
[709686.317322]  [<ffffffff81613da9>] system_call_fastpath+0x16/0x1b

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-03-18 15:28:17 -04:00
Maurizio Lombardi fd6ddfa4c1 fnic: check pci_map_single() return value
The kernel prints some warnings when compiled with CONFIG_DMA_API_DEBUG.
This is because the fnic driver doesn't check the return value of
pci_map_single().

[   11.942770] scsi host12: fnic
[   11.950811] ------------[ cut here ]------------
[   11.950818] WARNING: at lib/dma-debug.c:937 check_unmap+0x47b/0x920()
[   11.950821] fnic 0000:0c:00.0: DMA-API: device driver failed to check map error[device address=0x0000002020a30040] [size=44 bytes] [mapped as single]

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed By: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-27 11:05:45 +09:00
Hiral Shah db196935d9 fnic: Use the local variable instead of I/O flag to acquire io_req_lock in fnic_queuecommand() to avoid deadlock
We added changes in fnic driver patch 1.6.0.16 to acquire
io_req_lock in fnic_queuecommand() before issuing I/O so that io completion
is serialized. But when releasing the lock we check for the I/O flag and
this could be modified if an IO abort occurs before I/O completion. In this
case we won't release the lock, which causes a deadlock in some scenarios.
Using the local variable to check the IO lock status will resolve the problem.
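
A single-threaded sketch of why the local variable helps. The toy lock only counts acquisitions, the abort_races parameter simulates the abort path rewriting the flags, and every sketch_* name is hypothetical.

```c
#include <stdbool.h>

/* Toy lock that only tracks how many holders it has. */
struct sketch_lock { int depth; };

static void sketch_lock_acquire(struct sketch_lock *l) { l->depth++; }
static void sketch_lock_release(struct sketch_lock *l) { l->depth--; }

#define SKETCH_IO_LOCKED 0x1u /* hypothetical shared flag another path may clear */

struct sketch_cmd { unsigned int flags; };

static void sketch_queue(struct sketch_lock *l, struct sketch_cmd *sc,
                         bool abort_races)
{
    bool io_lock_acquired = false; /* local: no other context can change it */

    sketch_lock_acquire(l);
    io_lock_acquired = true;
    sc->flags |= SKETCH_IO_LOCKED;

    if (abort_races)
        sc->flags = 0; /* simulates the abort path rewriting the flags */

    /* Testing sc->flags here would skip the release after an abort. */
    if (io_lock_acquired)
        sketch_lock_release(l);
}
```

The lock is released whether or not the shared flag survived, which is the property the fix relies on.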

Fixes: 41df7b02db
Signed-off-by: Hiral Shah <hishah@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Anil Chintalapati <achintal@cisco.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-08-18 08:11:23 -07:00
Anil Chintalapati (achintal) efc7a28838 fnic: IOMMU Fault occurs when IO and abort IO is out of order
When I/O is aborted by mid-layer, fnic FW will complete the I/O before
completing the abort task. In some cases abort request is completed before
the I/O, which could lead to inconsistent driver and firmware states.
In this case firmware reset would clear the inconsistent state.

Signed-off-by: Anil Chintalapati <achintal@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Hiral Shah <hishah@cisco.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-12-30 13:31:45 +01:00
Hiral Shah 41df7b02db Fnic: Fnic Driver crashed with NULL pointer reference
When issuing I/O request, if the I/O completes before returning from
fnic_queuecommand(), we may be referencing scsi_cmnd structure that may
be freed by the interrupt handler. Acquiring the IO lock would synchronize
fnic_queuecommand and interrupt handler.

- Increment fnic version from 1.6.0.15 to 1.6.0.16

Signed-off-by: Hiral Shah <hishah@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Anil Chintalapati <achintal@cisco.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-20 09:11:00 +01:00
Hiral Shah 35061e21a1 Fnic: Improper reuse of exchange Ids
IOs belonging to an rport are aborted with Internal terminate option
when rport goes offline. Any new IO issued to the rport during this
time can reuse the terminated exchange which will cause inconsistent
state of the exchange between local port and remote port.

fc_rport_priv is set to RPORT_ST_DELETE before exchanges are aborted by
libfc. Not issuing any more I/O requests when RPORT_ST_DELETE is set
will avoid inconsistent state of the exchange between local port and
remote port.

- Increment fnic version from 1.6.0.13 to 1.6.0.14

Signed-off-by: Hiral Shah <hishah@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Anil Chintalapati <achintal@cisco.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-20 09:10:39 +01:00
Christoph Hellwig 5066863337 scsi: remove abuses of scsi_populate_tag
Unless we want to build a SPI tag message we should just check SCMD_TAGGED
instead of reverse engineering a tag type through the use of
scsi_populate_tag_msg.

Also rename the function to spi_populate_tag_msg, make it behave like the
other spi message helpers, and move it to the spi transport class.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:19:41 +01:00
Christoph Hellwig 4d63716898 fnic: reject device resets without assigned tags for the blk-mq case
Currently the midlayer fakes up a struct request for the explicit reset
ioctls, and those don't have a tag allocated to them.  The fnic driver pokes
into midlayer structures to paper over this design issue, but that won't
work for the blk-mq case.

Either someone who can actually test the hardware will have to come up with
a similar hack for the blk-mq case, or we'll have to bite the bullet and fix
the way the EH ioctls work for real, but until that happens we fail these
explicit requests here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
Cc: Hiral Patel <hiralpat@cisco.com>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Brian Uchino <buchino@cisco.com>
2014-07-25 17:16:34 -04:00
Hannes Reinecke 9cb78c16f5 scsi: use 64-bit LUNs
The SCSI standard defines 64-bit values for LUNs, and large arrays
employing large or hierarchical LUN numbers become more and more
common.

So update the linux SCSI stack to use 64-bit LUN numbers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:37 +02:00
Hiral Shah 668186637e fnic: Failing to queue aborts due to Q full cause terminate driver timeout
In the fnic abort handler, queuing an abort can fail when the hardware queue
is full. The command state is left as abort queued. A command in the abort
queued state will never be queued again for abort or termination.
The fix restores the command state in the above case.

Signed-off-by: Hiral Shah <hishah@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Narsimhulu Musini <nmusini@cisco.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-05-19 13:32:55 +02:00
Hiral Patel 67125b0287 [SCSI] fnic: Fnic Statistics Collection
This feature gathers active and cumulative per fnic stats for io,
abort, terminate, reset, vlan discovery path and it also includes
various important stats for debugging issues. It also provides
debugfs and ioctl interface for user to retrieve these stats.
It also provides functionality to reset cumulative stats through
user interface.

Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:57:57 +01:00
Narsimhulu Musini 441fbd2595 [SCSI] fnic: host reset returns nonzero value(errno) on success
Fixed appropriate error codes that returns negative error number on failure,
and 0 on success. fnic_reset() is used directly by the fc transport callback
issue_fc_host_lip which requires a negative error number on failure.

Signed-off-by: Narsimhulu Musini <nmusini@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-25 09:57:56 +01:00
Hiral Patel fc85799ee3 [SCSI] fnic: fnic Driver Tuneables Exposed through CLI
Introduced module params to provide dynamic way of configuring
queue depth.

Added support to get max io throttle count through UCSM to
configure maximum outstanding IOs supported by fnic and push
that value to scsi mid-layer.

  Supported IO throttle values:

  UCSM IO THROTTLE VALUE        FNIC MAX OUTSTANDING IOS
  ------------------------------------------------------
        16 (Default)                    2048
        <= 256                          256
        > 256                           <ucsm value>
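
The table can be read as the following mapping. This is a sketch taken directly from the three rows above, with hypothetical names; it applies no upper cap because the table does not state one.

```c
#define SKETCH_UCSM_DEFAULT_THROTTLE 16   /* UCSM default from the table */
#define SKETCH_DEFAULT_MAX_IOS       2048 /* used when UCSM is at default */
#define SKETCH_MIN_MAX_IOS           256  /* floor for non-default values */

/* Map the UCSM IO throttle value to fnic's max outstanding IOs. */
static unsigned int sketch_max_outstanding_ios(unsigned int ucsm_throttle)
{
    if (ucsm_throttle == SKETCH_UCSM_DEFAULT_THROTTLE)
        return SKETCH_DEFAULT_MAX_IOS;
    if (ucsm_throttle <= SKETCH_MIN_MAX_IOS)
        return SKETCH_MIN_MAX_IOS;
    return ucsm_throttle;
}
```

The default is checked first because 16 would otherwise fall under the "<= 256" row.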

Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-11 15:59:25 -07:00
Sesidhar Beddel d0385d9265 [SCSI] fnic: Kernel panic while running sh/nosh with max lun cfg
Kernel panics due to NULL lport while executing the log message because
of synchronization issues between libfc and scsi transport fc. Checking
for NULL pointers at the beginning of this routine would resolve the issue
from kernel panic point of view.

Signed-off-by: Sesidhar Baddel <sebaddel@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-11 15:56:44 -07:00
Sesidhar Beddel 1259c5dc75 [SCSI] fnic: Hitting BUG_ON(io_req->abts_done) in fnic_rport_exch_reset
BUG_ON(io_req->abts_done) in fnic_rport_exch_reset can be hit because of a
timing issue, and to some extent a locking issue, when abts and terminate
happen at around the same time.

The code changes are intended to update CMD_STATE(sc) and
io_req->abts_done together.

Signed-off-by: Sesidhar Beddel <sebaddel@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-11 15:54:08 -07:00
Suma Ramars 318c7c4325 [SCSI] fnic: Remove QUEUE_FULL handling code
Remove the fnic driver's QUEUE_FULL handling code. Instead, let the SCSI
mid-layer handle queue full and use its own algorithm to ramp the queue
down/up.

Signed-off-by: Suma Ramars <sramars@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-09-11 15:49:30 -07:00
Dan Carpenter 5d65f91896 [SCSI] fnic: potential dead lock in fnic_is_abts_pending()
There is an unlock missing if the == FNIC_IOREQ_ABTS_PENDING is
false.
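
A sketch of a balanced version of such a loop. The toy lock only counts holders, the state constants are hypothetical, and the function is a simplified stand-in rather than the driver's routine.

```c
#include <stddef.h>

/* Toy lock that only tracks how many holders it has. */
struct sketch_lock { int depth; };

static void sketch_lock_acquire(struct sketch_lock *l) { l->depth++; }
static void sketch_lock_release(struct sketch_lock *l) { l->depth--; }

#define SKETCH_STATE_OTHER        0 /* hypothetical command states */
#define SKETCH_STATE_ABTS_PENDING 1

/* Return 1 if any command is in the abort-pending state, else 0. */
static int sketch_is_abts_pending(struct sketch_lock *l,
                                  const int *states, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        sketch_lock_acquire(l);
        if (states[i] == SKETCH_STATE_ABTS_PENDING)
            found = 1;
        sketch_lock_release(l); /* released on both outcomes of the test */
    }
    return found;
}
```

Placing the release after the comparison, not inside its true branch, keeps the lock balanced on every iteration.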

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-06-26 17:41:44 -07:00
Hiral Patel 4d7007b49d [SCSI] fnic: Fnic Trace Utility
Fnic Trace utility is a tracing functionality built directly into fnic driver
to trace events. The benefit that trace buffer brings to fnic driver is the
ability to see what is happening inside the fnic driver. It also provides the
capability to trace every IO event inside fnic driver to debug panics, hangs
and potentially IO corruption issues. This feature makes it easy to find
problems in fnic driver and it also helps in tracking down strange bugs in a
more manageable way. Trace buffer is shared across all fnic instances for
this implementation.

Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-02-22 17:32:07 +00:00
Hiral Patel 14eb5d905d [SCSI] fnic: New debug flags and debug log messages
Added new fnic debug flags for identifying IO state at every stage of IO while
debugging and also added more log messages for better debugging capability.

Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-02-22 17:31:09 +00:00
Hiral Patel a0bf1ca27b [SCSI] fnic: fnic driver may hit BUG_ON on device reset
The issue was observed when LUN Reset is issued through IOCTL or sg_reset
utility.

fnic driver issues LUN RESET to firmware. On successful completion of device
reset, driver cleans up all the pending IOs that were issued prior to device
reset. These pending IOs are expected to be in ABTS_PENDING state. This works
fine, when the device reset operation resulted from midlayer, but not when
device reset was triggered from IOCTL path as the pending IOs were not in
ABTS_PENDING state. The execution path hits a panic if the pending IO is not
in ABTS_PENDING state.

Changes:
The fix replaces BUG_ON check in fnic_clean_pending_aborts() with marking
pending IOs as ABTS_PENDING if they were not in ABTS_PENDING state and skips
if they were already in ABTS_PENDING state. An extra check is added to validate
the abort status of the commands after a delay of 2 * E_D_TOV using a
helper function. The helper function returns 1 if it finds any pending IO in
ABTS_PENDING state belonging to the LUN on which device reset was issued,
else 0. With this, the device reset operation returns success only if the
helper function returns 0, otherwise it returns failure.

Other changes:
- Removed code in fnic_clean_pending_aborts() that returns failure if it finds
  io_req NULL, instead of returning failure added code to continue with next io
- Added device reset flags for debugging in fnic_terminate_rport_io,
  fnic_rport_exch_reset, and fnic_clean_pending_aborts

Signed-off-by: Narsimhulu Musini <nmusini@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-02-22 17:30:19 +00:00
Hiral Patel 03298552cb [SCSI] fnic: fixing issues in device and firmware reset code
1. Handling overlapped firmware resets
     This fix serializes multiple firmware resets to avoid a situation where
     the fnic device fails to come up on a link up event when firmware resets
     are issued back to back. If overlapping firmware resets are issued,
     the firmware reset operation checks whether a firmware reset is already
     in progress; if so, it polls for its completion in a loop with a 100ms delay.

2. Handling device reset timeout
     fnic_device_reset code has been modified to handle Device reset timeout:
     - Issue terminate on device reset timeout.
     - Introduced flags field (one of the scratch fields in scsi_cmnd).
     With this, device reset request would have DEVICE_RESET flag set for other
     routines to determine the type of the request.
     Also modified fnic_terminate_rport_io, fnic_rport_exch_reset, completion
     routines to handle SCSI commands with DEVICE_RESET flag.

3. LUN/Device Reset hangs when issued through IOCTL using utilities like
   sg_reset.
     Each SCSI command is associated with a valid tag; fnic uses this tag to
     retrieve the associated scsi command on completion. A LUN/Device Reset
     issued through IOCTL results in a SCSI command that is not associated
     with a valid tag, so fnic fails to retrieve the associated scsi command
     on completion, which causes a hang. This fix allocates a tag, associates
     it with the scsi command, and frees the tag when the operation has completed.

4. Preventing IOs during firmware reset.
     Current fnic implementation allows IO submissions during firmware reset.
     This fix synchronizes IO submissions and firmware reset operations.
     It ensures that IOs issued to fnic prior to reset will be issued to the
     firmware before firmware reset.

Signed-off-by: Narsimhulu Musini <nmusini@cisco.com>
Signed-off-by: Hiral Patel <hiralpat@cisco.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-02-22 17:28:19 +00:00