When the system is shutting down, iscsid is not running so we will not get
a response to the ISCSI_ERR_INVALID_HOST error event. The system shutdown
will then hang waiting on userspace to remove the session.
This has libiscsi force the destruction of the session from the kernel when
iscsi_host_remove() is called from a driver's shutdown callout.
This fixes a regression added in qedi boot with commit d1f2ce7763 ("scsi:
qedi: Fix host removal with running sessions") which made qedi use the
common session removal function that waits on userspace instead of rolling
its own kernel based removal.
Link: https://lore.kernel.org/r/20220616222738.5722-7-michael.christie@oracle.com
Fixes: d1f2ce7763 ("scsi: qedi: Fix host removal with running sessions")
Tested-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When handling errors that lead to host removal use QEDI_MODE_NORMAL. There
is currently no difference in behavior between NORMAL and SHUTDOWN, but in
a subsequent commit we will want to know when we are called from the
pci_driver shutdown callout vs remove/err_handler so we know when userspace
is up.
Link: https://lore.kernel.org/r/20220616222738.5722-6-michael.christie@oracle.com
Tested-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We set the qedi_ep state to EP_STATE_OFLDCONN_START when the ep is
created. Then in qedi_set_path we kick off the offload work. If userspace
times out the connection and calls ep_disconnect, qedi will only flush the
offload work if the qedi_ep state has transitioned away from
EP_STATE_OFLDCONN_START. If we can't connect we will not have transitioned
state and will leave the offload work running, and we will free the qedi_ep
from under it.
This patch just has us init the work when we create the ep, then always
flush it.
Link: https://lore.kernel.org/r/20220408001314.5014-10-michael.christie@oracle.com
Tested-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (qla2xxx, pm8001,
libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
and bug fixes. The high blast radius core update is the removal of
write same, which affects block and several non-SCSI devices. The
other big change, which is more local, is the removal of the SCSI
pointer.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
Ec1HDuUCE18H1H2QAh0=
=/Ty9
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (qla2xxx, pm8001,
libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
and bug fixes.
The high blast radius core update is the removal of write same, which
affects block and several non-SCSI devices. The other big change,
which is more local, is the removal of the SCSI pointer"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
scsi: scsi_ioctl: Drop needless assignment in sg_io()
scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
scsi: lpfc: Copyright updates for 14.2.0.0 patches
scsi: lpfc: Update lpfc version to 14.2.0.0
scsi: lpfc: SLI path split: Refactor BSG paths
scsi: lpfc: SLI path split: Refactor Abort paths
scsi: lpfc: SLI path split: Refactor SCSI paths
scsi: lpfc: SLI path split: Refactor CT paths
scsi: lpfc: SLI path split: Refactor misc ELS paths
scsi: lpfc: SLI path split: Refactor VMID paths
scsi: lpfc: SLI path split: Refactor FDISC paths
scsi: lpfc: SLI path split: Refactor LS_RJT paths
scsi: lpfc: SLI path split: Refactor LS_ACC paths
scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
scsi: lpfc: SLI path split: Refactor lpfc_iocbq
scsi: lpfc: Use kcalloc()
...
Instead of storing the iSCSI task pointer and the session age in the SCSI
pointer, use command-private variables. This patch prepares for removal of
the SCSI pointer from struct scsi_cmnd.
The list of iSCSI drivers has been obtained as follows:
$ git grep -lw iscsi_host_alloc
drivers/infiniband/ulp/iser/iscsi_iser.c
drivers/scsi/be2iscsi/be_main.c
drivers/scsi/bnx2i/bnx2i_iscsi.c
drivers/scsi/cxgbi/libcxgbi.c
drivers/scsi/iscsi_tcp.c
drivers/scsi/libiscsi.c
drivers/scsi/qedi/qedi_main.c
drivers/scsi/qla4xxx/ql4_os.c
include/scsi/libiscsi.h
Note: it is not clear to me how the qla4xxx driver can work without this
patch since it uses the scsi_cmnd::SCp.ptr member for two different
purposes:
- The qla4xxx driver uses this member to store a struct srb pointer.
- libiscsi uses this member to store a struct iscsi_task pointer.
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Manish Rangankar <mrangankar@marvell.com>
Cc: Karen Xie <kxie@chelsio.com>
Cc: Ketan Mukadam <ketan.mukadam@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
iscsi
Link: https://lore.kernel.org/r/20220218195117.25689-26-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fixes a deadlock added with commit b40f3894e3 ("scsi: qedi: Complete
TMF works before disconnect")
Bug description from Jia-Ju Bai:
qedi_process_tmf_resp()
spin_lock(&session->back_lock); --> Line 201 (Lock A)
spin_lock(&qedi_conn->tmf_work_lock); --> Line 230 (Lock B)
qedi_process_cmd_cleanup_resp()
spin_lock_bh(&qedi_conn->tmf_work_lock); --> Line 752 (Lock B)
spin_lock_bh(&conn->session->back_lock); --> Line 784 (Lock A)
When qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp() are
concurrently executed, the deadlock can occur.
This patch fixes the deadlock by not holding the tmf_work_lock in
qedi_process_cmd_cleanup_resp while holding the back_lock. The
tmf_work_lock is only needed while we remove the tmf_work from the
work_list.
Link: https://lore.kernel.org/r/20220208185448.6206-1-michael.christie@oracle.com
Fixes: b40f3894e3 ("scsi: qedi: Complete TMF works before disconnect")
Cc: Manish Rangankar <mrangankar@marvell.com>
Cc: Nilesh Javali <njavali@marvell.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
destroy_workqueue() already drains the queue before destroying it, so there
is no need to flush it explicitly.
Remove the redundant flush_workqueue() calls.
Link: https://lore.kernel.org/r/20220127013934.1184923-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull in the 5.16 fixes branch to resolve a conflict in the UFS driver
core.
Conflicts:
drivers/scsi/ufs/ufshcd.c
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The format used for formatting SYSFS_FLAG_FW_SEL_BOOT creates the
following warning:
drivers/scsi/qedi/qedi_main.c:2259:35: warning: format specifies type
'char' but the argument has type 'int' [-Wformat]
rc = snprintf(buf, 3, "%hhd\n",
SYSFS_FLAG_FW_SEL_BOOT);
Fix this to cast the constant as a char since the intention is to print it
via sysfs as a byte.
Link: https://lore.kernel.org/r/20211130203813.12138-2-f.fainelli@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable 'page' is set but never used throughout qedi_alloc_bdq().
Therefore remove it.
Link: https://lore.kernel.org/r/20211126201708.27140-2-f.fainelli@gmail.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, smartpqi, lpfc,
target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
fixes. Notable core changes are the removal of scsi->tag which caused
some churn in obsolete drivers and a sweep through all drivers to call
scsi_done() directly instead of scsi->done() which removes a pointer
indirection from the hot path and a move to register core sysfs files
earlier, which means they're available to KOBJ_ADD processing, which
necessitates switching all drivers to using attribute groups.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYYUfBCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbUJAQDZt4oc
vUx9JpyrdHxxTCuOzVFd8W1oJn0k5ltCBuz4yAD8DNbGhGm93raMSJ3FOOlzLEbP
RG8vBdpxMudlvxAPi/A=
=BSFz
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This consists of the usual driver updates (ufs, smartpqi, lpfc,
target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
fixes.
Notable core changes are the removal of scsi->tag which caused some
churn in obsolete drivers and a sweep through all drivers to call
scsi_done() directly instead of scsi->done() which removes a pointer
indirection from the hot path and a move to register core sysfs files
earlier, which means they're available to KOBJ_ADD processing, which
necessitates switching all drivers to using attribute groups"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: lpfc: Update lpfc version to 14.0.0.3
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
scsi: lpfc: Fix link down processing to address NULL pointer dereference
scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted
scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine
scsi: lpfc: Correct sysfs reporting of loop support after SFP status change
scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset
scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup()
scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
scsi: ufs: mediatek: Avoid sched_clock() misuse
scsi: mpt3sas: Make mpt3sas_dev_attrs static
scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions
scsi: target: core: Stop using bdevname()
scsi: aha1542: Use memcpy_{from,to}_bvec()
scsi: sr: Add error handling support for add_disk()
scsi: sd: Add error handling support for add_disk()
scsi: target: Perform ALUA group changes in one step
scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path
scsi: target: Fix alua_tg_pt_gps_count tracking
scsi: target: Fix ordered tag handling
...
struct device supports attribute groups directly but does not support
struct device_attribute directly. Hence switch to attribute groups.
Link: https://lore.kernel.org/r/20211012233558.4066756-39-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Initialize 2 MSL timeout value used for the TCP TIME_WAIT state to
non-zero default.
This patch also removes magic number from qedi/qedi_main.c.
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update TCP silly-window-syndrome timeout, for the cases where
initiator's small TCP window size prevents FW from transmitting
packets on the connection. Timeout causes FW to retransmit
window probes if needed, preventing I/O stall if initiator ignores
first window probe.
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing qed/qede/qedr/qedi/qedf code uses chip-specific naming in
structures, functions, variables and defines in FW HSI (Hardware
Software Interface).
The new FW version introduced a generic naming convention in HSI
in-which the same code will be used across different versions
for simpler maintainability. It also eases in providing support for
new features.
With this patch every "_e4" or "e4_" prefix or suffix is not needed
anymore and it will be removed.
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Reviewed-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function had some left over code that returned 1 on error instead
negative error codes. Convert everything to use negative error codes. The
caller treats all non-zero returns the same so this does not affect run
time.
A couple places set "rc" instead of "status" so those error paths ended up
returning success by mistake. Get rid of the "rc" variable and use
"status" everywhere.
Remove the bogus "status = 0" initialization, as a future proofing measure
so the compiler will warn about uninitialized error codes.
Link: https://lore.kernel.org/r/20210810084753.GD23810@kili
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver fastpath employs doorbells to indicate to the device that work is
available. Each doorbell translates to a message sent to the device over
PCI. These messages are queued by the doorbell queue HW block, and handled
by the HW.
If a sufficient amount of CPU cores are sending messages at a sufficient
rate, the queue can overflow, and messages can be dropped. There are many
entities in the driver which can send doorbell messages. When overflow
happens, a fatal HW attention is indicated, and the Doorbell HW block stops
accepting new doorbell messages until recovery procedure is done.
When overflow occurs, all doorbells are dropped. Since doorbells are
aggregatives, if more doorbells are sent nothing has to be done. But if
the "last" doorbell is dropped, the doorbelling entity doesn’t know this
happened, and may wait forever for the device to perform the action. The
doorbell recovery mechanism addresses just that - it sends the last
doorbell of every entity.
[mkp: fix missing brackets reported by Guenter Roeck]
Link: https://lore.kernel.org/r/20210804221412.5048-1-smalin@marvell.com
Co-developed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prepare for removal of the request pointer by using scsi_cmd_to_rq()
instead. This patch does not change any functionality.
Link: https://lore.kernel.org/r/20210809230355.8186-37-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use DEVICE_ATTR_RO() macro helper instead of plain DEVICE_ATTR(), which
makes the code a bit shorter and easier to read.
Link: https://lore.kernel.org/r/20210616034419.725-2-thunder.leizhen@huawei.com
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qedi_clear_session_ctx() could race with the in-kernel or userspace driven
recovery/removal and we could access a NULL conn or do a double free.
We should be using iscsi_host_remove() to start the removal process from
the driver. It will start the in-kernel recovery and notify userspace that
the driver's scsi_hosts are being removed. iscsid will then drive the
session removal like is done when the logout command is run. When the
sessions are removed, iscsi_host_remove() will return so qedi can finish
knowing there are no running sessions and no new sessions will be allowed.
This also fixes an issue where we check for a NULL conn after already
accessing it introduced in commit 27e986289e ("scsi: iscsi: Drop suspend
calls from ep_disconnect") by just removing the function completely.
Link: https://lore.kernel.org/r/20210609192709.5094-1-michael.christie@oracle.com
Fixes: 27e986289e ("scsi: iscsi: Drop suspend calls from ep_disconnect")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we got a response then we should always wake up the conn. For both the
cmd_cleanup_req == 0 or cmd_cleanup_req > 0, we shouldn't dig into
iscsi_itt_to_task because we don't know what the upper layers are doing.
We can also remove the qedi_clear_task_idx call here because once we signal
success libiscsi will loop over the affected commands and end up calling
the cleanup_task callout which will release it.
Link: https://lore.kernel.org/r/20210525181821.7617-29-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We need to make sure that abort and reset completion work has completed
before ep_disconnect returns. After ep_disconnect we can't manipulate
cmds because libiscsi will call conn_stop and take onwership.
We are trying to make sure abort work and reset completion work has
completed before we do the cmd clean up in ep_disconnect. The problem is
that:
1. the work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work was
still pending we would not see the bit set. We need to do this before
the work is queued.
2. If we had multiple works queued then we could break from the loop in
qedi_ep_disconnect early because when abort work 1 completes it could
clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that
before work 2 has run.
3. A TMF reset completion work could run after ep_disconnect starts
cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq
-> qedi_cleanup_all_io would might think it's done cleaning up cmds,
but the reset completion work could still be running. We then return
from ep_disconnect while still doing cleanup.
This replaces the bit with a counter to track the number of queued TMF
works, and adds a bool to prevent new works from starting from the
completion path once a ep_disconnect starts.
Link: https://lore.kernel.org/r/20210525181821.7617-28-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qedi_abort_work knows what task to abort so just pass it to send_iscsi_tmf.
Link: https://lore.kernel.org/r/20210525181821.7617-27-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers shouldn't be calling block/unblock session for cmd cleanup because
the functions can change the session state from under libiscsi. This adds
a new a driver level bit so it can block all I/O the host while it drains
the card.
Link: https://lore.kernel.org/r/20210525181821.7617-26-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers shouldn't be calling block/unblock session for tmf handling because
the functions can change the session state from under libiscsi.
iscsi_queuecommand's call to iscsi_prep_scsi_cmd_pdu->
iscsi_check_tmf_restrictions will prevent new cmds from being sent to qedi
after we've started handling a TMF. So we don't need to try and block it in
the driver, and we can remove these block calls.
Link: https://lore.kernel.org/r/20210525181821.7617-25-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We run from a workqueue with no locks held so use GFP_NOIO.
Link: https://lore.kernel.org/r/20210525181821.7617-24-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qedi_iscsi_abort_work and qedi_tmf_work both allocate a tid then call
qedi_send_iscsi_tmf which also allocates a tid. This removes the tid
allocation from the callers.
Link: https://lore.kernel.org/r/20210525181821.7617-23-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If qedi_tmf_work's qedi_wait_for_cleanup_request call times out we will
also force the clean up of the qedi_work_map but
qedi_process_cmd_cleanup_resp could still be accessing the qedi_cmd.
To fix this issue we extend where we hold the tmf_work_lock and back_lock
so the qedi_process_cmd_cleanup_resp access is serialized with the cleanup
done in qedi_tmf_work and any completion handling for the iscsi_task.
Link: https://lore.kernel.org/r/20210525181821.7617-22-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the SCSI cmd completes after qedi_tmf_work calls iscsi_itt_to_task then
the qedi qedi_cmd->task_id could be freed and used for another cmd. If we
then call qedi_iscsi_cleanup_task with that task_id we will be cleaning up
the wrong cmd.
Wait to release the task_id until the last put has been done on the
iscsi_task. Because libiscsi grabs a ref to the task when sending the
abort, we know that for the non-abort timeout case that the task_id we are
referencing is for the cmd that was supposed to be aborted.
A latter commit will fix the case where the abort times out while we are
running qedi_tmf_work.
Link: https://lore.kernel.org/r/20210525181821.7617-21-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If qedi_process_cmd_cleanup_resp finds the cmd it frees the work and sets
list_tmf_work to NULL, so qedi_tmf_work should check if list_tmf_work is
non-NULL when it wants to force cleanup.
Link: https://lore.kernel.org/r/20210525181821.7617-20-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The iscsi offload drivers are setting the shost->max_id to the max number
of sessions they support. The problem is that max_id is not the max number
of targets but the highest identifier the targets can have. To use it to
limit the number of targets we need to set it to max sessions - 1, or we
can end up with a session we might not have preallocated resources for.
Link: https://lore.kernel.org/r/20210525181821.7617-15-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Subsequent commits allow the kernel to do ep_disconnect. In that case we
will have to get a proper refcount on the ep so one thread does not delete
it from under another.
Link: https://lore.kernel.org/r/20210525181821.7617-7-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
libiscsi will now suspend the send/tx queue for the drivers so we can drop
it from the drivers ep_disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-4-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new
I/O but every driver except cxgbi and iscsi_tcp can still get I/O from
__iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before
ep_disconnect. This could happen if we were terminating the session, and
the logout timed out before it was even sent to libiscsi.
Fix the issue by adding a helper which reverses the bind_conn call that
allows new I/O to be queued. Drivers implementing ep_disconnect can use this
to make sure new I/O is not queued to them when handling the disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull 5.12/scsi-fixes into the 5.13 SCSI tree to provide a baseline for
some UFS changes that would otherwise cause conflicts during the
merge.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable err is assigned -ENOMEM followed by an error return path via label
err_udev that does not access the variable and returns with the -ENOMEM
error return code. The assignment to err is redundant and can be removed.
Link: https://lore.kernel.org/r/20210327230650.25803-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When kzalloc() returns NULL to qedi->global_queues[i], no error return code
of qedi_alloc_global_queues() is assigned. To fix this bug, status is
assigned with -ENOMEM in this case.
Link: https://lore.kernel.org/r/20210308033024.27147-1-baijiaju1990@gmail.com
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Acked-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The CHAP secret displayed garbage characters causing iSCSI login
authentication failure. Correct the CHAP password max length.
Link: https://lore.kernel.org/r/20201217105144.8055-1-njavali@marvell.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add the missing destroy_workqueue() before return from __qedi_probe in the
error handling case when fails to create workqueue qedi->offload_thread.
Link: https://lore.kernel.org/r/20201109091518.55941-1-miaoqinglang@huawei.com
Fixes: ace7f46ba5 ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On fan failure event from MFW, bring down active connections and unload
the firmware context.
Link: https://lore.kernel.org/r/20200924070338.8270-1-mrangankar@marvell.com
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The error recovery is handled by management firmware (MFW) with the help of
qed/qedi drivers. Upon detecting errors, driver informs MFW about this
event which in turn starts a recovery process. MFW sends ERROR_RECOVERY
notification to the driver which performs the required cleanup/recovery
from the driver side.
Link: https://lore.kernel.org/r/20200908095657.26821-9-mrangankar@marvell.com
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support to initiate MFW process recovery for all the devices if storage
function receives the event first.
Also added fix for kernel test robot warning,
>> drivers/scsi/qedi/qedi_main.c:1119:6: warning: no previous prototype
>> for 'qedi_schedule_hw_err_handler' [-Wmissing-prototypes]
Link: https://lore.kernel.org/r/20200908095657.26821-8-mrangankar@marvell.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For short time cable pulls, the in-flight I/O to the firmware is never
cleaned up, resulting in the behaviour of stale I/O completion causing
list_del corruption and soft lockup of the system.
On link down event, mark all the connections for recovery, causing cleanup
of all the in-flight I/O immediately.
Link: https://lore.kernel.org/r/20200908095657.26821-7-mrangankar@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Protect active command list for non-I/O commands like login response,
logout response, text response, and recovery cleanup of active list to
avoid list corruption.
Link: https://lore.kernel.org/r/20200908095657.26821-5-mrangankar@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While aborting the I/O, the firmware cleanup task timed out and driver
deleted the I/O from active command list. Some time later the firmware
sent the cleanup task response and driver again deleted the I/O from
active command list causing firmware to send completion for non-existent
I/O and list_del corruption of active command list.
Add fix to check if I/O is present before deleting it from the active
command list to ensure firmware sends valid I/O completion and protect
against list_del corruption.
Link: https://lore.kernel.org/r/20200908095657.26821-4-mrangankar@marvell.com
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In boot from SAN scenario when qedi PCI shutdown handler is called with
active iSCSI sessions, sometimes target takes too long time to respond to
firmware connection termination request. Instead skip sending termination
ramrod and progress with unload path.
Link: https://lore.kernel.org/r/20200908095657.26821-3-mrangankar@marvell.com
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To avoid unnecessary vector allocation when the number of fast-path queues
is less then available msix vectors, use return count from module
qed->set_fp_int.
Link: https://lore.kernel.org/r/20200908095657.26821-2-mrangankar@marvell.com
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
kfree_skb() handles a NULL skb argument so the additional check is
unnecessary. Remove it.
Link: https://lore.kernel.org/r/20200827092606.32148-1-vulab@iscas.ac.cn
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>