The following Oops may be encountered if the device is reset, i.e. EEH
recovery, while there is heavy I/O traffic:
59:mon> t
[c000200db64bb680] c008000009264c40 cxlflash_queuecommand+0x3b8/0x500 [cxlflash]
[c000200db64bb770] c00000000090d3b0 scsi_dispatch_cmd+0x130/0x2f0
[c000200db64bb7f0] c00000000090fdd8 scsi_request_fn+0x3c8/0x8d0
[c000200db64bb900] c00000000067f528 __blk_run_queue+0x68/0xb0
[c000200db64bb930] c00000000067ab80 __elv_add_request+0x140/0x3c0
[c000200db64bb9b0] c00000000068daac blk_execute_rq_nowait+0xec/0x1a0
[c000200db64bba00] c00000000068dbb0 blk_execute_rq+0x50/0xe0
[c000200db64bba50] c0000000006b2040 sg_io+0x1f0/0x520
[c000200db64bbaf0] c0000000006b2e94 scsi_cmd_ioctl+0x534/0x610
[c000200db64bbc20] c000000000926208 sd_ioctl+0x118/0x280
[c000200db64bbcc0] c00000000069f7ac blkdev_ioctl+0x7fc/0xe30
[c000200db64bbd20] c000000000439204 block_ioctl+0x84/0xa0
[c000200db64bbd40] c0000000003f8514 do_vfs_ioctl+0xd4/0xa00
[c000200db64bbde0] c0000000003f8f04 SyS_ioctl+0xc4/0x130
[c000200db64bbe30] c00000000000b184 system_call+0x58/0x6c
When there is no room to send the I/O request, the cached room is refreshed
by reading the memory mapped command room value from the AFU. The AFU
register mapping is refreshed during a reset, creating a race condition that
can lead to the Oops above.
During a device reset, the AFU should not be unmapped until all the active
send threads quiesce. An atomic counter, cmds_active, is currently used to
track internal AFU commands and quiesce during reset. This same counter can
also be used for the active send threads.
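The quiesce idea can be sketched in user-space C as follows. This is a
minimal illustration, not the cxlflash code: struct afu_sketch and the
helper names are hypothetical, and only the cmds_active counter comes
from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-AFU state; only the counter matters. */
struct afu_sketch {
	atomic_int cmds_active;	/* in-flight internal commands and send threads */
	bool mapped;		/* true while the register mapping is valid */
};

/* Send path: account for this thread before touching mapped registers. */
void send_begin(struct afu_sketch *afu)
{
	atomic_fetch_add(&afu->cmds_active, 1);
}

/* Send path: drop the count once the mapped registers are no longer used. */
void send_end(struct afu_sketch *afu)
{
	int prev = atomic_fetch_sub(&afu->cmds_active, 1);

	assert(prev > 0);	/* every end must pair with a begin */
	(void)prev;
}

/* Reset path: unmap only when no sender is active; false means keep waiting. */
bool try_unmap(struct afu_sketch *afu)
{
	if (atomic_load(&afu->cmds_active) != 0)
		return false;
	afu->mapped = false;
	return true;
}
```

In this sketch the reset path would call try_unmap() in a loop until it
succeeds; a real driver also has to stop new senders from starting in
that window, which is omitted here.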
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we don't check that the device is not gone before
dereferencing its elements in the function hisi_sas_task_exec()
(specifically, the DQ pointer).
This patch fixes this issue by filling in the DQ pointer in
hisi_sas_task_prep() after we check that the device pointer is still
safe to reference.
[mkp: typo]
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The IPTT of a slot is unique, and we currently use hisi_hba lock to
protect it.
Now slots are managed on hisi_sas_device.list, so use the DQ lock to
protect allocating and freeing the slot.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we lock the DQ to protect the whole delivery process. This
stops us building slots for the same queue in parallel, and can affect
performance.
To optimise it, only lock the DQ during special periods, specifically
when allocating a slot from the DQ and when delivering a slot to the HW.
This approach is now safe, thanks to the previous patches to ensure that
we always deliver a slot to the HW once allocated.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we allocate the slot's memory buffer after allocating the DQ
slot.
To aid DQ lockout reduction, and allow slots to be built in parallel,
move this step (which can fail) prior to allocating the slot.
Also a stray spin_unlock_irqrestore() is removed from internal task exec
function.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since the task prep functions now should not fail, adjust the return
types to void.
In addition, some checks in the task prep functions are relocated to the
main module; this is specifically the check for the number of elements
in an sg list exceeding the HW SGE limit.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently we use DQ lock to protect delivery of DQ entry one by one.
To optimise to allow more than one slot to be built for a single DQ in
parallel, we need to remove the DQ lock when preparing slots, prior to
delivery.
To achieve this, we rearrange the slot build order to ensure that once
we allocate a slot for a task, we cannot fail to deliver the task.
In this patch, we rearrange the slot building for SMP tasks to ensure
that sg mapping part (which can fail) happens before we allocate the
slot in the DQ.
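The ordering can be sketched as below. This is a simplified user-space
illustration under stated assumptions: struct dq_sketch, map_request()
and prep_task() are hypothetical names, and the boolean argument merely
stands in for an sg mapping that may fail.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_DQ_SLOTS 4	/* hypothetical queue depth */

struct dq_sketch {
	int next_slot;		/* index of the next free delivery-queue slot */
};

/* Hypothetical fallible step, standing in for the sg mapping. */
bool map_request(bool mapping_ok)
{
	return mapping_ok;
}

/*
 * Run the step that can fail first; only then take a DQ slot.
 * Returns the slot index, or -1 with the queue left untouched.
 */
int prep_task(struct dq_sketch *dq, bool mapping_ok)
{
	assert(dq->next_slot >= 0);

	if (!map_request(mapping_ok))
		return -1;	/* no slot consumed yet, nothing to undo */
	if (dq->next_slot >= SKETCH_DQ_SLOTS)
		return -1;	/* queue full: allocation itself did not happen */
	return dq->next_slot++;	/* from here on, delivery cannot fail */
}
```

Because the mapping failure is handled before the slot index advances,
no allocated slot is ever left undelivered in this sketch.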
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This makes ufshcd_config_pwr_mode non-static so that other vendors like
exynos can use it.
Signed-off-by: Seungwon Jeon <essuuj@gmail.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some host controllers don't support host controller enable via HCE.
Signed-off-by: Seungwon Jeon <essuuj@gmail.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some host controllers support interrupt aggregation but don't allow
resetting counter and timer in software.
Signed-off-by: Seungwon Jeon <essuuj@gmail.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the right behavior, setting the bit to '0' indicates clear and '1'
indicates no change. If host controller handles this the other way,
UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR can be used.
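As a rough illustration of the two behaviours, the value written to the
request list clear register for one slot could be computed as follows.
The function name and the boolean parameter are hypothetical; only the
polarity rule comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Value to write to the request list clear register for one slot.
 * Standard behaviour: 0 clears the slot, 1 leaves it unchanged.
 * With the quirk (broken_clr == true) the polarity is reversed.
 */
uint32_t req_list_clear_value(unsigned int slot, bool broken_clr)
{
	uint32_t bit;

	assert(slot < 32);	/* the register is 32 bits wide in this sketch */
	bit = UINT32_C(1) << slot;
	return broken_clr ? bit : ~bit;
}
```

For slot 3 this yields 0xFFFFFFF7 on a conforming controller and
0x00000008 on one that needs the quirk.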
[mkp: typo]
Signed-off-by: Seungwon Jeon <essuuj@gmail.com>
Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: "Asutosh Das (asd)" <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On the quest to remove all VLAs from the kernel[1] this moves buffers
off the stack. In the second instance, this collapses two separately
allocated buffers into a single buffer, since they are used
consecutively, which saves 256 bytes (QUERY_DESC_MAX_SIZE + 1) of stack
space.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This shall help avoid copying uninitialized memory to the userspace when
calling ioctl(fd, SG_IO) with an empty command.
Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls. Switch to use
proc_create_seq where applicable.
Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls. Switch to use
proc_create_single.
Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.
Signed-off-by: Christoph Hellwig <hch@lst.de>
"make clean" should remove the generated file "scsi_devinfo_tbl.c", so
list it in the clean-files variable so that the file gets cleaned up.
Fixes: 345e29608b ("scsi: scsi: Export blacklist flags to sysfs")
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds new adapter error log for P9 system with the new AZ SAS
cable.
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in esas2r_debug message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On the quest to remove all VLAs from the kernel[1] this rearranges the
code to avoid a VLA warning under -Wvla (gcc doesn't recognize "const"
variables as not triggering VLA creation). Additionally cleans up
variable naming to avoid 80 character column limit.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Same numerical value (for now at least), but a much better documentation
of intent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_old_get_request already has it at hand, and in blk_queue_bio, which
is the fast path, it is constant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Always GFP_KERNEL, and keeping it would cause serious complications for
the next change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In tw_chrdev_ioctl(), the length of the data buffer is firstly copied
from the userspace pointer 'argp' and saved to the kernel object
'data_buffer_length'. Then a security check is performed on it to make
sure that the length is not more than 'TW_MAX_IOCTL_SECTORS *
512'. Otherwise, an error code -EINVAL is returned. If the security
check is passed, the entire ioctl command is copied again from the
'argp' pointer and saved to the kernel object 'tw_ioctl'. Then, various
operations are performed on 'tw_ioctl' according to the 'cmd'. Given
that the 'argp' pointer resides in userspace, a malicious userspace
process can race to change the buffer length between the two
copies. This way, the user can bypass the security check and inject
invalid data buffer length. This can cause potential security issues in
the following execution.
This patch checks for capable(CAP_SYS_ADMIN) in tw_chrdev_open() to
avoid the above issues.
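The race described above is a double fetch. The following user-space
sketch only illustrates that pattern; it is not the driver code and not
the fix in this patch, which restricts who may open the device. All
names are hypothetical, memcpy() stands in for copy_from_user(), and
the racer callback models a concurrent user-space write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LEN 4096u	/* hypothetical limit for the length check */

struct ioctl_sketch {
	unsigned int buffer_length;
};

/* Models a user-space thread that modifies the buffer between copies. */
typedef void (*racer_fn)(struct ioctl_sketch *user);

/* Check one copy of the length, then use a second, unchecked copy. */
int double_fetch(struct ioctl_sketch *user, racer_fn racer,
		 struct ioctl_sketch *out)
{
	unsigned int len;

	assert(user != NULL && out != NULL);
	memcpy(&len, &user->buffer_length, sizeof(len));	/* first fetch */
	if (len > SKETCH_MAX_LEN)
		return -1;
	if (racer)
		racer(user);		/* user memory changes here */
	memcpy(out, user, sizeof(*out));	/* second fetch, not re-checked */
	return 0;
}

/* A racer that enlarges the length after the check has passed. */
void grow_length(struct ioctl_sketch *user)
{
	user->buffer_length = 0xffffffffu;
}
```

With grow_length() as the racer, double_fetch() succeeds even though the
length it hands back exceeds SKETCH_MAX_LEN.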
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In twa_chrdev_ioctl(), the ioctl driver command is firstly copied from
the userspace pointer 'argp' and saved to the kernel object
'driver_command'. Then a security check is performed on the data buffer
size indicated by 'driver_command', which is
'driver_command.buffer_length'. If the security check is passed, the
entire ioctl command is copied again from the 'argp' pointer and saved
to the kernel object 'tw_ioctl'. Then, various operations are performed
on 'tw_ioctl' according to the 'cmd'. Given that the 'argp' pointer
resides in userspace, a malicious userspace process can race to change
the buffer size between the two copies. This way, the user can bypass
the security check and inject invalid data buffer size. This can cause
potential security issues in the following execution.
This patch checks for capable(CAP_SYS_ADMIN) in twa_chrdev_open() to
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we had more than 32 megaraid cards then it would cause memory
corruption. That's not likely, of course, but it's handy to enforce it
and make the static checker happy.
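A bounds check of this kind might look like the sketch below. The table
type and function name are hypothetical; only the limit of 32 comes
from the text.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CARDS 32	/* limit stated in the text */

struct card_table {
	void *cards[SKETCH_MAX_CARDS];
	size_t count;
};

/* Returns 0 on success, -1 when the table is already full. */
int register_card(struct card_table *t, void *card)
{
	assert(t->count <= SKETCH_MAX_CARDS);

	if (t->count >= SKETCH_MAX_CARDS)
		return -1;	/* refuse instead of writing past the array */
	t->cards[t->count++] = card;
	return 0;
}
```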
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistakes in lpfc_printf_log log message
"mabilbox" -> "mailbox"
"maibox" -> "mailbox"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is an SoC bug in the v3 hw development version. When
hot-unplugging a directly attached disk, the PHY down interrupt may not
happen. The issue is easily reproduced on some boards.
When this issue occurs, the controller will receive many invalid dword
frames, and the "alos" fields of register HILINK_ERR_DFX can indicate
that disk was unplugged.
As a workaround, this patch detects this issue in the channel
interrupt, and works around it with the following steps:
- Disable the PHY
- Clear error code and interrupt
- Enable the PHY
Then the HW will reissue PHY down interrupt.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is common to use readl poll timeout helpers in the driver, so create
custom wrappers.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Event95 is used for DFX purpose. The relevant bit for this interrupt in
the ENT_INT_SRC_MSK3 register has been disabled, so remove the
processing.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As an unconstrained command, a command can be sent to a SATA disk even
if the SATA disk status is BUSY, ERR or DRQ.
If an ATA reset assert is successful but ATA reset de-assert fails, then
it will retry the reset de-assert. If the reset de-assert retry is
successful, we think it is okay to probe the device but actually it
still has Err status.
Apparently we need to retry the ATA reset assertion and de-assertion
instead for this mentioned scenario.
As such, we configure ATA reset assert as a constrained command. If ATA
reset de-assert fails, then the ATA reset de-assert retry will also
fail, and we will retry the proper process of ATA reset assert and
de-assert again.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After the controller is reset, we currently may not honour the PHY max
linkrate set via sysfs, in that after a reset we always revert to max
linkrate of 12Gbps, ignoring the value set via sysfs.
This patch modifies the policy to set the programmed PHY linkrate,
honouring the max linkrate programmed via sysfs.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We should only have the timer enabled after PHY up after controller
reset, so disable prior to reset.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is possible to dereference a NULL pointer in hisi_sas_abort_task() in
a special scenario when the device has been removed.
If an SMP task times-out, it will call hisi_sas_abort_task() to
recover. And currently there is a check in hisi_sas_abort_task() to
avoid the situation of processing the abort for the removed device.
However we have an ordering problem, in that we may reference a task for
the removed device before checking if the device has been removed.
Fix this by only referencing the sas_dev after we know it is still
present.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are 28 bytes of protection information record of SSP for v3 hw, 16
bytes for v2 hw, and probably 24 for v1 hw (forgotten now).
So use a value big enough in hisi_sas_command_table_ssp.prot to cover
all cases.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the host is frozen in SCSI EH state, at any point after the LLDD
sets SAS_TASK_STATE_DONE for the sas_task task state, libsas may free
the task; see sas_scsi_find_task().
This puts the LLDD in a difficult position, in that once it sets
SAS_TASK_STATE_DONE for the task state it should not reference the
sas_task again. But the LLDD will check the sas_task indirectly in
calling task->task_done()->sas_scsi_task_done() or sas_ata_task_done()
(to check if the host is actually in the frozen state).
And the LLDD cannot set SAS_TASK_STATE_DONE for the task state after
task->task_done() is called (as the sas_task is free'd at this point).
This situation would seem to be a problem made by libsas.
To work around, check in the LLDD whether the host is in frozen state to
ensure it is ok to call task->task_done() function. If in the frozen
state, we rely on SCSI EH and libsas to free the sas_task directly.
We do not do this for the following IO types:
- SMP - they are managed in libsas directly, outside SCSI EH
- Any internally originated IO, for similar reason
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the SCSI host enters EH, any pending IO will be processed by SCSI
EH. However it is possible that SCSI EH will try to abort the IO and
also at the same time the IO completes in the driver. In this situation
there is a small chance of freeing the sas_task twice.
Then if another IO re-uses the freed sas_task before the second free of
the sas_task, it is possible to free the wrong sas_task.
To avoid this situation, add some checks to increase reliability. The
sas_task task state flag SAS_TASK_STATE_ABORTED is used to mutually
protect the LLDD and libsas freeing the task.
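One way to picture the mutual protection is a test-and-set on the flag:
whichever path sets it first owns the free. The sketch below is an
assumption-laden user-space model, not the driver code. It uses a C11
atomic instead of the lock-protected task state flags, and
SKETCH_STATE_ABORTED, struct task_sketch and claim_and_free() are
hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_STATE_ABORTED 0x1u	/* modelled on SAS_TASK_STATE_ABORTED */

struct task_sketch {
	atomic_uint state;
	int free_count;		/* how many times the task was freed */
};

/*
 * Called by both the completion path and the abort path. Whoever sets
 * the flag first frees the task; the loser must not touch it again.
 */
bool claim_and_free(struct task_sketch *t)
{
	unsigned int old = atomic_fetch_or(&t->state, SKETCH_STATE_ABORTED);

	if (old & SKETCH_STATE_ABORTED)
		return false;	/* the other path already owns the free */
	t->free_count++;
	assert(t->free_count == 1);	/* a task is never freed twice */
	return true;
}
```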
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the DQ tasklet processing it is not necessary to take the DQ lock, as
there is no contention between adding slots to the CQ and removing slots
from the matching DQ.
In addition, since we run each DQ in a separate tasklet context, there
would be no possible contention between DQ processing running for the
same queue in parallel.
It is still necessary to take hisi_hba lock when free'ing slots.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix small formatting and wording nits in Broadcom copyright header
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enhance log messages for CQEs as they were not reporting certain fields.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix up log messages and add an fcp error stat counter in the IO submit
code path to make diagnosing problems easier
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the cpu count is larger than the number of WQ resources available,
adapter attachment eventually fails due to a WQ_CREATE failure.
Calculate the number of WQs desired (which initializes to cpu count)
after accounting for the number of queues the adapter supports and the
number allocated to SCSI and the control/ELS path, and scale down if
necessary.
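The calculation can be sketched roughly as below. The function and
parameter names are hypothetical and the real driver may account for
its queues differently; the sketch only shows the "start at cpu count,
cap by what is left" arithmetic described above.

```c
#include <assert.h>

/*
 * Returns the number of WQs to request: the CPU count, capped by what
 * the adapter has left once the SCSI and control/ELS queues are
 * accounted for.
 */
unsigned int wq_count(unsigned int cpus, unsigned int adapter_max,
		      unsigned int scsi_wqs, unsigned int ctrl_els_wqs)
{
	unsigned int reserved = scsi_wqs + ctrl_els_wqs;
	unsigned int avail = adapter_max > reserved ? adapter_max - reserved : 0;

	assert(cpus > 0);	/* at least one CPU is online */
	return cpus > avail ? avail : cpus;
}
```

For example, with 64 CPUs, 40 adapter queues, 4 SCSI queues and 2
control/ELS queues, the sketch scales the request down to 34.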
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the driver encounters a link event ACQE with a fault code it doesn't
recognize, it logs an "Invalid" fault type and further treats the unknown
value as a mailbox command failure. First off, there is no "invalid"
value, only values that are unknown. Secondly, the fault code doesn't
indicate status - the rest of the ACQE contains that status so there is
no reason to "fail the commands".
Change the "Invalid" to "Unknown". There is no "invalid" code value.
Separate fault code parsing and message generation from any mbx handling
status.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In situations when the firmware image is inappropriate for the chip
type, initial validation checks were light, allowing the checks to pass,
thus allowing the firmware to be downloaded. Eventually, after the
download, the chip rejects the firmware but it is logged as a generic
firmware download error.
Revise the initial checks to validate the image vs asic type so that the
correct message is displayed and the download process is avoided.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver builds the control structures in host memory using
definitions that are based on 32-bit words. After building the structure
it is then written to the adapter.
This patch slightly optimizes LE hosts by copying the structures via
64-bit copies. This is doable as the adapter interface is LE thus there
is no byteswapping as the copy is performed.
The same optimization would be nice on BE systems, but when byteswapping
occurs, it swaps 32-bit words as well, thus trashing the control
structure. Given the amount of code that is dependent upon the 32-bit
word definition, it was decided to not change things for the minor
optimization. Thus PPC 64-bit systems stick with doing 32-bit copies.
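The word-order problem on BE hosts can be shown with plain integer
arithmetic. The helpers below are hypothetical and portable; they are
not the kernel's byteswap routines. They show that swapping a 64-bit
value made of two 32-bit words also exchanges the two words, which a
per-word 32-bit swap does not.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Reverse the eight bytes of a 64-bit value. */
uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) |
	       swap32((uint32_t)(v >> 32));
}

/* Two adjacent 32-bit words viewed as one 64-bit value, w0 in the low half. */
uint64_t pack_words(uint32_t w0, uint32_t w1)
{
	return ((uint64_t)w1 << 32) | w0;
}

/* Worked example: the 64-bit swap moves w0 and w1 into each other's place. */
void show_word_exchange(void)
{
	uint32_t w0 = 0x11223344u, w1 = 0x55667788u;

	/* A 64-bit swap equals per-word swaps with the words exchanged... */
	assert(swap64(pack_words(w0, w1)) ==
	       pack_words(swap32(w1), swap32(w0)));
	/* ...so it differs from swapping each word in place. */
	assert(swap64(pack_words(w0, w1)) !=
	       pack_words(swap32(w0), swap32(w1)));
}
```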
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I/O submission paths in the lpfc nvme path are rejecting the io with an
error code that reflects back to the caller as a hard io failure. Many
of these conditions are transient and would likely resolve if retried.
Correct by returning -EBUSY, which the FC transport triggers off of to
return busy status codes to the blk-mq layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During an uplink toggle test all error handling is done via timeout and
firmware error conditions which can occur concurrently:
- SCSI layer timeouts
- Error detect CQEs
- Firmware detected underruns
- ABTS timeouts
All these concurrent events require more defensive checks in the driver
including:
- Check both internally and externally generated aborts to make sure the
xid has not already been aborted in another context or in cleanup.
- Check back pointers in qedf_cmd_timeout to verify the context of the
io_req, fcport and qedf_ctx
- Check rport state in host reset handler to not reset the whole host
if the rport is already uploaded or in the process of relogin
- Check the state of an fcport before initiating a middle path ELS
request
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted. This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.
Fixes the crash:
PID: 106676 TASK: ffff9a436aa90000 CPU: 12 COMMAND: "multipathd"
#0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
#1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
#2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
#3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
#4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
#5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
#6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
#7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
#8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
#9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
[exception RIP: qedf_init_task+61]
RIP: ffffffffc0e13c2d RSP: ffff9a43567d38d0 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffbe920472c738 RCX: ffff9a434fa0e3e8
RDX: ffff9a434f695280 RSI: ffffbe920472c738 RDI: ffff9a43aa359c80
RBP: ffff9a43567d3950 R8: 0000000000000c15 R9: ffff9a3fb09b9880
R10: ffff9a434fa0e3e8 R11: ffff9a43567d35ce R12: 0000000000000000
R13: ffff9a434f695280 R14: ffff9a43aa359c80 R15: ffff9a3fb9e005c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are a couple of kernel cases when we restart a remote port due to
ABTS timeout that we need to handle:
1. Flush any outstanding ABTS requests when flushing I/Os so that we do
not hold up the eh_abort handler indefinitely causing process hangs.
2. Check if we are currently uploading a connection before issuing an
ABTS.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Get all firmware debug data instead of just a grc dump.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PROBLEM DESCRIPTION:
According to the logs, STAG was changing and it was triggering soft
reset. In soft reset we used to virtual link down and up and also we
were disabling DCBx flag. Since this was virtual link flap, DCBx never
used to converge again.
SOLUTION:
Code change is to remove disabling DCBx flag from soft reset.
Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Helps to corroborate which requests we can't get a reference on and
whether it's a real bug or not.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When an RRQ request times out the reference is not getting decremented
correctly as there are still ELS commands leftover when we flush any
pending I/Os during offload:
[ 281.788553] [0000:21:00.3]:[qedf_cmd_timeout:58]:4: ELS timeout, xid=0x96a.
...
[ 281.788553] [0000:21:00.3]:[qedf_cmd_timeout:58]:4: ELS timeout, xid=0x96a.
[ 281.788772] [0000:21:00.3]:[qedf_rrq_compl:182]:4: Entered.
[ 281.788774] [0000:21:00.3]:[qedf_rrq_compl:200]:4: rrq_compl: orig io = ffffc90004c556f8, orig xid = 0x81b, rrq_xid = 0x96a, refcount=1
...
[ 331.448032] [0000:21:00.3]:[qedf_flush_els_req:1512]:4: Flushing ELS request xid=0x96a refcount=2.
The fix is to call kref_put on the rrq_req in case of timeout as the
timeout handler will call rrq_compl directly vs. a normal completion
where it is called from els_compl.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We currently hard code the priority in the 8021q tag to 3 for FCoE
traffic. The vast majority of the time this is fine but if the priority
is something else besides 3, any VLAN ID comparison either in the
non-offload path or offload path will fail and cause dropped frames
where none are expected.
Change the behavior so that the driver default is 3 if we do not get any
DCBX convergence.
If DCBX does converge, then set the FIP/FCoE priority in the following
manner:
1. If the qedf_default_prio modparam is set use that
2. If the DCBX FCoE priority is not in range (0..7) use 3
3. Use the DCBX FCoE priority we get in the driver's DCBX handler
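The selection order above can be written out as a small function. This
is a sketch under assumptions: the function name is hypothetical, and a
negative modparam value is assumed to mean "not set", which the text
does not state.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_DEFAULT_PRIO 3	/* driver default named in the text */

/*
 * modparam_prio < 0 is assumed to mean the module parameter is unset.
 * dcbx_prio is the FCoE priority reported by the DCBX handler.
 */
int select_fcoe_prio(bool dcbx_converged, int modparam_prio, int dcbx_prio)
{
	int prio;

	if (!dcbx_converged)
		prio = SKETCH_DEFAULT_PRIO;	/* no DCBX convergence */
	else if (modparam_prio >= 0)
		prio = modparam_prio;		/* rule 1: modparam wins */
	else if (dcbx_prio < 0 || dcbx_prio > 7)
		prio = SKETCH_DEFAULT_PRIO;	/* rule 2: out of range */
	else
		prio = dcbx_prio;		/* rule 3: use DCBX value */

	assert(prio >= 0);
	return prio;
}
```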
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This module parameter is to work around cases where we do not receive
the DCBX handler notification from qed but discovery is still possible
if we send out a FIP VLAN request regardless of the DCBX state.
[mkp: zeroday warning]
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some configurations need more than 30 seconds to respond to a FIP VLAN
request so increase the default to 60 seconds.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For target mode, the task management command is queued to a specific CPU
based on where the SCSI command is residing. This prevents the race
condition of a task management command getting ahead of a regular SCSI
command.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- Use the predefined inline function to access add_cdb_len field in ATIO.
- Return SS_RESIDUAL_UNDER status when sending BUSY
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a connection is established, the target core session may not be
created immediately. Current code will drop/terminate the command based
on the session state. This patch will return BUSY status for any
commands arriving on wire before the session is created.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move GPSC & GFPNID commands out of session management to reduce time lag
in reporting the session state to remote port. These commands are not
essential when it comes to maintaining the rport state. Delay sending
these commands after rport state is set to Online.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For each RSCN that triggers a rescan of the fabric, ADISC is used to
revalidate an existing session. If the RSCN is not affecting all
existing sessions, then driver should not send redundant ADISC for all
existing sessions.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes rport state and session state getting out of sync.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes the login_retry handling for the ADISC command.
When the login_retry count reaches 0, further attempts to send the ADISC
command are ignored by the code. Remove this redundant login_retry count
check from qla24xx_fcport_handle_login()
[mkp: fix typo]
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update driver version to match OOB/internal driver version.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the ioctl exit path, the driver walks ioc_list to free the memory
associated with diag buffers and the event_log pointer that the driver
uses to save events. If ctl_exit() is called after unregistering the
driver, ioc_list will be empty, so the driver cannot free the allocated
memory, which causes a memory leak.
So call ctl_exit() before unregistering the mpt3sas driver.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
1) Manufacturing Page 11 contains parameters to control internal
firmware behavior. FW/driver behavior can be changed based on the
AddlFlags2 field (the flag tm_custom_handling is used for this).
a) For a PCIe device, protocol level reset should be used if flag
tm_custom_handling is 0, since Abort Task Set, LUN reset and Target
reset will all result in a protocol level reset. The driver should issue
only one type of this reset; if that fails, it should escalate to
a controller reset (diag reset/OCR).
b) If the driver has control over the TM reset timeout value, then the
driver should use the value exposed in PCIe Device Page 2 for a PCIe
device (field ControllerResetTO).
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update MPI Files to support protocol level reset for NVMe device.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added function _base_display_fwpkg_version, which sends FWUpload request
to pull FW package version from FW Image Header. Now driver prints FW
package version in addition to FW version if the PackageVersion is
valid.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function _scsih_add_device, for each device connected to an
enclosure, the driver reads the enclosure page (to get details such as
enclosure handle, enclosure logical ID, enclosure level, etc.).
With this patch, instead of reading the enclosure page every time, the
driver maintains a list of enclosure devices (an enclosure device is
added to the list on an enclosure add event and removed from the list on
delete events) and uses the enclosure page from the list.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Events were not processed during driver unload, so unloading the driver
did not complete when drives were disconnected while the driver was
unloading. So don't block events in the ISR path, i.e., remove the flag
ioc->remove_host so that events are processed during driver unload.
This allows driver unload to complete by processing drive removal
events.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For 24-port HBAs, the IOC generates more events in certain cases and
the current circular buffer may be overwritten. Hence increase the event
log buffer to accommodate more events.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The SAS Device Discovery Error Event is sent to the host when discovery
of a particular device has failed, even after the maximum number of
retries by the IOC.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enhanced DMA allocation for the sense buffer for the case where the
allocation does not fit within the same 4GB region. Introduced the
is_MSB_are_same function to check whether the allocated buffer is within
a 4GB range or not.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
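The 4GB check from the sense buffer change above can be sketched in
user-space C as follows. This is only an illustration: the function name
within_same_4gb() and its signature are assumptions made for this sketch,
not the driver's is_MSB_are_same helper. A buffer stays inside one 4GB
window when the upper 32 bits of its first and last byte addresses match.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true when [start, start + len) lies inside
 * a single 4GB window, i.e. the upper 32 bits of the first and the last
 * byte address are identical. len must be non-zero.
 */
static bool within_same_4gb(uint64_t start, uint64_t len)
{
	uint64_t last = start + len - 1;

	return (start >> 32) == (last >> 32);
}
```

If the check fails, a caller would typically retry the allocation with a
different placement; how the driver does that is not shown here.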
For every IO, memory of PAGE size is allocated for handling NVMe native
PRPs. In addition, for every IO, memory of (chains needed per IO * chain
buffer size, e.g. 38 * 128 bytes) is allocated for chain buffers.
However, at any point in time, an IO request is either for an NVMe
target device (where the PRP page is used for framing PRPs) or for a
SCSI target device (where chain buffers are used for framing chain
SGEs). This patch modifies the driver to reuse the same pre-allocated PRP
page buffers as chain buffers for IOs targeted at SCSI target devices,
so there is no need to allocate separate buffers for chain SGEs.
If the number of chain buffers needed per IO doesn't fit in the PRP page
size, then the driver maintains separate buffers for the extra chain
buffers that exceed the PRP page size. For example, with a PRP page size
of 4K and a chain buffer size of 128 bytes, the number of chain buffers
that can fit in a PRP page is 4096/128 = 32. If the number of chain
buffers needed per IO exceeds 32, for example 36, then the driver
allocates the remaining 4 chain buffers individually.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
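The arithmetic in the PRP page reuse example above (4096/128 = 32 chain
buffers per page, 4 extra for an IO that needs 36) can be written out as a
small sketch. The struct and function names are hypothetical and only
mirror the calculation in the commit message, not the driver's code.

```c
#include <stddef.h>

struct chain_split {
	size_t in_prp_page;	/* chain buffers carved out of the PRP page */
	size_t extra;		/* chain buffers needing separate allocation */
};

/* Split the chains needed per IO between the PRP page and extra buffers. */
static struct chain_split split_chains(size_t prp_page_size,
				       size_t chain_size,
				       size_t chains_per_io)
{
	struct chain_split s;
	size_t fit = prp_page_size / chain_size;

	s.in_prp_page = chains_per_io < fit ? chains_per_io : fit;
	s.extra = chains_per_io - s.in_prp_page;
	return s;
}
```

With a 4K page and 128-byte chain buffers, an IO needing 36 chains gets 32
from the PRP page and 4 from separate allocations.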
Introduces a chain lookup table/tracker and implements accessing chain
buffers using smid. Removes the linked-list based access of chain
buffers, which required a lock, and allocates as many chains as needed.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
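A minimal user-space sketch of the smid-indexed lookup idea from the chain
tracker change above. All names here (chain_table, chain_get, and so on)
and the 1-based smid convention are assumptions of this sketch; the
driver's actual data structures are not shown in the commit message. The
point is that each request indexes its own slot, so no shared free list
(and no lock around it) is needed to hand out a chain buffer.

```c
#include <stddef.h>
#include <stdlib.h>

struct chain_tracker {
	void *chain_buffer;	/* would point at the chain buffer memory */
};

struct chain_table {
	struct chain_tracker *trackers;	/* depth * chains_per_io entries */
	size_t *next;			/* chains handed out per request */
	size_t depth;
	size_t chains_per_io;
};

static int chain_table_init(struct chain_table *t, size_t depth,
			    size_t chains_per_io)
{
	t->trackers = calloc(depth * chains_per_io, sizeof(*t->trackers));
	t->next = calloc(depth, sizeof(*t->next));
	if (!t->trackers || !t->next) {
		free(t->trackers);
		free(t->next);
		return -1;
	}
	t->depth = depth;
	t->chains_per_io = chains_per_io;
	return 0;
}

/* Return the next unused chain for this smid, or NULL if none is left. */
static struct chain_tracker *chain_get(struct chain_table *t, size_t smid)
{
	size_t idx = smid - 1;	/* smid is assumed to be 1-based */

	if (smid == 0 || smid > t->depth ||
	    t->next[idx] == t->chains_per_io)
		return NULL;
	return &t->trackers[idx * t->chains_per_io + t->next[idx]++];
}

/* Called when the request completes: make all of its chains reusable. */
static void chain_release_all(struct chain_table *t, size_t smid)
{
	if (smid >= 1 && smid <= t->depth)
		t->next[smid - 1] = 0;
}
```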
Instead of allocating the RDPQ array (which stores the addresses of the
RDPQ pools) at run time, it is now allocated once at driver load time,
reused during host reset operations (instead of allocating and freeing
this buffer on the fly during every host reset operation), and freed
during driver unload.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes sparse warnings and bugs on big endian systems.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds driver changes for supporting the Unified Fabric Port
(UFP). This is a new partitioning mode wherein MFW provides the set of
parameters to be used by the device such as traffic class, outer-vlan
tag value, priority type etc. The driver receives this info via
notifications from MFW and configures the hardware accordingly.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can rely on the dma-mapping code to handle any DMA limit that is
bigger than the ISA DMA mask for us (either using an iommu or swiotlb),
so remove setting the block layer bounce limit for anything but the
unchecked_isa_dma case, or the bouncing for highmem pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Trivial fix to spelling mistake in module parameter description text
[mkp: applied by hand]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in module parameter description text
[mkp: applied by hand]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in module description text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The sanity check on u->in_connection_align_insertion_frequency is being
performed twice and hence the first check can be removed since it is
redundant. Cleans up cppcheck warning:
drivers/scsi/ibmvscsi/ibmvscsi.c:1711: (warning) Identical inner 'if'
condition is always true.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
commit b60710ec7d ("scsi: aacraid: enable sending of TMFs from
aac_hba_send()") allows aac_hba_send() to send scsi commands, and TMF
requests, but the existing code only updates the iu_type for scsi
commands. For TMF requests we are sending an unknown iu_type to
firmware, which causes a fault.
Include iu_type prior to determining the validity of the command
Reported-by: Noah Misner <nmisner@us.ibm.com>
Fixes: b60710ec7d ("aacraid: enable sending of TMFs from aac_hba_send()")
Fixes: 423400e64d ("aacraid: Include HBA direct interface")
Tested-by: Noah Misner <nmisner@us.ibm.com>
cc: stable@vger.kernel.org
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The vmw_pvscsi driver returns DID_ABORT for commands aborted internally
by the adapter, leading to the filesystem going read-only. Change the
result to DID_BUS_BUSY, causing the kernel to retry the command.
Signed-off-by: Jim Gill <jgill@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
All three instances of ->smp_handler deal with highmem backed requests
just fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
do_gettimeofday() is deprecated since it will stop working in 2038 on
32-bit platforms, leading to incorrect times passed to the firmware.
On 64-bit platforms the current code appears to be fine, as the
calculation passes an 8-bit century number into the firmware that can
represent times long in the future (possibly until 25599).
Using ktime_get_real_seconds() to get a 64-bit seconds value and
time64_to_tm() to convert it into the firmware format greatly simplifies
the ips timekeeping code, makes 32-bit and 64-bit behave the same way
here, and gets us closer to removing the deprecated interfaces.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
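The conversion described in the ips timekeeping change above (64-bit
seconds split into calendar fields, with a separate 8-bit century) can be
illustrated in user space. This sketch uses the standard gmtime() in place
of the kernel's time64_to_tm(), and the struct fw_time layout is a
hypothetical stand-in, not the firmware's real format.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical firmware time layout, for illustration only. */
struct fw_time {
	uint8_t century;
	uint8_t year;	/* year within the century, 0..99 */
	uint8_t month;	/* 1..12 */
	uint8_t day;
	uint8_t hour;
	uint8_t minute;
	uint8_t second;
};

/* Split seconds since the epoch (UTC) into the fields above. */
static int fw_time_from_seconds(int64_t secs, struct fw_time *out)
{
	time_t t = (time_t)secs;
	struct tm *tm = gmtime(&t);
	long year;

	if (!tm)
		return -1;
	year = tm->tm_year + 1900L;
	out->century = (uint8_t)(year / 100);
	out->year = (uint8_t)(year % 100);
	out->month = (uint8_t)(tm->tm_mon + 1);
	out->day = (uint8_t)tm->tm_mday;
	out->hour = (uint8_t)tm->tm_hour;
	out->minute = (uint8_t)tm->tm_min;
	out->second = (uint8_t)tm->tm_sec;
	return 0;
}
```

Because the century is carried separately, the representable range is
bounded by the 8-bit century field rather than by a 32-bit seconds count.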
do_gettimeofday() is deprecated because of the y2038 overflow. Here, we
use the result to pass into a 32-bit field in the firmware, which still
risks an overflow, but if the firmware is written to expect unsigned
values, it can at least last until y2106, and there is not much we can
do about it.
This changes do_gettimeofday() to ktime_get_real_seconds(), which at
least simplifies the code a bit, and avoids the deprecated
interface. I'm adding a comment about the overflow to document what
happens.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
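The overflow behaviour mentioned in the change above can be shown with a
short sketch. The function name is hypothetical; the only point is that a
64-bit seconds value stored into a 32-bit unsigned field stays correct past
the signed 2038 limit and wraps to zero once it exceeds 2^32 - 1 (early
2106).

```c
#include <stdint.h>

/*
 * Hypothetical helper: truncate 64-bit wall-clock seconds to the 32-bit
 * field a firmware interface might expect. Values wrap modulo 2^32.
 */
static uint32_t fw_timestamp32(int64_t real_seconds)
{
	return (uint32_t)real_seconds;
}
```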
Remove boilerplate code by using macro module_pci_driver.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the phy_mask bitwise ANDed with the phy_index bit is zero, the
continue statement currently jumps to the next iteration of the while
loop and phy_index is never actually incremented, potentially causing an
infinite loop if phy_index is less than SCI_MAX_PHS. Fix this by turning
the while loop into a for loop.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
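The loop shape described in the phy_mask fix above can be sketched as
follows. MAX_PHYS and the function name are hypothetical stand-ins; the
sketch only shows why a for loop is safe here: the increment expression
runs even when the body hits continue, whereas a while loop with the
increment at the bottom of the body would skip it.

```c
#include <stdint.h>

#define MAX_PHYS 4	/* hypothetical stand-in for the driver's limit */

/* Count the phys selected in phy_mask, skipping bits that are clear. */
static unsigned int count_selected_phys(uint32_t phy_mask)
{
	unsigned int phy_index;
	unsigned int count = 0;

	for (phy_index = 0; phy_index < MAX_PHYS; phy_index++) {
		if (!(phy_mask & (1u << phy_index)))
			continue;	/* phy_index++ still executes */
		count++;
	}
	return count;
}
```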
new_tape_buffer() is never called in atomic context. new_tape_buffer()
is only called by st_probe(), which is only set as ".probe" in struct
scsi_driver.
Despite never getting called from atomic context, new_tape_buffer()
calls kzalloc() with GFP_ATOMIC, which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which
can sleep and improves the possibility of successful allocation.
This was found by a static analysis tool named DCNS that I wrote.
I also checked it manually.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
st_probe() is never called in atomic context. st_probe() is only set as
".probe" in struct scsi_driver.
Despite never getting called from atomic context, st_probe() calls
kzalloc() with GFP_ATOMIC, which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which
can sleep and improves the possibility of successful allocation.
This was found by a static analysis tool named DCNS that I wrote.
I also checked it manually.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On Fujitsu ETERNUS systems, sense code ABORTED COMMAND with ASC/Q C1/01
is used to indicate a temporary condition where the storage-internal path
to a target is switched from one controller to another. SCSI commands
that return with this error code must be retried unconditionally
(i.e. without the "maybe_retry" logic in scsi_decide_disposition);
otherwise dm-multipath might initiate a failover from a healthy path
e.g. for REQ_FAILFAST_DEV commands.
Introduce a new blist flag for this case.
[mkp: applied by hand]
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
EMC Symmetrix returns 'internal target error' for a variety of
conditions, most of which will be transient. So we should always retry
it, even with failfast set. Otherwise we'd get spurious path flaps with
multipath.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Warn if a device (or the user) sets blist flags which are unknown
or have been removed. This should enable us to reuse freed blist
bits in later releases.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
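The check described in the blist warning change above amounts to masking
the supplied flags against the set of known bits. The sketch below uses
made-up DEMO_* flag values and a hypothetical function name; the real
blist flag definitions are not reproduced here.

```c
#include <stdint.h>
#include <stdio.h>

/* Made-up flag bits for illustration; bit 2 plays a "removed" flag. */
#define DEMO_FLAG_A	(UINT64_C(1) << 0)
#define DEMO_FLAG_B	(UINT64_C(1) << 1)
#define DEMO_FLAG_C	(UINT64_C(1) << 3)
#define DEMO_KNOWN_MASK	(DEMO_FLAG_A | DEMO_FLAG_B | DEMO_FLAG_C)

/* Return the bits that are not known flags, warning if there are any. */
static uint64_t unknown_flags(uint64_t flags)
{
	uint64_t unknown = flags & ~DEMO_KNOWN_MASK;

	if (unknown)
		fprintf(stderr, "warning: unknown or removed flags 0x%llx\n",
			(unsigned long long)unknown);
	return unknown;
}
```

Warning on stray bits is what makes it safe to hand a freed bit to a new
flag later: stale users of the old bit get noticed instead of silently
changing meaning.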
Space for SCSI blist flags is gradually running out. Change the type to
__u64 and fix a checkpatch complaint about symbolic mode flags in
scsi_devinfo.c.
Make checkpatch happy by replacing simple_strtoul() with kstrtoull().
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
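For readers unfamiliar with why a strict parser is preferred for the flag
values in the change above, here is a user-space approximation built on
the standard strtoull(). It is not the kernel's kstrtoull(); the function
name parse_u64() and the exact acceptance rules are assumptions of this
sketch. Unlike a bare strtoul-style call, it rejects empty input, trailing
garbage and out-of-range values, and it yields a full 64-bit result.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse an unsigned 64-bit number. Returns 0 on success, -EINVAL for
 * malformed input and -ERANGE on overflow. One trailing newline is
 * tolerated, as is common for values written through sysfs or procfs.
 */
static int parse_u64(const char *s, int base, uint64_t *out)
{
	char *end;
	unsigned long long v;

	/* Reject empty strings, signs and leading whitespace up front. */
	if (!isxdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	v = strtoull(s, &end, base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = (uint64_t)v;
	return 0;
}
```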
Use the just introduced const_ilog2() macro to avoid sparse errors.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is a best-effort estimate of how busy the ring buffer is for that
channel, based on the percentage of buffer space available for writing.
It is still possible that at the time of the actual ring buffer write,
the space is not available because other processes are writing at the
same time.
Selecting a channel based on how full it is can reduce the possibility
that a ring buffer write will fail, and avoids the situation where a
channel is overly busy.
Now it's possible that storvsc can use a smaller ring buffer size
(e.g. 40k bytes) to take advantage of cache locality.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
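The selection idea in the storvsc change above can be sketched as below.
The struct, the function names and the "pick the channel with the most
free space" policy are assumptions made for illustration; the driver's
actual selection logic is not described in enough detail in the commit
message to reproduce it. As the message notes, the percentage is only an
estimate, since other writers may consume the space before the write
happens.

```c
#include <stddef.h>
#include <stdint.h>

struct ring {
	uint32_t size;			/* total ring size in bytes, non-zero */
	uint32_t bytes_avail_to_write;	/* free space at sampling time */
};

/* Percentage of the ring that is currently free for writing. */
static unsigned int ring_avail_percent(const struct ring *r)
{
	return (unsigned int)((uint64_t)r->bytes_avail_to_write * 100 /
			      r->size);
}

/* Return the index of the channel with the most free space (n >= 1). */
static size_t pick_least_busy(const struct ring *rings, size_t n)
{
	size_t best = 0;
	size_t i;

	for (i = 1; i < n; i++) {
		if (ring_avail_percent(&rings[i]) >
		    ring_avail_percent(&rings[best]))
			best = i;
	}
	return best;
}
```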
Unlike SCSI and FC, we don't use multiple channels for IDE. Also fix
the calculation for sub-channels.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is
called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
concurrently with blkdev_report_zones() and/or blkdev_reset_zones(). That can
cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets
q->nr_zones to zero before restoring it to the actual value, even if no drive
characteristics have changed. Avoid that this can happen by making the
following changes:
- Protect the code that updates zone information with blk_queue_enter()
and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
these functions do not modify struct scsi_disk before all zone
information has been obtained.
Note: since commit 055f6e18e0 ("block: Make q_usage_counter also track
legacy requests"; kernel v4.15) the request queue freezing mechanism also
affects legacy request queues.
Fixes: 89d9475610 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # v4.16
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
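The second bullet of the sd_zbc change above (do not modify the disk
structure before all zone information has been obtained) follows a common
pattern: gather into a temporary, then commit in one step. The sketch
below uses hypothetical types and functions and does not model
blk_queue_enter()/blk_queue_exit(); it only shows that a failed or
in-progress update never leaves intermediate values such as a zero zone
count visible.

```c
#include <stdint.h>

struct disk_zones {
	uint32_t nr_zones;
	uint32_t zone_blocks;
};

/* Hypothetical probe: fills *out, fails if the zone size is invalid. */
static int read_zone_info(uint64_t capacity, uint32_t zone_blocks,
			  struct disk_zones *out)
{
	if (zone_blocks == 0)
		return -1;
	out->zone_blocks = zone_blocks;
	out->nr_zones = (uint32_t)((capacity + zone_blocks - 1) / zone_blocks);
	return 0;
}

/* Update *disk only after all new values have been obtained. */
static int revalidate_zones(struct disk_zones *disk, uint64_t capacity,
			    uint32_t zone_blocks)
{
	struct disk_zones tmp;
	int ret = read_zone_info(capacity, zone_blocks, &tmp);

	if (ret)
		return ret;	/* *disk is left untouched on failure */
	*disk = tmp;
	return 0;
}
```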
scsi_io_completion() translates the sense key ILLEGAL REQUEST / ASC 0x21 into
ACTION_FAIL. That means that setting cmd->allowed to zero in sd_zbc_complete()
for this sense code / ASC combination is not necessary. Hence remove the code
that resets cmd->allowed from sd_zbc_complete().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch does not change any functionality but makes it clear that it is on
purpose that these fields are 32 bits wide.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The default already is to never bounce, so the call is a no-op.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The default already is to never bounce, so the call is a no-op.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use new return type vm_fault_t for fault handler in struct
vm_operations_struct.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
New combined SCSI driver for all ESP based Zorro SCSI boards for m68k Amiga.
Code largely based on board specific parts of the old drivers (blz1230.c,
blz2060.c, cyberstorm.c, cyberstormII.c, fastlane.c which were removed after
the 2.6 kernel series for lack of maintenance) with contributions by Tuomas
Vainikka (TCQ bug tests and workaround) and Finn Thain (TCQ bugfix by use of
PIO in extended message in transfer).
New Kconfig option and Makefile entries for new Amiga Zorro ESP SCSI driver
included in this patch.
Use DMA transfers wherever possible, with board-specific DMA set-up functions
copied from the old driver code. Three byte reselection messages do appear to
cause DMA timeouts. So wire up a PIO transfer routine for these
instead. esp_reselect_with_tag explicitly sets
esp->cmd_block_dma as target address for the message bytes but PIO
requires a virtual address. Substitute the kernel virtual address
esp->cmd_block in the PIO transfer call if the DMA address is
esp->cmd_block_dma and the phase is message in.
PIO code taken from mac_esp.c where the reselection timeout issue was debugged
and fixed first, with minor macro and function rename.
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian T. Steigies <cts@debian.org>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A drive being sanitized will return NOT READY / ASC 0x4 / ASCQ
0x1b ("LOGICAL UNIT NOT READY. SANITIZE IN PROGRESS").
Prevent spinning up the drive until this condition clears.
[mkp: tweaked commit message]
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
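The sense data test implied by the sanitize change above is a three-field
comparison. The sketch below is illustrative only: the function name is
hypothetical, and the sense key value 0x02 for NOT READY is the standard
SCSI value, defined locally here rather than taken from kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SENSE_KEY_NOT_READY	0x02	/* standard SCSI sense key value */

/* True for NOT READY / ASC 0x04 / ASCQ 0x1b (sanitize in progress). */
static bool sanitize_in_progress(uint8_t sense_key, uint8_t asc,
				 uint8_t ascq)
{
	return sense_key == SENSE_KEY_NOT_READY &&
	       asc == 0x04 && ascq == 0x1b;
}
```

A caller would skip the spin-up attempt while this returns true and check
again later.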
Fixes: 2d2c233167 ("scsi: megaraid_sas: modified few prints in OCR and IOC INIT path")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add UFS Protocol Information Unit (UPIU) trace events for the ufs driver,
used to trace various ufs transaction types: command, task management
and device management.
The tracepoint format is generic and can easily be adapted to trace
other UPIUs if needed.
Currently, traces of ufs transactions of type 'device management', which
this patch introduces, cannot be obtained from any other trace.
Device management transactions are used for communication with the
device such as reading and writing descriptor or attributes etc.
Signed-off-by: Ohad Sharabi <ohad.sharabi@sandisk.com>
Reviewed-by: Stanislav Nijnikov <stanislav.nijnikov@wdc.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in fnic stats message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A patch titled: "[PATCH v2] scsi_debug: implement IMMED bit" introduced
long delays to the Start stop unit (SSU) and Synchronize cache (SC)
commands when the IMMED bit is clear. This patch makes those delays
more realistic. It causes SSU to only delay when the start stop state is
changed; SC only delays when there's been a write since the previous
SC. It also reduced the SC delay from 1 second to 50 milliseconds.
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of always multicasting responses, send a unicast netlink message
directed at the correct pid. This will be needed if we ever want to
support multiple userspace processes interacting with the kernel over
iSCSI netlink simultaneously. Limitations can currently be seen if you
attempt to run multiple iscsistart commands in parallel.
We've fixed up the userspace issues in iscsistart that prevented
multiple instances from running, so now attempts to speed up booting by
bringing up multiple iscsi sessions at once in the initramfs are just
running into misrouted responses that this fixes.
Signed-off-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SGI/TP9100 is not an RDAC array:
^^^
https://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/hwtable.c;h=88b4700beb1d8940008020fbe4c3cd97d62f4a56;hb=HEAD#l235
This partially reverts commit 35204772ea ("[SCSI] scsi_dh_rdac :
Consolidate rdac strings together")
[mkp: fixed up the new entries to align with rest of struct]
Cc: NetApp RDAC team <ng-eseries-upstream-maintainers@netapp.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Cc: DM ML <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The revision field is currently unused by the devinfo pattern matching
code. Combine two blacklist entries into one.
$ egrep "Generic.*Storage-SMC" /proc/scsi/device_info
'Generic' 'USB Storage-SMC' 0x402
'Generic' 'USB Storage-SMC' 0x402
[mkp: tweaked commit desc]
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.2
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remote port disappearance/reappearances would cause a series of RSCN
events to be delivered to the driver. During the resulting GID_FT
handling, the driver clears the fc4 settings on the remote port, which
makes it skip registration. As such, the nvme associations eventually
fail and return io errors to the applications.
Correct by not clearing the nlp_fc4_types for all nodes in
lpfc_issue_gidft. Instead, when the GID_FT response is handled, clear
the nlp_fc4_types of FCP and NVME prior to evaluating the fc4_type
returned by the GID_FT response. This approach leaves "skipped" nodes
with their nlp_fc4_types intact.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pointers referencing local port structures didn't accommodate cases where
the localport may not be registered yet.
Add NULL pointer checks to logic.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On tests adding and removing a remote port, calls to nvme_info would
eventually show fewer target ports discovered than were present in the
san. Additionally, the following error messages were seen:
6031 RemotePort Registration failed err: -116, DID x471301
There is a race condition that exists between the driver and the nvme
transport on remote port unregister vs the confirmed deletion. It's
possible that the driver may rediscover the remote port and reregister
the remote port before a prior unregister delete callback was made (as
it rebound to the prior remoteport structure). However, the driver was
coded to expect the callback before seeing the remote port again thus a
new registration. The logic results in the driver having an invalid
remoteport pointer set.
Correct by tracking when waiting for the delete callback. In cases where
the ndlp remoteport pointer is updated, it is only cleared when the wait
has not been superseded by a prior registration.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During target-side port faults, the driver would not recover all target
port logins. This resulted in a loss of nvme device discovery.
The driver is coded to wait for all GID_FT requests to complete before
restarting discovery. A fault is seen where the outstanding GID_FT
counts are not properly decremented, thus discovery would never
start. Another fault was found in the clearing of the gidft_inp counter
that would be skipped in this condition. And a third fault found with
lpfc_nvme_register_port that would remove a reference on the ndlp which
then allows a node swap on a port address change to prematurely remove
the reference and release the ndlp.
The following changes are made:
- Correct the decrementing of the outstanding GID_FT counters.
- In RSCN handling, no longer zero the counter before calling to issue
another GID_FT.
- No longer remove the reference on the ndlp when the ndlp->nrport value
is not yet null.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The patch to enlarge WQ/CQ creation keys off of an adapter response that
indicates support for the larger values. Older adapters return an
incorrect response and are limited in size. Thus the adapters fail the
WQ creation steps.
Augment the WQ sizing checks with a check on the older adapter types and
limit them to the restricted sizes.
Fixes: c176ffa084 ("scsi: lpfc: Increase CQ and WQ sizes for SCSI")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After making remoteport unregister requests, the ndlp nrport pointer was
stale.
Track when waiting for the unregister completion callback and
adjust ndlp pointer assignment. Add a few safety checks for NULL
pointer values.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After driver unloads, lpfc_wq remains active. The destroy_workqueue
calls were not being made in driver unload. Additionally, SLI3 is
allocating lpfc_wq resources, but never uses them.
Make the destroy_workqueue calls on driver unload. Modify the SLI3 code
path to no longer allocate lpfc_wq resources.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When running loads that generated aborts, io errors were seen. It turns
out the abort requests were not placed on the proper WQ, resulting in
the errors. Closer inspection of this error also showed
improper spinlock api use.
Correct the WQ selection policy for the abort requests. Correct
spin_lock/spin_lock_irq/spin_lock_irqsave usage.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Under large io load, the current sizing of asynchronous buffer counts
could be exceeded, indicated by a 2885 log message:
2885 Port Status Event: port status reg 0x81800000, port smphr
reg 0xc000, error 1=0x52004a01, error 2=0x0
Enlarge the async receive queue size. Allow for a configurable number
of buffers to be posted to each RQ, using the new attribute
lpfc_nvmet_mrq_post.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When debugging various issues, per IO channel IO statistics were useful
to understand what was happening. However, many of the stats were on a
port basis rather than an io channel basis.
Move statistics to an io channel basis.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The max_scsicmpl_time parameter can be used to perform scsi cmd queue
depth mgmt based on io completion time: the queue depth is reduced to
make completion time shorter. However, as soon as an io completes and
the completion time is within limits, the code immediately bumps the
queue depth limit back up to the target queue depth. Thus the procedure
restarts, effectively limiting the usefulness of adjusting queue depth
to help completion time.
This patch makes the following changes:
- Removes the code at io completion that resets the queue depth as soon
as within limits.
- As the code removed was where the target queue depth was first
applied, change target queue depth application so that it occurs when
the parameter is changed.
- Makes target queue depth a standard parameter: both a module
parameter and a sysfs parameter.
- Optimizes the command pending count by using atomics rather than
locks.
- Updates the debugfs nodelist stats to allow better debugging of
pending command counts.
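The atomic pending-count idea in the list above can be sketched in
userspace C11 as follows. This is only an illustration under stated
assumptions: struct node_sketch and the node_* helpers are hypothetical
names, not the lpfc data structures, and C11 <stdatomic.h> stands in for
the kernel's atomic API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-node structure; not the lpfc layout. */
struct node_sketch {
	atomic_int cmd_pending;	/* commands currently outstanding */
	atomic_int cmd_qdepth;	/* current queue depth limit */
};

/*
 * Try to account for a new command without taking a lock.
 * Returns false (and undoes the increment) when the node is already
 * at its queue depth limit.
 */
static bool node_cmd_start(struct node_sketch *n)
{
	int depth = atomic_load(&n->cmd_qdepth);
	int prev = atomic_fetch_add(&n->cmd_pending, 1);

	if (prev >= depth) {
		atomic_fetch_sub(&n->cmd_pending, 1);
		return false;
	}
	return true;
}

/* Completion path: drop the pending count, leave the depth limit alone. */
static void node_cmd_done(struct node_sketch *n)
{
	atomic_fetch_sub(&n->cmd_pending, 1);
}

/* Apply a new depth limit at the moment the parameter is changed. */
static void node_set_qdepth(struct node_sketch *n, int depth)
{
	atomic_store(&n->cmd_qdepth, depth);
}
```

Note that the completion helper never raises the limit back up; in this
sketch the limit only changes through node_set_qdepth(), mirroring the
intent of applying the target queue depth when the parameter changes.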
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The nodelist entry for a SCSI array ends up in the UNMAPPED state. This
is due to an illegal discovery state machine transition caused by two
PRLIs being outstanding and the first one to complete failing with
LS_RJT. Also, the error path was designed assuming the PRLIs complete in
the order they were sent, FCP first, then NVME. In the failing case, the
array is still processing the first PRLI (FCP), but immediately issues
LS_RJT for the second PRLI (NVME).
Fix PRLI completion error path for the ordering expectation. Ensure the
discovery state machine update is not set until all outstanding PRLIs
are complete.
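One way to picture deferring the state update until all PRLIs are done
is a small outstanding-request counter. The sketch below is a userspace
illustration only: struct prli_tracker, the state names, and the rule
for picking the final state are assumptions made for the example, not
the driver's discovery state machine.

```c
#include <stdbool.h>

/* Hypothetical, simplified node states for illustration. */
enum node_state_sketch {
	STATE_PRLI_ISSUE,
	STATE_MAPPED,
	STATE_UNMAPPED
};

struct prli_tracker {
	int outstanding;		/* PRLIs sent but not yet completed */
	bool any_accepted;		/* at least one PRLI was accepted */
	enum node_state_sketch state;
};

/* Record that one more PRLI has been issued. */
static void prli_sent(struct prli_tracker *t)
{
	t->outstanding++;
}

/*
 * Handle one PRLI completion in whatever order it arrives.
 * The node state is left untouched until the last outstanding PRLI
 * has completed, so an early LS_RJT cannot force a transition.
 */
static void prli_complete(struct prli_tracker *t, bool accepted)
{
	if (accepted)
		t->any_accepted = true;

	if (--t->outstanding > 0)
		return;

	/* Assumed rule: any accepted PRLI yields MAPPED. */
	t->state = t->any_accepted ? STATE_MAPPED : STATE_UNMAPPED;
}
```

With two PRLIs sent and the rejection arriving first, the tracker stays
in STATE_PRLI_ISSUE until the second completion is processed.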
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hardware could time out Fastpath IOs one second earlier than the timeout
provided by the host.
For non-RAID devices, the driver provides a timeout value based on the
OS-provided timeout value. Under certain scenarios, if the OS provides a
timeout value of 1 second, the hardware will time out immediately due to
the above behavior.
Increase timeout value for non-RAID fastpath IOs by 1 second.
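The adjustment amounts to simple arithmetic, sketched here in userspace
C. The helper name, the pad constant, and the saturating behavior are
assumptions for illustration; leaving the RAID case unchanged is also an
assumption, since the text only describes non-RAID fastpath IOs.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the text: hardware may expire a fastpath IO one second early. */
#define FASTPATH_TIMEOUT_PAD_SECS 1u

/*
 * Hypothetical helper: return the timeout to hand to the hardware.
 * Non-RAID IOs get one extra second; the addition saturates instead of
 * wrapping around.
 */
static uint32_t fastpath_timeout_secs(uint32_t os_timeout_secs, bool is_raid)
{
	if (is_raid)
		return os_timeout_secs;	/* assumed unchanged in this sketch */

	if (os_timeout_secs > UINT32_MAX - FASTPATH_TIMEOUT_PAD_SECS)
		return UINT32_MAX;

	return os_timeout_secs + FASTPATH_TIMEOUT_PAD_SECS;
}
```

With an OS timeout of 1 second, the non-RAID path now passes 2 seconds,
so a hardware expiry one second early still leaves the full second the
OS asked for.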
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use pci_zalloc_consistent for allocating zeroed memory and remove
unnecessary memset function.
Done using Coccinelle.
Generated by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
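The shape of this transformation can be shown with a userspace analogy.
The kernel call, pci_zalloc_consistent(), is not available outside the
kernel, so the sketch below uses malloc()/memset() and calloc() as
stand-ins; both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a separate step. */
static void *alloc_then_clear(size_t size)
{
	void *buf = malloc(size);

	if (buf)
		memset(buf, 0, size);
	return buf;
}

/* After: a single zeroing allocation, no explicit memset needed. */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}
```

Both helpers return a zero-filled buffer (or NULL on failure); the
second form simply removes the extra call and the chance of clearing
with a mismatched size.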
Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>