Commit Graph

982206 Commits

Author SHA1 Message Date
Tyrel Datwyler 9e6b6b81aa scsi: ibmvfc: Define hcall wrapper for registering a Sub-CRQ
Sub-CRQs are registred with firmware via a hypercall. Abstract that
interface into a simpler helper function.

Link: https://lore.kernel.org/r/20210114203148.246656-6-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:27:44 -05:00
Tyrel Datwyler bb35ecb2a9 scsi: ibmvfc: Add size parameter to ibmvfc_init_event_pool()
With the upcoming addition of Sub-CRQs the event pool size may vary
per-queue.

Add a size parameter to ibmvfc_init_event_pool() such that different size
event pools can be requested by ibmvfc_alloc_queue().

Link: https://lore.kernel.org/r/20210114203148.246656-5-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:27:43 -05:00
Tyrel Datwyler 003d91a139 scsi: ibmvfc: Init/free event pool during queue allocation/free
The event pool and CRQ used to be separate entities of the adapter host
structure and as such were allocated and freed independently of each
other. Recent work as defined a generic queue structure with an event pool
specific to each queue. As such the event pool for each queue shouldn't be
allocated/freed independently, but instead performed as part of the queue
allocation/free routines.

Move the calls to ibmvfc_event_pool_{init|free} into
ibmvfc_{alloc|free}_queue respectively. The only functional change here is
that the CRQ cannot be released in ibmvfc_remove until after the event pool
has been successfully purged since releasing the queue will also free the
event pool.

Link: https://lore.kernel.org/r/20210114203148.246656-4-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:27:43 -05:00
Tyrel Datwyler 225acf5f1a scsi: ibmvfc: Move event pool init/free routines
The next patch in this series reworks the event pool allocation calls to
happen within the individual queue allocation routines instead of as
independent calls.

Move the init/free routines earlier in ibmvfc.c to prevent undefined
reference errors when calling these functions from the queue allocation
code. No functional change.

Link: https://lore.kernel.org/r/20210114203148.246656-3-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:27:43 -05:00
Tyrel Datwyler 6ae208e5d2 scsi: ibmvfc: Add vhost fields and defaults for MQ enablement
Introduce several new vhost fields for managing MQ state of the adapter as
well as initial defaults for MQ enablement.

Link: https://lore.kernel.org/r/20210114203148.246656-2-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:27:43 -05:00
Can Guo 9cd20d3f47 scsi: ufs: Protect PM ops and err_handler from user access through sysfs
User layer may access sysfs nodes when system PM ops or error handling is
running. This can cause various problems. Rename eh_sem to host_sem and use
it to protect PM ops and error handling from user layer intervention.

Link: https://lore.kernel.org/r/1610594010-7254-3-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:12:35 -05:00
Can Guo fb7afe24ba scsi: ufs: Fix a possible NULL pointer issue
During system resume/suspend, hba could be NULL. In this case, do not touch
eh_sem.

Fixes: 88a92d6ae4 ("scsi: ufs: Serialize eh_work with system PM events and async scan")
Link: https://lore.kernel.org/r/1610594010-7254-2-git-send-email-cang@codeaurora.org
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:12:35 -05:00
John Garry e8e5df5edd scsi: MAINTAINERS: Remove intel-linux-scu@intel.com for INTEL C600 SAS DRIVER
The mail address intel-linux-scu@intel.com bounces. Remove it.

Link: https://lore.kernel.org/r/1610449890-198089-1-git-send-email-john.garry@huawei.com
Cc: Ahmed S. Darwish <a.darwish@linutronix.de>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Acked-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:31:12 -05:00
Bean Huo b64750a1b6 scsi: ufs: Remove unnecessary devm_kfree()
The memory allocated with devm_kzalloc() is freed automatically no need to
explicitly call devm_kfree(). Delete it and save some instruction cycles.

Link: https://lore.kernel.org/r/20210112092128.19295-1-huobean@gmail.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:30:04 -05:00
YANG LI af0c94afc0 scsi: lpfc: Simplify bool comparison
Fix the following coccicheck warning:

./drivers/scsi/lpfc/lpfc_bsg.c:5392:5-29: WARNING: Comparison to bool

Link: https://lore.kernel.org/r/1610439893-64872-1-git-send-email-abaci-bugfix@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:28:22 -05:00
Jaegeuk Kim a2fca52ee6 scsi: ufs: WB is only available on LUN #0 to #7
Kernel stack violation when getting unit_descriptor/wb_buf_alloc_units from
rpmb LUN. The reason is that the unit descriptor length is different per
LU.

The length of Normal LU is 45 while the one of rpmb LU is 35.

int ufshcd_read_desc_param(struct ufs_hba *hba, ...)
{
	param_offset=41;
	param_size=4;
	buff_len=45;
	...
	buff_len=35 by rpmb LU;

	if (is_kmalloc) {
		/* Make sure we don't copy more data than available */
		if (param_offset + param_size > buff_len)
			param_size = buff_len - param_offset;
			--> param_size = 250;
		memcpy(param_read_buf, &desc_buf[param_offset], param_size);
		--> memcpy(param_read_buf, desc_buf+41, 250);

[  141.868974][ T9174] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: wb_buf_alloc_units_show+0x11c/0x11c
	}
}

Link: https://lore.kernel.org/r/20210111095927.1830311-1-jaegeuk@kernel.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:27:46 -05:00
Nilesh Javali dc0d9b12b8 scsi: qla2xxx: Update version to 10.02.00.105-k
Link: https://lore.kernel.org/r/20210111093134.1206-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:20 -05:00
Saurav Kashyap ffa018e3a5 scsi: qla2xxx: Enable NVMe CONF (BIT_7) when enabling SLER
Enable NVMe confirmation bit in PRLI.

Link: https://lore.kernel.org/r/20210111093134.1206-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:20 -05:00
Quinn Tran 044c218b04 scsi: qla2xxx: Fix mailbox Ch erroneous error
Mailbox Ch/dump ram extend expects mb register 10 to be set. If not
set/clear, firmware can pick up garbage from previous invocation of this
mailbox. Example: mctp dump can set mb10.  On subsequent flash read which
use mailbox cmd Ch, mb10 can retain previous value.

Link: https://lore.kernel.org/r/20210111093134.1206-6-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:20 -05:00
Bikash Hazarika a046585943 scsi: qla2xxx: Wait for ABTS response on I/O timeouts for NVMe
FW needs to wait for an ABTS response before completing the I/O.

Link: https://lore.kernel.org/r/20210111093134.1206-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:20 -05:00
Saurav Kashyap daaecb41a2 scsi: qla2xxx: Move some messages from debug to normal log level
This change will aid in debugging issues arising because of dropped frame,
DIF errors, queue full etc where debug level is not set.

Link: https://lore.kernel.org/r/20210111093134.1206-4-njavali@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:20 -05:00
Saurav Kashyap 307862e669 scsi: qla2xxx: Add error counters to debugfs node
Display error counters via debugfs node.

Link: https://lore.kernel.org/r/20210111093134.1206-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:19 -05:00
Saurav Kashyap dbf1f53cfd scsi: qla2xxx: Implementation to get and manage host, target stats and initiator port
This statistics will help in debugging process and checking specific error
counts. It also provides a capability to isolate the port or bring it out
of isolation.

Link: https://lore.kernel.org/r/20210111093134.1206-2-njavali@marvell.com
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:25:19 -05:00
YANG LI ac341c2d2f scsi: qedf: Simplify bool comparison
Fix the following coccicheck warning:

./drivers/scsi/qedf/qedf_main.c:3716:5-31: WARNING: Comparison to bool

Link: https://lore.kernel.org/r/1610357368-62866-1-git-send-email-abaci-bugfix@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:15:13 -05:00
Sergey Shtylyov e4da5feb09 scsi: aha1542: Fix multi-line comment style
Some comments in this driver don't comply with the preferred multi-line
comment style, as reported by 'scripts/checkpatch.pl':

WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line

Fix those comments, along with the (unreported for some reason?) starts of
the multi-line comments not being /* on their own line...

Link: https://lore.kernel.org/r/08c231e5-d86f-9d0b-19ac-ad46fa0c0b58@omprussia.ru
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:14:07 -05:00
Sergey Shtylyov 6075416cc4 scsi: aha1542: Kill trailing whitespace
Some source lines (mostly the comments) in this driver end with spaces, as
reported by 'scripts/checkpatch.pl'. Trim these lines.

Link: https://lore.kernel.org/r/59829052-4932-4ea3-b504-857bbb19e6a0@omprussia.ru
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:14:07 -05:00
Sergey Shtylyov 5637d5b769 scsi: aha1542: Clarify 'struct ccb' comments
This driver's original authors did pretty bad job of documenting the
Command Control Block (CCB) structure -- especially its 2nd byte, where the
bit numbers were completely left out. Sync up the 'struct ccb' comments to
the Adaptec AHA-154xA manual.

Link: https://lore.kernel.org/r/17a7be14-a9d2-9822-bb3e-1d7385f486b0@omprussia.ru
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:14:07 -05:00
Avri Altman fb475b74d6 scsi: ufs: A tad optimization in query upiu trace
Remove a redundant if clause in ufshcd_add_query_upiu_trace.

Link: https://lore.kernel.org/r/20210110084618.189371-1-avri.altman@wdc.com
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:11:11 -05:00
Pavel Begunkov 6b1dba3d8c scsi: target: file: Don't zero iter before iov_iter_bvec
iov_iter_bvec() initialises iterators well, no need to pre-zero it
beforehand as done in fd_execute_rw_aio(). Compilers can't optimise it out
and generate extra code for that (confirmed with assembly).

Link: https://lore.kernel.org/r/34cd22d6cec046e3adf402accb1453cc255b9042.1610207523.git.asml.silence@gmail.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:09:16 -05:00
Vishakha Channapattan 4f608fbce5 scsi: pm80xx: Log SATA IOMB completion status on failure
Added a log message in SATA completion path to capture the status of failed
command. If the status does not match any expected status, another message
will be logged.

On IO failure with known status, the log message will be:

  [ 1712.951735] pm80xx0:: mpi_sata_completion 2269: IO failed device_id 16385 status 0x1 tag XX

If the firmware returns unexpected status, a message of the following
format will be logged:

  [ 1712.951735] pm80xx0:: mpi_sata_completion XXXX: Unknown status device_id XXXXX status 0xX tag XX

Link: https://lore.kernel.org/r/20210109123849.17098-8-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Vishakha Channapattan <vishakhavc@google.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Ashokkumar N <Ashokkumar.N@microchip.com>
Signed-off-by: Radha Ramachandran <radha@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
Bhavesh Jashnani 6b2f2d05b5 scsi: pm80xx: Simultaneous poll for all FW readiness
In check_fw_ready() we first wait for ILA to come up and then we wait for
RAAE to come up and IOPs and so on. This is a sequential check.  Because of
this, ILA image seems to be not ready in the allocated time and so the
driver marks it as "not ready" and then moves on to other FW images.

ILA does become ready eventually, but is not checked again. The driver
concludes that FW is not ready when it actually is.

Instead of sequentially polling each image, we keep polling for all images
to be ready. The timeout for the polling has been set to the sum of what
was used for each individual image.

Link: https://lore.kernel.org/r/20210109123849.17098-7-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Bhavesh Jashnani <bjashnani@google.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Ashokkumar N <Ashokkumar.N@microchip.com>
Signed-off-by: Radha Ramachandran <radha@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
Viswas G ec2e7e1aff scsi: pm80xx: Fix driver fatal dump failure
The function pm80xx_get_fatal_dump() has two issues that result in the
fatal dump not being able to complete successfully.

 1. Trying to collect fatal_logs from the application fails because we are
    not shifting the MEMBASE-II register properly. Once we read 64K region
    of data we have to shift the MEMBASE-II register and read the next
    chunk. Only then would we be able to get complete data.

 2. If a timeout occurs, our application will get stuck.

Link: https://lore.kernel.org/r/20210109123849.17098-6-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Ashokkumar N <Ashokkumar.N@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
akshatzen 5d28026891 scsi: pm80xx: Fix missing tag_free in NVMD DATA req
Tag was not freed in NVMD get/set data request failure scenario. This
caused a tag leak each time a request failed.

Link: https://lore.kernel.org/r/20210109123849.17098-5-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: akshatzen <akshatzen@google.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Radha Ramachandran <radha@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
akshatzen 95652f98b1 scsi: pm80xx: Check main config table address
The driver initializes main configuration, general status, inbound queue
and outbound queue table addresses based on a value read from
MSGU_SCRATCH_PAD_0 register.

We should validate these addresses before dereferencing them.

Adds two validations:

 1. Check if main configuration table offset lies within the pcibar
    mapped

 2. Check if first dword of main configuration table reads "PMCS"

There are two calls to init_pci_device_addresses() done during
pm8001_pci_probe() in this sequence:

 1. First inside chip_soft_rst, where if init_pci_device_addresses fails we
    will go ahead assuming MPI state is not ready and reset the device as
    long as bootloader is okay.  This gives chance to second call of
    init_pci_device_addresses to set up the addresses after reset.

 2. The second call is via pm80xx_chip_init, after soft reset is done and
    firmware is checked to be ready. Once that is done we are safe to go
    ahead and initialize default table values and use them.

Tests:

 1. Enabled debugging logs and observed no issues during initialization,
    with a controller with no issues:

    pm80xx0:: pm8001_setup_msix 1034: pci_alloc_irq_vectors request ret:64 no of intr 64
    pm80xx0:: init_pci_device_addresses 917: Scratchpad 0 Offset: 0x2000 value 0x40002000
    pm80xx0:: init_pci_device_addresses 925: Scratchpad 0 PCI BAR: 0
    pm80xx0:: init_pci_device_addresses 952: VALID main config signature 0x53434d50
    pm80xx0:: init_pci_device_addresses 975: GST OFFSET 0xc4
    pm80xx0:: init_pci_device_addresses 978: INBND OFFSET 0x20000128
    pm80xx0:: init_pci_device_addresses 981: OBND OFFSET 0x24000928
    pm80xx0:: init_pci_device_addresses 984: IVT OFFSET 0x8001408
    pm80xx0:: init_pci_device_addresses 987: PSPA OFFSET 0x8001608
    pm80xx0:: init_pci_device_addresses 991: addr - main cfg (ptrval) general status (ptrval)
    pm80xx0:: init_pci_device_addresses 995: addr - inbnd (ptrval) obnd (ptrval)
    pm80xx0:: init_pci_device_addresses 999: addr - pspa (ptrval) ivt (ptrval)
    pm80xx0:: pm80xx_chip_soft_rst 1446: reset register before write : 0x0
    pm80xx0:: pm80xx_chip_soft_rst 1478: reset register after write 0x40
    pm80xx0:: pm80xx_chip_soft_rst 1544: SPCv soft reset Complete
    pm80xx0:: init_pci_device_addresses 917: Scratchpad 0 Offset: 0x2000 value 0x40002000
    pm80xx0:: init_pci_device_addresses 925: Scratchpad 0 PCI BAR: 0
    pm80xx0:: init_pci_device_addresses 952: VALID main config signature 0x53434d50
    pm80xx0:: init_pci_device_addresses 975: GST OFFSET 0xc4
    pm80xx0:: init_pci_device_addresses 978: INBND OFFSET 0x20000128
    pm80xx0:: init_pci_device_addresses 981: OBND OFFSET 0x24000928
    pm80xx0:: init_pci_device_addresses 984: IVT OFFSET 0x8001408
    pm80xx0:: init_pci_device_addresses 987: PSPA OFFSET 0x8001608
    pm80xx0:: init_pci_device_addresses 991: addr - main cfg (ptrval) general status (ptrval)
    pm80xx0:: init_pci_device_addresses 995: addr - inbnd (ptrval) obnd (ptrval)
    pm80xx0:: init_pci_device_addresses 999: addr - pspa (ptrval) ivt (ptrval)
    pm80xx0:: pm80xx_chip_init 1329: MPI initialize successful!

 2. Tested controller with firmware known to have initialization issue and
    observed no crashes with this fix:

    pm80xx 0000:01:00.0: pm80xx: driver version 0.1.38
    pm80xx 0000:01:00.0: Removing from 1:1 domain
    pm80xx 0000:01:00.0: Requesting non-1:1 mappings
    pm80xx0:: init_pci_device_addresses 948: BAD main config signature 0x0
    pm80xx0:: mpi_uninit_check 1365: Failed to init pci addresses
    pm80xx0:: pm80xx_chip_soft_rst 1435: MPI state is not ready scratch:0:8:62a01000:0
    pm80xx0:: pm80xx_chip_soft_rst 1518: Firmware is not ready!
    pm80xx0:: pm80xx_chip_soft_rst 1532: iButton Feature is not Available!!!
    pm80xx0:: pm80xx_chip_init 1301: Firmware is not ready!
    pm80xx0:: pm8001_pci_probe 1215: chip_init failed [ret: -16]
    pm80xx: probe of 0000:01:00.0 failed with error -16
    pm80xx 0000:07:00.0: pm80xx: driver version 0.1.38
    pm80xx 0000:07:00.0: Removing from 1:1 domain
    pm80xx 0000:07:00.0: Requesting non-1:1 mappings
    scsi host6: pm80xx
    pm80xx1:: pm8001_setup_sgpio 5568: failed sgpio_req timeout
    pm80xx1:: mpi_phy_start_resp 3447: phy start resp status:0x0, phyid:0x0
    pm80xx 0000:08:00.0: pm80xx: driver version 0.1.38
    pm80xx 0000:08:00.0: Removing from 1:1 domain
    pm80xx 0000:08:00.0: Requesting non-1:1 mappings

 3. Without this fix we observe crash on the same controller:

    pm80xx 0000:01:00.0: pm80xx: driver version 0.1.38
    pm80xx 0000:01:00.0: Removing from 1:1 domain
    pm80xx 0000:01:00.0: Requesting non-1:1 mappings
    [<ffffffffc0451b3b>] pm80xx_chip_soft_rst+0x6b/0x4c0 [pm80xx]
    [<ffffffffc043a933>] pm8001_pci_probe+0xa43/0x1630 [pm80xx]
    RIP: 0010:pm80xx_chip_soft_rst+0x71/0x4c0 [pm80xx]
    [<ffffffffc0451b3b>] ? pm80xx_chip_soft_rst+0x6b/0x4c0 [pm80xx]
    [<ffffffffc043a933>] pm8001_pci_probe+0xa43/0x1630 [pm80xx]
    pm80xx0:: mpi_uninit_check 1339: TIMEOUT:IBDB value/=2
    pm80xx0:: pm80xx_chip_soft_rst 1387: MPI state is not ready scratch:0:8:62a01000:0
    pm80xx0:: pm80xx_chip_soft_rst 1470: Firmware is not ready!
    pm80xx0:: pm80xx_chip_soft_rst 1484: iButton Feature is not Available!!!
    pm80xx0:: pm80xx_chip_init 1266: Firmware is not ready!
    pm80xx0:: pm8001_pci_probe 1207: chip_init failed [ret: -16]
    pm80xx: probe of 0000:01:00.0 failed with error -16

Link: https://lore.kernel.org/r/20210109123849.17098-4-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: akshatzen <akshatzen@google.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Radha Ramachandran <radha@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
akshatzen a961ea0afd scsi: pm80xx: Check for fatal error
When the controller runs into a fatal error, commands get stuck due to no
response. If the controller is in fatal error state, abort requests issued
to the controller get stuck too.

Check the controller state for fatal error conditions.

Link: https://lore.kernel.org/r/20210109123849.17098-3-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: akshatzen <akshatzen@google.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Radha Ramachandran <radha@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
akshatzen d71023af4b scsi: pm80xx: Do not busy wait in MPI init check
We do not need to busy wait during mpi_init_check() since it is not being
invoked in atomic context. mpi_init_check() is being called from
pm8001_pci_resume(), pm8001_pci_probe(). Hence we are replacing udelay with
msleep.

Link: https://lore.kernel.org/r/20210109123849.17098-2-Viswas.G@microchip.com
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: akshatzen <akshatzen@google.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Ruksar Devadi <Ruksar.devadi@microchip.com>
Signed-off-by: Radha Ramachandran <radha@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-13 00:02:01 -05:00
Ziqi Chen b61d041413 scsi: ufs-qcom: Fix ufs RST_n spec violation
According to the spec (JESD220E chapter 7.2), while powering off/on the ufs
device, RST_n signal should be between VSS(Ground) and VCCQ/VCCQ2.

Link: https://lore.kernel.org/r/1610103385-45755-3-git-send-email-ziqichen@codeaurora.org
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Ziqi Chen <ziqichen@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-12 23:37:34 -05:00
Ziqi Chen 528db9e563 scsi: ufs: core: Fix ufs clk specs violation
According to the spec (JESD220E chapter 7.2), while powering off/on the ufs
device, REF_CLK signal should be between VSS(Ground) and VCCQ/VCCQ2.

Link: https://lore.kernel.org/r/1610103385-45755-2-git-send-email-ziqichen@codeaurora.org
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Ziqi Chen <ziqichen@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-12 23:37:34 -05:00
YANG LI dc0bfdb563 scsi: isci: Remove the unneeded variable "status"
The variable 'status' is being initialized with SCI_SUCCESS and never
updated later with a new value. The initialization is redundant and can be
removed.

Link: https://lore.kernel.org/r/1609311860-102820-1-git-send-email-abaci-bugfix@linux.alibaba.com
Reported-by: Abaci <abaci@linux.alibaba.com>
Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-12 23:34:06 -05:00
Adrian Hunter b6cacaf204 scsi: ufs: ufs-debugfs: Add error counters
People testing have a need to know how many errors might be occurring over
time. Add error counters and expose them via debugfs.

A module initcall is used to create a debugfs root directory for
ufshcd-related items. In the case that modules are built-in, then
initialization is done in link order, so move ufshcd-core to the top of the
Makefile.

Link: https://lore.kernel.org/r/20210107072538.21782-1-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-12 22:14:06 -05:00
Andrea Parri (Microsoft) 91b1b640b8 scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()
Check that the packet is of the expected size at least, don't copy data
past the packet.

Link: https://lore.kernel.org/r/20201217203321.4539-4-parri.andrea@gmail.com
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Reported-by: Saruhan Karademir <skarade@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:15:24 -05:00
Andrea Parri (Microsoft) 244808e030 scsi: storvsc: Resolve data race in storvsc_probe()
vmscsi_size_delta can be written concurrently by multiple instances of
storvsc_probe(), corresponding to multiple synthetic IDE/SCSI devices;
cf. storvsc_drv's probe_type == PROBE_PREFER_ASYNCHRONOUS.  Change the
global variable vmscsi_size_delta to per-synthetic-IDE/SCSI-device.

Link: https://lore.kernel.org/r/20201217203321.4539-3-parri.andrea@gmail.com
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Suggested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:15:00 -05:00
Andrea Parri (Microsoft) ab548fd21e scsi: storvsc: Fix max_outstanding_req_per_channel for Win8 and newer
Current code overestimates the value of max_outstanding_req_per_channel for
Win8 and newer hosts, since vmscsi_size_delta is set to the initial value
of sizeof(vmscsi_win8_extension) rather than zero.  This may lead to wrong
decisions when using ring_avail_percent_lowater equals to zero.  The
estimate of max_outstanding_req_per_channel is 'exact' for Win7 and older
hosts.  A better choice, keeping the algorithm for the estimation simple,
is to err the other way around, i.e., to underestimate for Win7 and older
but to use the exact value for Win8 and newer.

Link: https://lore.kernel.org/r/20201217203321.4539-2-parri.andrea@gmail.com
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Suggested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:15:00 -05:00
James Smart 181dd9a4c2 scsi: lpfc: Update lpfc version to 12.8.0.7
Update lpfc version to 12.8.0.7

Link: https://lore.kernel.org/r/20210104180240.46824-16-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart 0b3ad32e26 scsi: lpfc: Enhancements to LOG_TRACE_EVENT for better readability
While testing recent discovery node rework, several items were seen that
could be done better with respect to the new trace event logic.

1) in the following msg:
      kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
   If cnt is zero in the 1st message, there is no reason to display the
   1st message, which is just giving start/end positioning.

   Fix by not displaying message if cnt is 0.

2) If the driver is loaded with module log verbosity off, and later a
   single NPIV host instance verbosity is enabled via sysfs, it enables
   messages on all instances. This is due to the trace log verbosity checks
   (lpfc_dmp_dbg) looking at the phba only. It should look at the phba and
   the vport.

   Fix by enabling a check on both phba and vport.

3) in the following messages:
       2904 Firmware Dump Image Present on Adapter
       2887 Reset Needed: Attempting Port Recovery...
   These messages are not necessary for the trace event log, which is
   primarily for discovery.

   Fix by changing log level on these 2 messages to LOG_SLI.

Link: https://lore.kernel.org/r/20210104180240.46824-15-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart a22d73b655 scsi: lpfc: Implement health checking when aborting I/O
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.

Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware.  If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.

The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.

Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart 243156c010 scsi: lpfc: Fix crash when nvmet transport calls host_release
When lpfc is running in NVMET mode and supports the NVME-1 addendum
changes, a LIP on a bound NVME Initiator or lipping the lpfc NVMET's link
resulted in an Oops in lpfc_nvmet_host_release.

The fix requires lpfc NVMET to maintain an additional reference on any node
structure that acts as the hosthandle for the NVMET transport.  This
reference get is a one-time addition, is taken prior to the upcall of an
unsolicited LS_REQ, and is released when the NVMET transport releases the
hosthandle during the host_release downcall.

Link: https://lore.kernel.org/r/20210104180240.46824-13-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart ff8a44bff5 scsi: lpfc: Fix vport create logging
When with testing with large numbers of npiv vports and link bounces, the
driver is flooding the messages file, even with log_verbose = 0.

The new LOG_TRACE_EVENT messages are still generating events to the
messages files.

Fix by converting the vport create msg from LOG_TRACE_EVENT to LOG_VPORT.

Link: https://lore.kernel.org/r/20210104180240.46824-12-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart 9ec58ec7d4 scsi: lpfc: Fix NVMe recovery after mailbox timeout
If a mailbox command times out, the SLI port is deemed in error and the
port is reset.  The HBA cleanup is not returning I/Os to the NVMe layer
before the port is unregistered. This is due to the HBA being marked
offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
rather than an general adapter reset routine.  The mailbox timeout handler
mailbox handler only cleaned up SCSI I/Os.

Fix by reworking the mailbox handler to:

 - After handling the mailbox error, detect the board is already in
   failure (may be due to another error), and leave cleanup to the
   other handler.

 - If the mailbox command timeout is initial detector of the port error,
   continue with the board cleanup and marking the adapter offline
   (!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
   reset adapter routine that is subsequently invoked, will clean up the
   I/Os.

 - Have the reset adapter routine flush all NVMe and SCSI I/Os if the
   adapter has been marked failed (!SLI_ACTIVE).

 - Rework the NVMe I/O terminate routine to take a status code to fail the
   I/O with and update so that cleaned up I/O calls the wqe completion
   routine. Currently it is bypassing the wqe cleanup and calling the NVMe
   I/O completion directly. The wqe completion routine will take care of
   data structure and node cleanup then call the NVMe I/O completion
   handler.

Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart 31051249f1 scsi: lpfc: Fix target reset failing
Target reset is failed by the target as an invalid command.

The Target Reset TMF has been obsoleted in T10 for a while, but continues
to be used. On (newer) devices, the TMF is rejected causing the reset
handler to escalate to adapter resets.

Fix by having Target Reset TMF rejections be translated into a LOGO and
re-PLOGI with the target device. This provides the same semantic action
(although, if the device also supports nvme traffic, it will terminate nvme
traffic as well - but it's still recoverable).

Link: https://lore.kernel.org/r/20210104180240.46824-10-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart da09ae4864 scsi: lpfc: Fix error log messages being logged following SCSI task mgnt
A successful task mgmt command is logging errors, making it look like
problems were encountered.  This is due to log messages for the
device/target and bus reset handlers having the LOG_TRACE_EVENT flag set.

Fix by adjusting the event flag such that the call to the logging routine
only receives a LOG_TRACE_EVENT if a prior call actually failed.

Link: https://lore.kernel.org/r/20210104180240.46824-9-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart f0871ab68a scsi: lpfc: Prevent duplicate requests to unregister with cpuhp framework
In the lpfc offline routine, called for various reasons such as sysfs
attribute, driver unload, or port error, the driver is calling
__lpfc_cpuhp_remove() to destroy the hot plug data. If the offline routine
is called while the driver is in the process of being unloaded, a request
using lpfc_cpuhp_remove() is also made from lpfc_sli4_hba_unset(). The
cpuhp elements are no longer valid when the second removal request is made.

Fix by only calling the cpuhp removal once when the adapter is in the
process of unloading.

Link: https://lore.kernel.org/r/20210104180240.46824-8-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart 3ba6216aad scsi: lpfc: Fix FW reset action if I/Os are outstanding
If the port is configured for NVME and has any outstanding IOs when a FW
reset is requesteed, outstanding I/Os are not properly cleaned up. This
causes the fw download request to fail.

Fix by clearing the LPFC_SLI_ACTIVE flag to signify the I/O must be
manually flushed by the driver on port reset.

Link: https://lore.kernel.org/r/20210104180240.46824-7-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:36 -05:00
James Smart c33b160934 scsi: lpfc: Use the nvme-fc transport supplied timeout for LS requests
When lpfc generates a GEN_REQUEST wqe for the nvme LS (such as Create
Association), the timeout is set to R_A_TOV without regard to the timeout
value supplied by the nvme-fc transport. The driver should be setting the
timeout to the value passed into the routine. Additionally the caller
should be setting the timeout value to the value in the ls request set by
the nvme transport. Instead, it unconditionally is setting it to a driver
defined value.  So the driver actually overrode the value twice.

Fix by using the timeout provided to the routine, and for the caller, set
the timeout to the ls request timeout value.

Link: https://lore.kernel.org/r/20210104180240.46824-6-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
James Smart 07aaefdf75 scsi: lpfc: Fix crash when a fabric node is released prematurely
The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller node
while devloss is outstanding from the SCSI transport.  This is triggered by
an odd behavior where the switch reacts to a rejected RDP request with a
PLOGI and nothing else, not even a LOGO.  The driver ACKS the PLOGI and
after successfully registering the RPI, incorrectly registers the fabric
controller node because it has the NLP_FC4_FCP flag still set from the
fabric controller PRLI.  If a LIP is issued, the driver attempts to cleanup
on Link Up and ends up executing too many puts.

Fix by detecting the fabric node type and clearing out the nodes internal
flags that triggered a SCSI transport registration and subsequence dev_loss
event.  The driver cannot count on any persistence from fabric controller
nodes.

Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00