// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Linux MegaRAID driver for SAS based RAID controllers
 *
 * Copyright (c) 2009-2013 LSI Corporation
 * Copyright (c) 2013-2016 Avago Technologies
 * Copyright (c) 2016-2018 Broadcom Inc.
 *
 * FILE: megaraid_sas_fusion.c
 *
 * Authors: Broadcom Inc.
 *          Sumant Patro
 *          Adam Radford
 *          Kashyap Desai <kashyap.desai@broadcom.com>
 *          Sumit Saxena <sumit.saxena@broadcom.com>
 *
 * Send feedback to: megaraidlinux.pdl@broadcom.com
 */

#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/pci.h>
#include <linux/list.h>
#include <linux/moduleparam.h>
#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/delay.h>
#include <linux/uio.h>
#include <linux/uaccess.h>
#include <linux/fs.h>
#include <linux/compat.h>
#include <linux/blkdev.h>
#include <linux/mutex.h>
#include <linux/poll.h>
#include <linux/vmalloc.h>
#include <linux/workqueue.h>
#include <linux/irq_poll.h>

#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_dbg.h>
#include <linux/dmi.h>

#include "megaraid_sas_fusion.h"
#include "megaraid_sas.h"


extern void megasas_free_cmds(struct megasas_instance *instance);
extern struct megasas_cmd *megasas_get_cmd(struct megasas_instance
					   *instance);
extern void
megasas_complete_cmd(struct megasas_instance *instance,
		     struct megasas_cmd *cmd, u8 alt_status);
int
wait_and_poll(struct megasas_instance *instance, struct megasas_cmd *cmd,
	      int seconds);

void
megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd);
int megasas_alloc_cmds(struct megasas_instance *instance);
int
megasas_clear_intr_fusion(struct megasas_instance *instance);
int
megasas_issue_polled(struct megasas_instance *instance,
		     struct megasas_cmd *cmd);
void
megasas_check_and_restore_queue_depth(struct megasas_instance *instance);

int megasas_transition_to_ready(struct megasas_instance *instance, int ocr);
void megaraid_sas_kill_hba(struct megasas_instance *instance);

extern u32 megasas_dbg_lvl;
int megasas_sriov_start_heartbeat(struct megasas_instance *instance,
				  int initial);
void megasas_start_timer(struct megasas_instance *instance);
extern struct megasas_mgmt_info megasas_mgmt_info;
extern unsigned int resetwaittime;
extern unsigned int dual_qdepth_disable;
static void megasas_free_rdpq_fusion(struct megasas_instance *instance);
static void megasas_free_reply_fusion(struct megasas_instance *instance);
static inline
void megasas_configure_queue_sizes(struct megasas_instance *instance);
static void megasas_fusion_crash_dump(struct megasas_instance *instance);
extern u32 megasas_readl(struct megasas_instance *instance,
			 const volatile void __iomem *addr);

/**
 * megasas_adp_reset_wait_for_ready -	initiate chip reset and wait for
 *					controller to come to ready state
 * @instance:				adapter's soft state
 * @do_adp_reset:			If true, do a chip reset
 * @ocr_context:			If called from OCR context this will
 *					be set to 1, else 0
 *
 * This function initiates a chip reset followed by a wait for the controller
 * to transition to ready state.
 * During this, the driver blocks all access to PCI config space from userspace.
 */
int
megasas_adp_reset_wait_for_ready(struct megasas_instance *instance,
				 bool do_adp_reset,
				 int ocr_context)
{
	int ret = FAILED;

	/*
	 * Block access to PCI config space from userspace
	 * when diag reset is initiated from driver
	 */
	if (megasas_dbg_lvl & OCR_DEBUG)
		dev_info(&instance->pdev->dev,
			 "Block access to PCI config space %s %d\n",
			 __func__, __LINE__);

	pci_cfg_access_lock(instance->pdev);

	if (do_adp_reset) {
		if (instance->instancet->adp_reset
			(instance, instance->reg_set))
			goto out;
	}

	/* Wait for FW to become ready */
	if (megasas_transition_to_ready(instance, ocr_context)) {
		dev_warn(&instance->pdev->dev,
			 "Failed to transition controller to ready for scsi%d.\n",
			 instance->host->host_no);
		goto out;
	}

	ret = SUCCESS;
out:
	if (megasas_dbg_lvl & OCR_DEBUG)
		dev_info(&instance->pdev->dev,
			 "Unlock access to PCI config space %s %d\n",
			 __func__, __LINE__);

	pci_cfg_access_unlock(instance->pdev);

	return ret;
}

/**
 * megasas_check_same_4gb_region -	check if allocation
 *					crosses same 4GB boundary or not
 * @instance:				adapter's soft instance
 * @start_addr:				start address of DMA allocation
 * @size:				size of allocation in bytes
 *
 * Return: true if the allocation does not cross a 4GB boundary,
 *	   false if the allocation crosses a 4GB boundary.
 */
static inline bool megasas_check_same_4gb_region
	(struct megasas_instance *instance, dma_addr_t start_addr, size_t size)
{
	dma_addr_t end_addr;

	end_addr = start_addr + size;
	if (upper_32_bits(start_addr) != upper_32_bits(end_addr)) {
		dev_err(&instance->pdev->dev,
			"Failed to get same 4GB boundary: start_addr: 0x%llx end_addr: 0x%llx\n",
			(unsigned long long)start_addr,
			(unsigned long long)end_addr);
		return false;
	}

	return true;
}

/**
 * megasas_enable_intr_fusion -	Enables interrupts
 * @instance:	Adapter soft state
 */
void
megasas_enable_intr_fusion(struct megasas_instance *instance)
{
	struct megasas_register_set __iomem *regs;
	regs = instance->reg_set;

	instance->mask_interrupts = 0;
	/* For Thunderbolt/Invader also clear intr on enable */
	writel(~0, &regs->outbound_intr_status);
	readl(&regs->outbound_intr_status);

	writel(~MFI_FUSION_ENABLE_INTERRUPT_MASK, &(regs)->outbound_intr_mask);

	/* Dummy readl to force pci flush */
	dev_info(&instance->pdev->dev, "%s is called outbound_intr_mask:0x%08x\n",
		 __func__, readl(&regs->outbound_intr_mask));
}

/**
 * megasas_disable_intr_fusion - Disables interrupt
 * @instance:	Adapter soft state
 */
void
megasas_disable_intr_fusion(struct megasas_instance *instance)
{
	u32 mask = 0xFFFFFFFF;
	struct megasas_register_set __iomem *regs;
	regs = instance->reg_set;
	instance->mask_interrupts = 1;

	writel(mask, &regs->outbound_intr_mask);
	/* Dummy readl to force pci flush */
	dev_info(&instance->pdev->dev, "%s is called outbound_intr_mask:0x%08x\n",
		 __func__, readl(&regs->outbound_intr_mask));
}

int
megasas_clear_intr_fusion(struct megasas_instance *instance)
{
	u32 status;
	struct megasas_register_set __iomem *regs;
	regs = instance->reg_set;
	/*
	 * Check if it is our interrupt
	 */
	status = megasas_readl(instance,
			       &regs->outbound_intr_status);

	if (status & 1) {
		writel(status, &regs->outbound_intr_status);
		readl(&regs->outbound_intr_status);
		return 1;
	}
	if (!(status & MFI_FUSION_ENABLE_INTERRUPT_MASK))
		return 0;

	return 1;
}

/**
 * megasas_get_cmd_fusion -	Get a command from the free pool
 * @instance:		Adapter soft state
 * @blk_tag:		Block tag assigned by the SCSI layer, used as the
 *			index into cmd_list
 *
 * Returns a blk_tag indexed mpt frame
 */
inline struct megasas_cmd_fusion *megasas_get_cmd_fusion(struct megasas_instance
						  *instance, u32 blk_tag)
{
	struct fusion_context *fusion;

	fusion = instance->ctrl_context;
	return fusion->cmd_list[blk_tag];
}

/**
 * megasas_return_cmd_fusion -	Return a cmd to free command pool
 * @instance:		Adapter soft state
 * @cmd:		Command packet to be returned to free command pool
 */
inline void megasas_return_cmd_fusion(struct megasas_instance *instance,
	struct megasas_cmd_fusion *cmd)
{
	cmd->scmd = NULL;
	memset(cmd->io_request, 0, MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE);
	cmd->r1_alt_dev_handle = MR_DEVHANDLE_INVALID;
	cmd->cmd_completed = false;
}

/**
 * megasas_write_64bit_req_desc -	PCI writes 64bit request descriptor
 * @instance:	Adapter soft state
 * @req_desc:	64bit Request descriptor
 */
static void
megasas_write_64bit_req_desc(struct megasas_instance *instance,
		union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc)
{
#if defined(writeq) && defined(CONFIG_64BIT)
	u64 req_data = (((u64)le32_to_cpu(req_desc->u.high) << 32) |
		le32_to_cpu(req_desc->u.low));
	writeq(req_data, &instance->reg_set->inbound_low_queue_port);
#else
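	/*
	 * No native 64-bit writeq() here: the descriptor must go out as two
	 * 32-bit writes, and hba_lock keeps the low/high pair atomic so
	 * concurrent submitters cannot interleave halves at the queue ports.
	 */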
	unsigned long flags;
	spin_lock_irqsave(&instance->hba_lock, flags);
	writel(le32_to_cpu(req_desc->u.low),
		&instance->reg_set->inbound_low_queue_port);
	writel(le32_to_cpu(req_desc->u.high),
		&instance->reg_set->inbound_high_queue_port);
	spin_unlock_irqrestore(&instance->hba_lock, flags);
#endif
}

/**
 * megasas_fire_cmd_fusion -	Sends command to the FW
 * @instance:	Adapter soft state
 * @req_desc:	32bit or 64bit Request descriptor
 *
 * Perform PCI Write. AERO SERIES supports 32 bit Descriptor writes;
 * controllers prior to AERO_SERIES use 64 bit Descriptors.
 */
static void
megasas_fire_cmd_fusion(struct megasas_instance *instance,
		union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc)
{
	if (instance->atomic_desc_support)
		writel(le32_to_cpu(req_desc->u.low),
			&instance->reg_set->inbound_single_queue_port);
	else
		megasas_write_64bit_req_desc(instance, req_desc);
}

/**
 * megasas_fusion_update_can_queue -	Do all Adapter Queue depth related calculations here
 * @instance:			Adapter soft state
 * @fw_boot_context:		Whether this function is called during probe or after OCR
 *
 * This function is only for fusion controllers.
 * Update host can_queue if firmware downgrades the max supported firmware
 * commands. The firmware upgrade case is skipped because the underlying
 * firmware has more resources than are exposed to the OS.
 */
static void
megasas_fusion_update_can_queue(struct megasas_instance *instance, int fw_boot_context)
{
	u16 cur_max_fw_cmds = 0;
	u16 ldio_threshold = 0;

	/* ventura FW does not fill outbound_scratch_pad_2 with queue depth */
	if (instance->adapter_type < VENTURA_SERIES)
		cur_max_fw_cmds =
		megasas_readl(instance,
			      &instance->reg_set->outbound_scratch_pad_2) & 0x00FFFF;

	if (dual_qdepth_disable || !cur_max_fw_cmds)
		cur_max_fw_cmds = instance->instancet->read_fw_status_reg(instance) & 0x00FFFF;
	else
		ldio_threshold =
			(instance->instancet->read_fw_status_reg(instance) & 0x00FFFF) - MEGASAS_FUSION_IOCTL_CMDS;

	dev_info(&instance->pdev->dev,
		 "Current firmware supports maximum commands: %d\t LDIO threshold: %d\n",
		 cur_max_fw_cmds, ldio_threshold);
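
	/*
	 * After an OCR the firmware may advertise fewer commands than it did
	 * at probe time; can_queue is only ever lowered here, never raised,
	 * presumably because host-side resources were sized during probe.
	 */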
	if (fw_boot_context == OCR_CONTEXT) {
		cur_max_fw_cmds = cur_max_fw_cmds - 1;
		if (cur_max_fw_cmds < instance->max_fw_cmds) {
			instance->cur_can_queue =
				cur_max_fw_cmds - (MEGASAS_FUSION_INTERNAL_CMDS +
						   MEGASAS_FUSION_IOCTL_CMDS);
			instance->host->can_queue = instance->cur_can_queue;
			instance->ldio_threshold = ldio_threshold;
		}
	} else {
		instance->max_fw_cmds = cur_max_fw_cmds;
		instance->ldio_threshold = ldio_threshold;

		if (reset_devices)
			instance->max_fw_cmds = min(instance->max_fw_cmds,
						    (u16)MEGASAS_KDUMP_QUEUE_DEPTH);
		/*
		 * Reduce the max supported cmds by 1. This is to ensure that the
		 * reply_q_sz (1 more than the max cmd that driver may send)
		 * does not exceed max cmds that the FW can support
		 */
		instance->max_fw_cmds = instance->max_fw_cmds-1;
	}
}

static inline void
megasas_get_msix_index(struct megasas_instance *instance,
		       struct scsi_cmnd *scmd,
		       struct megasas_cmd_fusion *cmd,
		       u8 data_arms)
{
	int sdev_busy;

	/* nr_hw_queue = 1 for MegaRAID */
	struct blk_mq_hw_ctx *hctx =
		scmd->device->request_queue->queue_hw_ctx[0];

	sdev_busy = atomic_read(&hctx->nr_active);
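
	/*
	 * Reply queue selection: in balanced perf mode a sufficiently busy
	 * device is steered to the high-IOPS queue group in batches of
	 * MR_HIGH_IOPS_BATCH_COUNT; with msix_load_balance set, IOs
	 * round-robin across all vectors; otherwise use the queue mapped to
	 * the submitting CPU so completions land near the submitter.
	 */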
	if (instance->perf_mode == MR_BALANCED_PERF_MODE &&
	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))
		cmd->request_desc->SCSIIO.MSIxIndex =
			mega_mod64((atomic64_add_return(1, &instance->high_iops_outstanding) /
					MR_HIGH_IOPS_BATCH_COUNT), instance->low_latency_index_start);
	else if (instance->msix_load_balance)
		cmd->request_desc->SCSIIO.MSIxIndex =
			(mega_mod64(atomic64_add_return(1, &instance->total_io_count),
				instance->msix_vectors));
	else
		cmd->request_desc->SCSIIO.MSIxIndex =
			instance->reply_map[raw_smp_processor_id()];
}

/**
 * megasas_free_cmds_fusion -	Free all the cmds in the free cmd pool
 * @instance:		Adapter soft state
 */
void
megasas_free_cmds_fusion(struct megasas_instance *instance)
{
	int i;
	struct fusion_context *fusion = instance->ctrl_context;
	struct megasas_cmd_fusion *cmd;

	if (fusion->sense)
		dma_pool_free(fusion->sense_dma_pool, fusion->sense,
			      fusion->sense_phys_addr);

	/* SG */
	if (fusion->cmd_list) {
		for (i = 0; i < instance->max_mpt_cmds; i++) {
			cmd = fusion->cmd_list[i];
			if (cmd) {
				if (cmd->sg_frame)
					dma_pool_free(fusion->sg_dma_pool,
						      cmd->sg_frame,
						      cmd->sg_frame_phys_addr);
			}
			kfree(cmd);
		}
		kfree(fusion->cmd_list);
	}

	if (fusion->sg_dma_pool) {
		dma_pool_destroy(fusion->sg_dma_pool);
		fusion->sg_dma_pool = NULL;
	}
	if (fusion->sense_dma_pool) {
		dma_pool_destroy(fusion->sense_dma_pool);
		fusion->sense_dma_pool = NULL;
	}

	/* Reply Frame, Desc */
	if (instance->is_rdpq)
		megasas_free_rdpq_fusion(instance);
	else
		megasas_free_reply_fusion(instance);

	/* Request Frame, Desc */
	if (fusion->req_frames_desc)
		dma_free_coherent(&instance->pdev->dev,
			fusion->request_alloc_sz, fusion->req_frames_desc,
			fusion->req_frames_desc_phys);
	if (fusion->io_request_frames)
		dma_pool_free(fusion->io_request_frames_pool,
			fusion->io_request_frames,
			fusion->io_request_frames_phys);
	if (fusion->io_request_frames_pool) {
		dma_pool_destroy(fusion->io_request_frames_pool);
		fusion->io_request_frames_pool = NULL;
	}
}

/**
 * megasas_create_sg_sense_fusion -	Creates DMA pool for cmd frames
 * @instance:			Adapter soft state
 */
static int megasas_create_sg_sense_fusion(struct megasas_instance *instance)
{
	int i;
	u16 max_cmd;
	struct fusion_context *fusion;
	struct megasas_cmd_fusion *cmd;
	int sense_sz;
	u32 offset;

	fusion = instance->ctrl_context;
	max_cmd = instance->max_fw_cmds;
	sense_sz = instance->max_mpt_cmds * SCSI_SENSE_BUFFERSIZE;

	fusion->sg_dma_pool =
			dma_pool_create("mr_sg", &instance->pdev->dev,
				instance->max_chain_frame_sz,
				MR_DEFAULT_NVME_PAGE_SIZE, 0);
	/* SCSI_SENSE_BUFFERSIZE = 96 bytes */
	fusion->sense_dma_pool =
			dma_pool_create("mr_sense", &instance->pdev->dev,
				sense_sz, 64, 0);

	if (!fusion->sense_dma_pool || !fusion->sg_dma_pool) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	fusion->sense = dma_pool_alloc(fusion->sense_dma_pool,
				       GFP_KERNEL, &fusion->sense_phys_addr);
	if (!fusion->sense) {
		dev_err(&instance->pdev->dev,
			"failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	/* The sense buffer, request frame and reply desc pool are required
	 * to be in the same 4 gb region. The function below checks this.
	 * In case of failure, a new pci pool is created with updated
	 * alignment; the older allocation and pool are destroyed.
	 * The alignment is chosen such that the next allocation, if it
	 * succeeds, always meets the same 4gb region requirement.
	 * The actual requirement is not alignment, but that the start and
	 * end of the DMA address have the same upper 32 bits.
	 */
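
	/*
	 * Note: rounding the fallback pool's alignment up to a power of two
	 * at least as large as the allocation keeps the whole buffer inside
	 * one naturally aligned window, so it cannot straddle a 4GB
	 * boundary.
	 */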
	if (!megasas_check_same_4gb_region(instance, fusion->sense_phys_addr,
					   sense_sz)) {
		dma_pool_free(fusion->sense_dma_pool, fusion->sense,
			      fusion->sense_phys_addr);
		fusion->sense = NULL;
		dma_pool_destroy(fusion->sense_dma_pool);

		fusion->sense_dma_pool =
			dma_pool_create("mr_sense_align", &instance->pdev->dev,
					sense_sz, roundup_pow_of_two(sense_sz),
					0);
		if (!fusion->sense_dma_pool) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
		fusion->sense = dma_pool_alloc(fusion->sense_dma_pool,
					       GFP_KERNEL,
					       &fusion->sense_phys_addr);
		if (!fusion->sense) {
			dev_err(&instance->pdev->dev,
				"failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
	}

	/*
	 * Allocate and attach a frame to each of the commands in cmd_list
	 */
	for (i = 0; i < max_cmd; i++) {
		cmd = fusion->cmd_list[i];
		cmd->sg_frame = dma_pool_alloc(fusion->sg_dma_pool,
					GFP_KERNEL, &cmd->sg_frame_phys_addr);

		offset = SCSI_SENSE_BUFFERSIZE * i;
		cmd->sense = (u8 *)fusion->sense + offset;
		cmd->sense_phys_addr = fusion->sense_phys_addr + offset;

		if (!cmd->sg_frame) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
	}

	/* create sense buffer for the raid 1/10 fp */
	for (i = max_cmd; i < instance->max_mpt_cmds; i++) {
		cmd = fusion->cmd_list[i];
		offset = SCSI_SENSE_BUFFERSIZE * i;
		cmd->sense = (u8 *)fusion->sense + offset;
		cmd->sense_phys_addr = fusion->sense_phys_addr + offset;
	}

	return 0;
}

static int
megasas_alloc_cmdlist_fusion(struct megasas_instance *instance)
{
	u32 max_mpt_cmd, i, j;
	struct fusion_context *fusion;

	fusion = instance->ctrl_context;

	max_mpt_cmd = instance->max_mpt_cmds;

	/*
	 * fusion->cmd_list is an array of struct megasas_cmd_fusion pointers.
	 * Allocate the dynamic array first and then allocate individual
	 * commands.
	 */
	fusion->cmd_list =
		kcalloc(max_mpt_cmd, sizeof(struct megasas_cmd_fusion *),
			GFP_KERNEL);
	if (!fusion->cmd_list) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	for (i = 0; i < max_mpt_cmd; i++) {
		fusion->cmd_list[i] = kzalloc(sizeof(struct megasas_cmd_fusion),
					      GFP_KERNEL);
		if (!fusion->cmd_list[i]) {
			for (j = 0; j < i; j++)
				kfree(fusion->cmd_list[j]);
			kfree(fusion->cmd_list);
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
	}

	return 0;
}

static int
megasas_alloc_request_fusion(struct megasas_instance *instance)
{
	struct fusion_context *fusion;

	fusion = instance->ctrl_context;

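	/*
	 * If the io request frame pool cannot be allocated at the current
	 * queue depth, max_fw_cmds is stepped down by MEGASAS_REDUCE_QD_COUNT,
	 * queue sizes are recomputed, and allocation restarts from this label.
	 */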
retry_alloc:
	fusion->io_request_frames_pool =
			dma_pool_create("mr_ioreq", &instance->pdev->dev,
				fusion->io_frames_alloc_sz, 16, 0);

	if (!fusion->io_request_frames_pool) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	fusion->io_request_frames =
			dma_pool_alloc(fusion->io_request_frames_pool,
				GFP_KERNEL | __GFP_NOWARN,
				&fusion->io_request_frames_phys);
	if (!fusion->io_request_frames) {
		if (instance->max_fw_cmds >= (MEGASAS_REDUCE_QD_COUNT * 2)) {
			instance->max_fw_cmds -= MEGASAS_REDUCE_QD_COUNT;
			dma_pool_destroy(fusion->io_request_frames_pool);
			megasas_configure_queue_sizes(instance);
			goto retry_alloc;
		} else {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
	}

	if (!megasas_check_same_4gb_region(instance,
					   fusion->io_request_frames_phys,
					   fusion->io_frames_alloc_sz)) {
		dma_pool_free(fusion->io_request_frames_pool,
			      fusion->io_request_frames,
			      fusion->io_request_frames_phys);
		fusion->io_request_frames = NULL;
		dma_pool_destroy(fusion->io_request_frames_pool);

		fusion->io_request_frames_pool =
				dma_pool_create("mr_ioreq_align",
						&instance->pdev->dev,
						fusion->io_frames_alloc_sz,
						roundup_pow_of_two(fusion->io_frames_alloc_sz),
						0);

		if (!fusion->io_request_frames_pool) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}

		fusion->io_request_frames =
				dma_pool_alloc(fusion->io_request_frames_pool,
					GFP_KERNEL | __GFP_NOWARN,
					&fusion->io_request_frames_phys);

		if (!fusion->io_request_frames) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
	}

	fusion->req_frames_desc =
		dma_alloc_coherent(&instance->pdev->dev,
				   fusion->request_alloc_sz,
				   &fusion->req_frames_desc_phys, GFP_KERNEL);
	if (!fusion->req_frames_desc) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	return 0;
}

static int
megasas_alloc_reply_fusion(struct megasas_instance *instance)
{
	int i, count;
	struct fusion_context *fusion;
	union MPI2_REPLY_DESCRIPTORS_UNION *reply_desc;
	fusion = instance->ctrl_context;

	count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;
	fusion->reply_frames_desc_pool =
			dma_pool_create("mr_reply", &instance->pdev->dev,
				fusion->reply_alloc_sz * count, 16, 0);

	if (!fusion->reply_frames_desc_pool) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	fusion->reply_frames_desc[0] =
		dma_pool_alloc(fusion->reply_frames_desc_pool,
			GFP_KERNEL, &fusion->reply_frames_desc_phys[0]);
	if (!fusion->reply_frames_desc[0]) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	if (!megasas_check_same_4gb_region(instance,
					   fusion->reply_frames_desc_phys[0],
					   (fusion->reply_alloc_sz * count))) {
		dma_pool_free(fusion->reply_frames_desc_pool,
			      fusion->reply_frames_desc[0],
			      fusion->reply_frames_desc_phys[0]);
		fusion->reply_frames_desc[0] = NULL;
		dma_pool_destroy(fusion->reply_frames_desc_pool);

		fusion->reply_frames_desc_pool =
			dma_pool_create("mr_reply_align",
					&instance->pdev->dev,
					fusion->reply_alloc_sz * count,
					roundup_pow_of_two(fusion->reply_alloc_sz * count),
					0);

		if (!fusion->reply_frames_desc_pool) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}

		fusion->reply_frames_desc[0] =
			dma_pool_alloc(fusion->reply_frames_desc_pool,
				GFP_KERNEL,
				&fusion->reply_frames_desc_phys[0]);

		if (!fusion->reply_frames_desc[0]) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
	}

	reply_desc = fusion->reply_frames_desc[0];
	for (i = 0; i < fusion->reply_q_depth * count; i++, reply_desc++)
		reply_desc->Words = cpu_to_le64(ULLONG_MAX);

	/* This is not rdpq mode, but the driver still populates the
	 * reply_frame_desc array to use the same msix index in the ISR path.
	 */
	for (i = 0; i < (count - 1); i++)
		fusion->reply_frames_desc[i + 1] =
			fusion->reply_frames_desc[i] +
			(fusion->reply_alloc_sz)/sizeof(union MPI2_REPLY_DESCRIPTORS_UNION);

	return 0;
}

static int
megasas_alloc_rdpq_fusion(struct megasas_instance *instance)
{
	int i, j, k, msix_count;
	struct fusion_context *fusion;
	union MPI2_REPLY_DESCRIPTORS_UNION *reply_desc;
	union MPI2_REPLY_DESCRIPTORS_UNION *rdpq_chunk_virt[RDPQ_MAX_CHUNK_COUNT];
	dma_addr_t rdpq_chunk_phys[RDPQ_MAX_CHUNK_COUNT];
	u8 dma_alloc_count, abs_index;
	u32 chunk_size, array_size, offset;

	fusion = instance->ctrl_context;
	chunk_size = fusion->reply_alloc_sz * RDPQ_MAX_INDEX_IN_ONE_CHUNK;
	array_size = sizeof(struct MPI2_IOC_INIT_RDPQ_ARRAY_ENTRY) *
		     MAX_MSIX_QUEUES_FUSION;

	fusion->rdpq_virt = dma_alloc_coherent(&instance->pdev->dev,
					       array_size, &fusion->rdpq_phys,
					       GFP_KERNEL);
	if (!fusion->rdpq_virt) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	msix_count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;

	fusion->reply_frames_desc_pool = dma_pool_create("mr_rdpq",
							 &instance->pdev->dev,
							 chunk_size, 16, 0);
	fusion->reply_frames_desc_pool_align =
				dma_pool_create("mr_rdpq_align",
						&instance->pdev->dev,
						chunk_size,
						roundup_pow_of_two(chunk_size),
						0);

	if (!fusion->reply_frames_desc_pool ||
	    !fusion->reply_frames_desc_pool_align) {
		dev_err(&instance->pdev->dev,
			"Failed from %s %d\n",  __func__, __LINE__);
		return -ENOMEM;
	}

	/*
	 * For INVADER_SERIES each set of 8 reply queues(0-7, 8-15, ..) and
	 * VENTURA_SERIES each set of 16 reply queues(0-15, 16-31, ..) should
	 * be within a 4GB boundary, and reply queues in a set must have the
	 * same upper 32-bits in their memory address. So here the driver
	 * allocates the DMA'able memory for reply queues accordingly. The
	 * driver applies the VENTURA_SERIES limitation to INVADER_SERIES as
	 * well.
	 */
	dma_alloc_count = DIV_ROUND_UP(msix_count, RDPQ_MAX_INDEX_IN_ONE_CHUNK);

	for (i = 0; i < dma_alloc_count; i++) {
		rdpq_chunk_virt[i] =
			dma_pool_alloc(fusion->reply_frames_desc_pool,
				       GFP_KERNEL, &rdpq_chunk_phys[i]);
		if (!rdpq_chunk_virt[i]) {
			dev_err(&instance->pdev->dev,
				"Failed from %s %d\n",  __func__, __LINE__);
			return -ENOMEM;
		}
		/* The reply desc pool is required to be in the same 4 gb
		 * region. The function below checks this.
		 * In case of failure, the allocation is retried from a second
		 * pci pool with updated alignment; for RDPQ buffers the
		 * driver always creates two separate pci pools up front.
		 * The alignment is chosen such that the retried allocation,
		 * if it succeeds, always meets the same 4gb region
		 * requirement.
		 * rdpq_tracker keeps track of each buffer's physical and
		 * virtual address and its pci pool descriptor, which helps
		 * the driver when freeing the resources.
		 */
		if (!megasas_check_same_4gb_region(instance, rdpq_chunk_phys[i],
						   chunk_size)) {
			dma_pool_free(fusion->reply_frames_desc_pool,
				      rdpq_chunk_virt[i],
				      rdpq_chunk_phys[i]);

			rdpq_chunk_virt[i] =
				dma_pool_alloc(fusion->reply_frames_desc_pool_align,
					       GFP_KERNEL, &rdpq_chunk_phys[i]);
			if (!rdpq_chunk_virt[i]) {
				dev_err(&instance->pdev->dev,
					"Failed from %s %d\n",
					__func__, __LINE__);
				return -ENOMEM;
			}
			fusion->rdpq_tracker[i].dma_pool_ptr =
					fusion->reply_frames_desc_pool_align;
		} else {
			fusion->rdpq_tracker[i].dma_pool_ptr =
					fusion->reply_frames_desc_pool;
		}

		fusion->rdpq_tracker[i].pool_entry_phys = rdpq_chunk_phys[i];
		fusion->rdpq_tracker[i].pool_entry_virt = rdpq_chunk_virt[i];
	}
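
	/*
	 * Carve each chunk into per-queue slices: the queue with absolute
	 * index (k * RDPQ_MAX_INDEX_IN_ONE_CHUNK + i) starts
	 * reply_alloc_sz * i bytes into chunk k.
	 */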
	for (k = 0; k < dma_alloc_count; k++) {
		for (i = 0; i < RDPQ_MAX_INDEX_IN_ONE_CHUNK; i++) {
			abs_index = (k * RDPQ_MAX_INDEX_IN_ONE_CHUNK) + i;

			if (abs_index == msix_count)
				break;
			offset = fusion->reply_alloc_sz * i;
			fusion->rdpq_virt[abs_index].RDPQBaseAddress =
				cpu_to_le64(rdpq_chunk_phys[k] + offset);
			fusion->reply_frames_desc_phys[abs_index] =
				rdpq_chunk_phys[k] + offset;
			fusion->reply_frames_desc[abs_index] =
				(union MPI2_REPLY_DESCRIPTORS_UNION *)((u8 *)rdpq_chunk_virt[k] + offset);

			reply_desc = fusion->reply_frames_desc[abs_index];
			for (j = 0; j < fusion->reply_q_depth; j++, reply_desc++)
				reply_desc->Words = ULLONG_MAX;
		}
	}

	return 0;
}

static void
megasas_free_rdpq_fusion(struct megasas_instance *instance) {

	int i;
	struct fusion_context *fusion;

	fusion = instance->ctrl_context;

	for (i = 0; i < RDPQ_MAX_CHUNK_COUNT; i++) {
		if (fusion->rdpq_tracker[i].pool_entry_virt)
			dma_pool_free(fusion->rdpq_tracker[i].dma_pool_ptr,
				      fusion->rdpq_tracker[i].pool_entry_virt,
				      fusion->rdpq_tracker[i].pool_entry_phys);
	}

	dma_pool_destroy(fusion->reply_frames_desc_pool);
	dma_pool_destroy(fusion->reply_frames_desc_pool_align);

	if (fusion->rdpq_virt)
		dma_free_coherent(&instance->pdev->dev,
			sizeof(struct MPI2_IOC_INIT_RDPQ_ARRAY_ENTRY) * MAX_MSIX_QUEUES_FUSION,
			fusion->rdpq_virt, fusion->rdpq_phys);
}

static void
megasas_free_reply_fusion(struct megasas_instance *instance) {

	struct fusion_context *fusion;

	fusion = instance->ctrl_context;

	if (fusion->reply_frames_desc[0])
		dma_pool_free(fusion->reply_frames_desc_pool,
			      fusion->reply_frames_desc[0],
			      fusion->reply_frames_desc_phys[0]);

	dma_pool_destroy(fusion->reply_frames_desc_pool);
}

/**
 * megasas_alloc_cmds_fusion -	Allocates the command packets
 * @instance:		Adapter soft state
 *
 * Each frame has a 32-bit field called context. This context is used to get
 * back the megasas_cmd_fusion from the frame when a frame gets completed.
 * In this driver, the 32-bit values are the indices into the array cmd_list.
 * This array is used only to look up the megasas_cmd_fusion given the context.
 * The free commands themselves are maintained in a linked list called cmd_pool.
 *
 * cmds are formed in the io_request and sg_frame members of the
 * megasas_cmd_fusion. The context field is used to get a request descriptor
 * and is used as the SMID of the cmd.
 * The SMID value range is from 1 to max_fw_cmds.
 */
static int
megasas_alloc_cmds_fusion(struct megasas_instance *instance)
{
	int i;
	struct fusion_context *fusion;
	struct megasas_cmd_fusion *cmd;
	u32 offset;
	dma_addr_t io_req_base_phys;
	u8 *io_req_base;

	fusion = instance->ctrl_context;

	if (megasas_alloc_request_fusion(instance))
		goto fail_exit;

	if (instance->is_rdpq) {
		if (megasas_alloc_rdpq_fusion(instance))
			goto fail_exit;
	} else
		if (megasas_alloc_reply_fusion(instance))
			goto fail_exit;

	if (megasas_alloc_cmdlist_fusion(instance))
		goto fail_exit;

	dev_info(&instance->pdev->dev, "Configured max firmware commands: %d\n",
		 instance->max_fw_cmds);

	/* The first 256 bytes (SMID 0) is not used. Don't add to the cmd list */
	io_req_base = fusion->io_request_frames + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
	io_req_base_phys = fusion->io_request_frames_phys + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;

	/*
	 * Add all the commands to command pool (fusion->cmd_pool)
	 */

	/* SMID 0 is reserved. Set SMID/index from 1 */
	for (i = 0; i < instance->max_mpt_cmds; i++) {
		cmd = fusion->cmd_list[i];
		offset = MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE * i;
		memset(cmd, 0, sizeof(struct megasas_cmd_fusion));
		cmd->index = i + 1;
		cmd->scmd = NULL;
		cmd->sync_cmd_idx =
			(i >= instance->max_scsi_cmds && i < instance->max_fw_cmds) ?
				(i - instance->max_scsi_cmds) :
				(u32)ULONG_MAX; /* Set to Invalid */
		cmd->instance = instance;
		cmd->io_request =
			(struct MPI2_RAID_SCSI_IO_REQUEST *)
			(io_req_base + offset);
		memset(cmd->io_request, 0,
		       sizeof(struct MPI2_RAID_SCSI_IO_REQUEST));
		cmd->io_request_phys_addr = io_req_base_phys + offset;
		cmd->r1_alt_dev_handle = MR_DEVHANDLE_INVALID;
	}

	if (megasas_create_sg_sense_fusion(instance))
		goto fail_exit;

	return 0;

fail_exit:
	megasas_free_cmds_fusion(instance);
	return -ENOMEM;
}
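
/*
 * Illustrative SMID layout for the loop above (the numbers are assumed for
 * the example, not read from any specific controller): with
 * max_fw_cmds = 1008 and max_mfi_cmds = 11, max_scsi_cmds = 997.
 * cmd_list[0..996] (SMID 1..997) serve SCSI I/O, cmd_list[997..1007] get
 * sync_cmd_idx 0..10 and back the MFI pass-through frames, and any entries
 * beyond max_fw_cmds (RAID 1 peer commands on VENTURA_SERIES) keep the
 * invalid marker.
 */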

/**
 * wait_and_poll -	Issues a polling command
 * @instance:		Adapter soft state
 * @cmd:		Command packet to be issued
 *
 * For polling, MFI requires the cmd_status to be set to 0xFF before posting.
 */
int
wait_and_poll(struct megasas_instance *instance, struct megasas_cmd *cmd,
	int seconds)
{
	int i;
	struct megasas_header *frame_hdr = &cmd->frame->hdr;
	u32 status_reg;

	u32 msecs = seconds * 1000;

	/*
	 * Wait for cmd_status to change
	 */
	for (i = 0; (i < msecs) && (frame_hdr->cmd_status == 0xff); i += 20) {
		rmb();
		msleep(20);
		if (!(i % 5000)) {
			status_reg = instance->instancet->read_fw_status_reg(instance)
					& MFI_STATE_MASK;
			if (status_reg == MFI_STATE_FAULT)
				break;
		}
	}

	if (frame_hdr->cmd_status == MFI_STAT_INVALID_STATUS)
		return DCMD_TIMEOUT;
	else if (frame_hdr->cmd_status == MFI_STAT_OK)
		return DCMD_SUCCESS;
	else
		return DCMD_FAILED;
}
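
/*
 * Polling cadence of wait_and_poll() above: i counts elapsed milliseconds
 * and each iteration sleeps 20 ms, so seconds = 180 allows up to
 * 180000 / 20 = 9000 iterations, and the (i % 5000) test re-reads the FW
 * status roughly every 5 seconds to bail out early on a FW fault.
 */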

/**
 * megasas_ioc_init_fusion -	Initializes the FW
 * @instance:		Adapter soft state
 *
 * Issues the IOC Init cmd
 */
int
megasas_ioc_init_fusion(struct megasas_instance *instance)
{
	struct megasas_init_frame *init_frame;
	struct MPI2_IOC_INIT_REQUEST *IOCInitMessage = NULL;
	dma_addr_t ioc_init_handle;
	struct megasas_cmd *cmd;
	u8 ret, cur_rdpq_mode;
	struct fusion_context *fusion;
	union MEGASAS_REQUEST_DESCRIPTOR_UNION req_desc;
	int i;
	struct megasas_header *frame_hdr;
	const char *sys_info;
	MFI_CAPABILITIES *drv_ops;
	u32 scratch_pad_1;
	ktime_t time;
	bool cur_fw_64bit_dma_capable;
	bool cur_intr_coalescing;

	fusion = instance->ctrl_context;

	ioc_init_handle = fusion->ioc_init_request_phys;
	IOCInitMessage = fusion->ioc_init_request;

	cmd = fusion->ioc_init_cmd;

	scratch_pad_1 = megasas_readl
		(instance, &instance->reg_set->outbound_scratch_pad_1);

	cur_rdpq_mode = (scratch_pad_1 & MR_RDPQ_MODE_OFFSET) ? 1 : 0;

	if (instance->adapter_type == INVADER_SERIES) {
		cur_fw_64bit_dma_capable =
			(scratch_pad_1 & MR_CAN_HANDLE_64_BIT_DMA_OFFSET) ? true : false;

		if (instance->consistent_mask_64bit && !cur_fw_64bit_dma_capable) {
			dev_err(&instance->pdev->dev, "Driver was operating on 64bit "
				"DMA mask, but upcoming FW does not support 64bit DMA mask\n");
			megaraid_sas_kill_hba(instance);
			ret = 1;
			goto fail_fw_init;
		}
	}

	if (instance->is_rdpq && !cur_rdpq_mode) {
		dev_err(&instance->pdev->dev, "Firmware downgrade *NOT SUPPORTED*"
			" from RDPQ mode to non RDPQ mode\n");
		ret = 1;
		goto fail_fw_init;
	}

	cur_intr_coalescing = (scratch_pad_1 & MR_INTR_COALESCING_SUPPORT_OFFSET) ?
							true : false;

	if ((instance->low_latency_index_start ==
		MR_HIGH_IOPS_QUEUE_COUNT) && cur_intr_coalescing)
		instance->perf_mode = MR_BALANCED_PERF_MODE;

	dev_info(&instance->pdev->dev, "Performance mode :%s\n",
		MEGASAS_PERF_MODE_2STR(instance->perf_mode));

	instance->fw_sync_cache_support = (scratch_pad_1 &
		MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0;
	dev_info(&instance->pdev->dev, "FW supports sync cache\t: %s\n",
		 instance->fw_sync_cache_support ? "Yes" : "No");

	memset(IOCInitMessage, 0, sizeof(struct MPI2_IOC_INIT_REQUEST));

	IOCInitMessage->Function = MPI2_FUNCTION_IOC_INIT;
	IOCInitMessage->WhoInit	= MPI2_WHOINIT_HOST_DRIVER;
	IOCInitMessage->MsgVersion = cpu_to_le16(MPI2_VERSION);
	IOCInitMessage->HeaderVersion = cpu_to_le16(MPI2_HEADER_VERSION);
	IOCInitMessage->SystemRequestFrameSize = cpu_to_le16(MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE / 4);

	IOCInitMessage->ReplyDescriptorPostQueueDepth = cpu_to_le16(fusion->reply_q_depth);
	IOCInitMessage->ReplyDescriptorPostQueueAddress = instance->is_rdpq ?
		cpu_to_le64(fusion->rdpq_phys) :
		cpu_to_le64(fusion->reply_frames_desc_phys[0]);
	IOCInitMessage->MsgFlags = instance->is_rdpq ?
		MPI2_IOCINIT_MSGFLAG_RDPQ_ARRAY_MODE : 0;
	IOCInitMessage->SystemRequestFrameBaseAddress = cpu_to_le64(fusion->io_request_frames_phys);
	IOCInitMessage->SenseBufferAddressHigh = cpu_to_le32(upper_32_bits(fusion->sense_phys_addr));
	IOCInitMessage->HostMSIxVectors = instance->msix_vectors;
	IOCInitMessage->HostPageSize = MR_DEFAULT_NVME_PAGE_SHIFT;

	time = ktime_get_real();
	/* Convert to milliseconds as per FW requirement */
	IOCInitMessage->TimeStamp = cpu_to_le64(ktime_to_ms(time));

	init_frame = (struct megasas_init_frame *)cmd->frame;
	memset(init_frame, 0, IOC_INIT_FRAME_SIZE);

	frame_hdr = &cmd->frame->hdr;
	frame_hdr->cmd_status = 0xFF;
	frame_hdr->flags |= cpu_to_le16(MFI_FRAME_DONT_POST_IN_REPLY_QUEUE);

	init_frame->cmd	= MFI_CMD_INIT;
	init_frame->cmd_status = 0xFF;

	drv_ops = (MFI_CAPABILITIES *) &(init_frame->driver_operations);

	/* driver supports extended MSI-X */
	if (instance->adapter_type >= INVADER_SERIES)
		drv_ops->mfi_capabilities.support_additional_msix = 1;
	/* driver supports HA / Remote LUN over Fast Path interface */
	drv_ops->mfi_capabilities.support_fp_remote_lun = 1;

	drv_ops->mfi_capabilities.support_max_255lds = 1;
	drv_ops->mfi_capabilities.support_ndrive_r1_lb = 1;
	drv_ops->mfi_capabilities.security_protocol_cmds_fw = 1;

	if (instance->max_chain_frame_sz > MEGASAS_CHAIN_FRAME_SZ_MIN)
		drv_ops->mfi_capabilities.support_ext_io_size = 1;

	drv_ops->mfi_capabilities.support_fp_rlbypass = 1;
	if (!dual_qdepth_disable)
		drv_ops->mfi_capabilities.support_ext_queue_depth = 1;

	drv_ops->mfi_capabilities.support_qd_throttling = 1;
	drv_ops->mfi_capabilities.support_pd_map_target_id = 1;
	drv_ops->mfi_capabilities.support_nvme_passthru = 1;
	drv_ops->mfi_capabilities.support_fw_exposed_dev_list = 1;

	if (instance->consistent_mask_64bit)
		drv_ops->mfi_capabilities.support_64bit_mode = 1;

	/* Convert capability to LE32 */
	cpu_to_le32s((u32 *)&init_frame->driver_operations.mfi_capabilities);

	sys_info = dmi_get_system_info(DMI_PRODUCT_UUID);
	if (instance->system_info_buf && sys_info) {
		memcpy(instance->system_info_buf->systemId, sys_info,
		       strlen(sys_info) > 64 ? 64 : strlen(sys_info));
		instance->system_info_buf->systemIdLength =
			strlen(sys_info) > 64 ? 64 : strlen(sys_info);
		init_frame->system_info_lo = cpu_to_le32(lower_32_bits(instance->system_info_h));
		init_frame->system_info_hi = cpu_to_le32(upper_32_bits(instance->system_info_h));
	}

	init_frame->queue_info_new_phys_addr_hi =
		cpu_to_le32(upper_32_bits(ioc_init_handle));
	init_frame->queue_info_new_phys_addr_lo =
		cpu_to_le32(lower_32_bits(ioc_init_handle));
	init_frame->data_xfer_len = cpu_to_le32(sizeof(struct MPI2_IOC_INIT_REQUEST));

	/*
	 * Each bit in replyqueue_mask represents one group of MSI-x vectors
	 * (each group has 8 vectors)
	 */
	switch (instance->perf_mode) {
	case MR_BALANCED_PERF_MODE:
		init_frame->replyqueue_mask =
			cpu_to_le16(~(~0 << instance->low_latency_index_start/8));
		break;
	case MR_IOPS_PERF_MODE:
		init_frame->replyqueue_mask =
			cpu_to_le16(~(~0 << instance->msix_vectors/8));
		break;
	}

	req_desc.u.low = cpu_to_le32(lower_32_bits(cmd->frame_phys_addr));
	req_desc.u.high = cpu_to_le32(upper_32_bits(cmd->frame_phys_addr));
	req_desc.MFAIo.RequestFlags =
		(MEGASAS_REQ_DESCRIPT_FLAGS_MFA <<
		MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);

	/*
	 * disable the intr before firing the init frame
	 */
	instance->instancet->disable_intr(instance);

	for (i = 0; i < (10 * 1000); i += 20) {
		if (megasas_readl(instance, &instance->reg_set->doorbell) & 1)
			msleep(20);
		else
			break;
	}

	/* For AERO also, IOC_INIT requires 64 bit descriptor write */
	megasas_write_64bit_req_desc(instance, &req_desc);

	wait_and_poll(instance, cmd, MFI_IO_TIMEOUT_SECS);

	frame_hdr = &cmd->frame->hdr;
	if (frame_hdr->cmd_status != 0) {
		ret = 1;
		goto fail_fw_init;
	}

	if (instance->adapter_type >= AERO_SERIES) {
		scratch_pad_1 = megasas_readl
			(instance, &instance->reg_set->outbound_scratch_pad_1);

		instance->atomic_desc_support =
			(scratch_pad_1 & MR_ATOMIC_DESCRIPTOR_SUPPORT_OFFSET) ? 1 : 0;

		dev_info(&instance->pdev->dev, "FW supports atomic descriptor\t: %s\n",
			instance->atomic_desc_support ? "Yes" : "No");
	}

	return 0;

fail_fw_init:
	dev_err(&instance->pdev->dev,
		"Init cmd return status FAILED for SCSI host %d\n",
		instance->host->host_no);

	return ret;
}
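
/*
 * Worked example for the replyqueue_mask computation above (values assumed
 * for illustration): in balanced mode with low_latency_index_start = 8, the
 * mask is ~(~0 << 8/8) = 0x0001, i.e. only the first group of 8 MSI-x
 * vectors; in IOPS mode with msix_vectors = 64, the mask is
 * ~(~0 << 64/8) = 0x00FF, covering all eight groups.
 */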

/**
 * megasas_sync_pd_seq_num -	JBOD SEQ MAP
 * @instance:		Adapter soft state
 * @pend:		set to 1, if it is a pended JBOD map.
 *
 * Issues the JBOD map to the firmware. For a pended command, the command
 * is issued and the function returns immediately. For the first instance
 * of the JBOD map, the command is issued and its completion is awaited.
 */
int
megasas_sync_pd_seq_num(struct megasas_instance *instance, bool pend) {
	int ret = 0;
	size_t pd_seq_map_sz;
	struct megasas_cmd *cmd;
	struct megasas_dcmd_frame *dcmd;
	struct fusion_context *fusion = instance->ctrl_context;
	struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
	dma_addr_t pd_seq_h;

	pd_sync = (void *)fusion->pd_seq_sync[(instance->pd_seq_map_id & 1)];
	pd_seq_h = fusion->pd_seq_phys[(instance->pd_seq_map_id & 1)];
	pd_seq_map_sz = struct_size(pd_sync, seq, MAX_PHYSICAL_DEVICES - 1);

	cmd = megasas_get_cmd(instance);
	if (!cmd) {
		dev_err(&instance->pdev->dev,
			"Could not get mfi cmd. Fail from %s %d\n",
			__func__, __LINE__);
		return -ENOMEM;
	}

	dcmd = &cmd->frame->dcmd;

	memset(pd_sync, 0, pd_seq_map_sz);
	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);

	if (pend) {
		dcmd->mbox.b[0] = MEGASAS_DCMD_MBOX_PEND_FLAG;
		dcmd->flags = MFI_FRAME_DIR_WRITE;
		instance->jbod_seq_cmd = cmd;
	} else {
		dcmd->flags = MFI_FRAME_DIR_READ;
	}

	dcmd->cmd = MFI_CMD_DCMD;
	dcmd->cmd_status = 0xFF;
	dcmd->sge_count = 1;
	dcmd->timeout = 0;
	dcmd->pad_0 = 0;
	dcmd->data_xfer_len = cpu_to_le32(pd_seq_map_sz);
	dcmd->opcode = cpu_to_le32(MR_DCMD_SYSTEM_PD_MAP_GET_INFO);

	megasas_set_dma_settings(instance, dcmd, pd_seq_h, pd_seq_map_sz);

	if (pend) {
		instance->instancet->issue_dcmd(instance, cmd);
		return 0;
	}

	/* Below code is only for non pended DCMD */
	if (!instance->mask_interrupts)
		ret = megasas_issue_blocked_cmd(instance, cmd,
						MFI_IO_TIMEOUT_SECS);
	else
		ret = megasas_issue_polled(instance, cmd);

	if (le32_to_cpu(pd_sync->count) > MAX_PHYSICAL_DEVICES) {
		dev_warn(&instance->pdev->dev,
			 "driver supports max %d JBOD, but FW reports %d\n",
			 MAX_PHYSICAL_DEVICES, le32_to_cpu(pd_sync->count));
		ret = -EINVAL;
	}

	if (ret == DCMD_TIMEOUT)
		dev_warn(&instance->pdev->dev,
			 "%s DCMD timed out, continue without JBOD sequence map\n",
			 __func__);

	if (ret == DCMD_SUCCESS)
		instance->pd_seq_map_id++;

	megasas_return_cmd(instance, cmd);
	return ret;
}
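
/*
 * Size math for the JBOD sequence map above: struct_size(pd_sync, seq,
 * MAX_PHYSICAL_DEVICES - 1) expands to sizeof(*pd_sync) plus
 * (MAX_PHYSICAL_DEVICES - 1) trailing seq[] elements; together with the one
 * element already declared inside MR_PD_CFG_SEQ_NUM_SYNC, the buffer covers
 * sequence numbers for all MAX_PHYSICAL_DEVICES (256 at the time of
 * writing) JBOD devices.
 */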

/*
 * megasas_get_ld_map_info -	Returns FW's ld_map structure
 * @instance:				Adapter soft state
 *
 * Issues an internal command (DCMD) to fetch the FW's LD (RAID) map
 * structure. This information is mainly used to decide whether fast path
 * I/O is supported for the current configuration.
 * dcmd.mbox value setting for MR_DCMD_LD_MAP_GET_INFO
 * dcmd.mbox.b[0] - number of LDs being sync'd
 * dcmd.mbox.b[1] - 0 - complete command immediately.
 *                - 1 - pend till config change
 * dcmd.mbox.b[2] - 0 - supports max 64 lds and uses legacy MR_FW_RAID_MAP
 *                - 1 - supports max MAX_LOGICAL_DRIVES_EXT lds and
 *                      uses extended struct MR_FW_RAID_MAP_EXT
 */
static int
megasas_get_ld_map_info(struct megasas_instance *instance)
{
	int ret = 0;
	struct megasas_cmd *cmd;
	struct megasas_dcmd_frame *dcmd;
	void *ci;
	dma_addr_t ci_h = 0;
	u32 size_map_info;
	struct fusion_context *fusion;

	cmd = megasas_get_cmd(instance);

	if (!cmd) {
		dev_printk(KERN_DEBUG, &instance->pdev->dev, "Failed to get cmd for map info\n");
		return -ENOMEM;
	}

	fusion = instance->ctrl_context;

	if (!fusion) {
		megasas_return_cmd(instance, cmd);
		return -ENXIO;
	}

	dcmd = &cmd->frame->dcmd;

	size_map_info = fusion->current_map_sz;

	ci = (void *) fusion->ld_map[(instance->map_id & 1)];
	ci_h = fusion->ld_map_phys[(instance->map_id & 1)];

	if (!ci) {
		dev_printk(KERN_DEBUG, &instance->pdev->dev, "Failed to alloc mem for ld_map_info\n");
		megasas_return_cmd(instance, cmd);
		return -ENOMEM;
	}

	memset(ci, 0, fusion->max_map_sz);
	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);
	dcmd->cmd = MFI_CMD_DCMD;
	dcmd->cmd_status = 0xFF;
	dcmd->sge_count = 1;
	dcmd->flags = MFI_FRAME_DIR_READ;
	dcmd->timeout = 0;
	dcmd->pad_0 = 0;
	dcmd->data_xfer_len = cpu_to_le32(size_map_info);
	dcmd->opcode = cpu_to_le32(MR_DCMD_LD_MAP_GET_INFO);

	megasas_set_dma_settings(instance, dcmd, ci_h, size_map_info);

	if (!instance->mask_interrupts)
		ret = megasas_issue_blocked_cmd(instance, cmd,
						MFI_IO_TIMEOUT_SECS);
	else
		ret = megasas_issue_polled(instance, cmd);

	if (ret == DCMD_TIMEOUT)
		dev_warn(&instance->pdev->dev,
			 "%s DCMD timed out, RAID map is disabled\n",
			 __func__);

	megasas_return_cmd(instance, cmd);

	return ret;
}

u8
megasas_get_map_info(struct megasas_instance *instance)
{
	struct fusion_context *fusion = instance->ctrl_context;

	fusion->fast_path_io = 0;
	if (!megasas_get_ld_map_info(instance)) {
		if (MR_ValidateMapInfo(instance, instance->map_id)) {
			fusion->fast_path_io = 1;
			return 0;
		}
	}
	return 1;
}
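
/*
 * Note on the return convention above: megasas_get_map_info() returns 0
 * (and sets fusion->fast_path_io = 1) only when the new RAID map was both
 * fetched and validated; callers such as megasas_init_adapter_fusion()
 * follow a successful fetch with megasas_sync_map_info().
 */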

/*
 * megasas_sync_map_info -	Syncs LD map info with the FW
 * @instance:		Adapter soft state
 *
 * Issues a pended internal command (DCMD) carrying each LD's target ID and
 * sequence number; the FW completes the command when the RAID map changes,
 * which lets the driver learn of configuration changes.
 */
int
megasas_sync_map_info(struct megasas_instance *instance)
{
	int i;
	struct megasas_cmd *cmd;
	struct megasas_dcmd_frame *dcmd;
	u16 num_lds;
	struct fusion_context *fusion;
	struct MR_LD_TARGET_SYNC *ci = NULL;
	struct MR_DRV_RAID_MAP_ALL *map;
	struct MR_LD_RAID *raid;
	struct MR_LD_TARGET_SYNC *ld_sync;
	dma_addr_t ci_h = 0;
	u32 size_map_info;

	cmd = megasas_get_cmd(instance);

	if (!cmd) {
		dev_printk(KERN_DEBUG, &instance->pdev->dev, "Failed to get cmd for sync info\n");
		return -ENOMEM;
	}

	fusion = instance->ctrl_context;

	if (!fusion) {
		megasas_return_cmd(instance, cmd);
		return 1;
	}

	map = fusion->ld_drv_map[instance->map_id & 1];

	num_lds = le16_to_cpu(map->raidMap.ldCount);

	dcmd = &cmd->frame->dcmd;

	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);

	ci = (struct MR_LD_TARGET_SYNC *)
		fusion->ld_map[(instance->map_id - 1) & 1];
	memset(ci, 0, fusion->max_map_sz);

	ci_h = fusion->ld_map_phys[(instance->map_id - 1) & 1];

	ld_sync = (struct MR_LD_TARGET_SYNC *)ci;

	for (i = 0; i < num_lds; i++, ld_sync++) {
		raid = MR_LdRaidGet(i, map);
		ld_sync->targetId = MR_GetLDTgtId(i, map);
		ld_sync->seqNum = raid->seqNum;
	}

	size_map_info = fusion->current_map_sz;

	dcmd->cmd = MFI_CMD_DCMD;
	dcmd->cmd_status = 0xFF;
	dcmd->sge_count = 1;
	dcmd->flags = MFI_FRAME_DIR_WRITE;
	dcmd->timeout = 0;
	dcmd->pad_0 = 0;
	dcmd->data_xfer_len = cpu_to_le32(size_map_info);
	dcmd->mbox.b[0] = num_lds;
	dcmd->mbox.b[1] = MEGASAS_DCMD_MBOX_PEND_FLAG;
	dcmd->opcode = cpu_to_le32(MR_DCMD_LD_MAP_GET_INFO);

	megasas_set_dma_settings(instance, dcmd, ci_h, size_map_info);

	instance->map_update_cmd = cmd;

	instance->instancet->issue_dcmd(instance, cmd);

	return 0;
}

/*
 * megasas_display_intel_branding - Display branding string
 * @instance: per adapter object
 *
 * Return nothing.
 */
static void
megasas_display_intel_branding(struct megasas_instance *instance)
{
	if (instance->pdev->subsystem_vendor != PCI_VENDOR_ID_INTEL)
		return;

	switch (instance->pdev->device) {
	case PCI_DEVICE_ID_LSI_INVADER:
		switch (instance->pdev->subsystem_device) {
		case MEGARAID_INTEL_RS3DC080_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RS3DC080_BRANDING);
			break;
		case MEGARAID_INTEL_RS3DC040_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RS3DC040_BRANDING);
			break;
		case MEGARAID_INTEL_RS3SC008_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RS3SC008_BRANDING);
			break;
		case MEGARAID_INTEL_RS3MC044_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RS3MC044_BRANDING);
			break;
		default:
			break;
		}
		break;
	case PCI_DEVICE_ID_LSI_FURY:
		switch (instance->pdev->subsystem_device) {
		case MEGARAID_INTEL_RS3WC080_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RS3WC080_BRANDING);
			break;
		case MEGARAID_INTEL_RS3WC040_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RS3WC040_BRANDING);
			break;
		default:
			break;
		}
		break;
	case PCI_DEVICE_ID_LSI_CUTLASS_52:
	case PCI_DEVICE_ID_LSI_CUTLASS_53:
		switch (instance->pdev->subsystem_device) {
		case MEGARAID_INTEL_RMS3BC160_SSDID:
			dev_info(&instance->pdev->dev, "scsi host %d: %s\n",
				 instance->host->host_no,
				 MEGARAID_INTEL_RMS3BC160_BRANDING);
			break;
		default:
			break;
		}
		break;
	default:
		break;
	}
}

/**
 * megasas_allocate_raid_maps -	Allocate memory for RAID maps
 * @instance:		Adapter soft state
 *
 * return:		if success: return 0
 *			failed: return -ENOMEM
 */
static inline int megasas_allocate_raid_maps(struct megasas_instance *instance)
{
	struct fusion_context *fusion;
	int i = 0;

	fusion = instance->ctrl_context;

	fusion->drv_map_pages = get_order(fusion->drv_map_sz);

	for (i = 0; i < 2; i++) {
		fusion->ld_map[i] = NULL;

		fusion->ld_drv_map[i] = (void *)
			__get_free_pages(__GFP_ZERO | GFP_KERNEL,
					 fusion->drv_map_pages);

		if (!fusion->ld_drv_map[i]) {
			fusion->ld_drv_map[i] = vzalloc(fusion->drv_map_sz);

			if (!fusion->ld_drv_map[i]) {
				dev_err(&instance->pdev->dev,
					"Could not allocate memory for local map"
					" size requested: %d\n",
					fusion->drv_map_sz);
				goto ld_drv_map_alloc_fail;
			}
		}
	}

	for (i = 0; i < 2; i++) {
		fusion->ld_map[i] = dma_alloc_coherent(&instance->pdev->dev,
						       fusion->max_map_sz,
						       &fusion->ld_map_phys[i],
						       GFP_KERNEL);
		if (!fusion->ld_map[i]) {
			dev_err(&instance->pdev->dev,
				"Could not allocate memory for map info %s:%d\n",
				__func__, __LINE__);
			goto ld_map_alloc_fail;
		}
	}

	return 0;

ld_map_alloc_fail:
	for (i = 0; i < 2; i++) {
		if (fusion->ld_map[i])
			dma_free_coherent(&instance->pdev->dev,
					  fusion->max_map_sz,
					  fusion->ld_map[i],
					  fusion->ld_map_phys[i]);
	}

ld_drv_map_alloc_fail:
	for (i = 0; i < 2; i++) {
		if (fusion->ld_drv_map[i]) {
			if (is_vmalloc_addr(fusion->ld_drv_map[i]))
				vfree(fusion->ld_drv_map[i]);
			else
				free_pages((ulong)fusion->ld_drv_map[i],
					   fusion->drv_map_pages);
		}
	}

	return -ENOMEM;
}

/**
 * megasas_configure_queue_sizes -	Calculate size of request desc queue,
 *					reply desc queue,
 *					IO request frame queue, set can_queue.
 * @instance:				Adapter soft state
 * @return:				void
 */
static inline
void megasas_configure_queue_sizes(struct megasas_instance *instance)
{
	struct fusion_context *fusion;
	u16 max_cmd;

	fusion = instance->ctrl_context;
	max_cmd = instance->max_fw_cmds;

	if (instance->adapter_type >= VENTURA_SERIES)
		instance->max_mpt_cmds = instance->max_fw_cmds * RAID_1_PEER_CMDS;
	else
		instance->max_mpt_cmds = instance->max_fw_cmds;

	instance->max_scsi_cmds = instance->max_fw_cmds - instance->max_mfi_cmds;
	instance->cur_can_queue = instance->max_scsi_cmds;
	instance->host->can_queue = instance->cur_can_queue;

	fusion->reply_q_depth = 2 * ((max_cmd + 1 + 15) / 16) * 16;

	fusion->request_alloc_sz = sizeof(union MEGASAS_REQUEST_DESCRIPTOR_UNION) *
					instance->max_mpt_cmds;
	fusion->reply_alloc_sz = sizeof(union MPI2_REPLY_DESCRIPTORS_UNION) *
					(fusion->reply_q_depth);
	fusion->io_frames_alloc_sz = MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE +
		(MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE
		 * (instance->max_mpt_cmds + 1)); /* Extra 1 for SMID 0 */
}
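
/*
 * Example of the queue-depth rounding above (max_cmd value assumed for
 * illustration): with max_cmd = 1008, (1008 + 1 + 15) / 16 * 16 rounds 1009
 * up to 1024, so reply_q_depth = 2 * 1024 = 2048 descriptors per reply
 * queue.
 */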

static int megasas_alloc_ioc_init_frame(struct megasas_instance *instance)
{
	struct fusion_context *fusion;
	struct megasas_cmd *cmd;

	fusion = instance->ctrl_context;

	cmd = kzalloc(sizeof(struct megasas_cmd), GFP_KERNEL);

	if (!cmd) {
		dev_err(&instance->pdev->dev, "Failed from func: %s line: %d\n",
			__func__, __LINE__);
		return -ENOMEM;
	}

	cmd->frame = dma_alloc_coherent(&instance->pdev->dev,
					IOC_INIT_FRAME_SIZE,
					&cmd->frame_phys_addr, GFP_KERNEL);

	if (!cmd->frame) {
		dev_err(&instance->pdev->dev, "Failed from func: %s line: %d\n",
			__func__, __LINE__);
		kfree(cmd);
		return -ENOMEM;
	}

	fusion->ioc_init_cmd = cmd;
	return 0;
}

/**
 * megasas_free_ioc_init_cmd -	Free IOC INIT command frame
 * @instance:		Adapter soft state
 */
static inline void megasas_free_ioc_init_cmd(struct megasas_instance *instance)
{
	struct fusion_context *fusion;

	fusion = instance->ctrl_context;

	if (fusion->ioc_init_cmd && fusion->ioc_init_cmd->frame)
		dma_free_coherent(&instance->pdev->dev,
				  IOC_INIT_FRAME_SIZE,
				  fusion->ioc_init_cmd->frame,
				  fusion->ioc_init_cmd->frame_phys_addr);

	kfree(fusion->ioc_init_cmd);
}

/**
 * megasas_init_adapter_fusion -	Initializes the FW
 * @instance:		Adapter soft state
 *
 * This is the main function for initializing firmware.
 */
static u32
megasas_init_adapter_fusion(struct megasas_instance *instance)
{
	struct fusion_context *fusion;
	u32 scratch_pad_1;
	int i = 0, count;
	u32 status_reg;

	fusion = instance->ctrl_context;

	megasas_fusion_update_can_queue(instance, PROBE_CONTEXT);

	/*
	 * Only the driver's internal DCMDs and IOCTL DCMDs need MFI frames
	 */
	instance->max_mfi_cmds =
		MEGASAS_FUSION_INTERNAL_CMDS + MEGASAS_FUSION_IOCTL_CMDS;

	megasas_configure_queue_sizes(instance);

	scratch_pad_1 = megasas_readl(instance,
				      &instance->reg_set->outbound_scratch_pad_1);
	/* If scratch_pad_1 & MEGASAS_MAX_CHAIN_SIZE_UNITS_MASK is set,
	 * the firmware supports an extended IO chain frame four times
	 * larger than legacy firmware.
	 * Legacy firmware - frame size is (8 * 128) = 1K
	 * 1M IO firmware  - frame size is (8 * 128 * 4) = 4K
	 */
	if (scratch_pad_1 & MEGASAS_MAX_CHAIN_SIZE_UNITS_MASK)
		instance->max_chain_frame_sz =
			((scratch_pad_1 & MEGASAS_MAX_CHAIN_SIZE_MASK) >>
			MEGASAS_MAX_CHAIN_SHIFT) * MEGASAS_1MB_IO;
	else
		instance->max_chain_frame_sz =
			((scratch_pad_1 & MEGASAS_MAX_CHAIN_SIZE_MASK) >>
			MEGASAS_MAX_CHAIN_SHIFT) * MEGASAS_256K_IO;

	if (instance->max_chain_frame_sz < MEGASAS_CHAIN_FRAME_SZ_MIN) {
		dev_warn(&instance->pdev->dev, "frame size %d invalid, fall back to legacy max frame size %d\n",
			 instance->max_chain_frame_sz,
			 MEGASAS_CHAIN_FRAME_SZ_MIN);
		instance->max_chain_frame_sz = MEGASAS_CHAIN_FRAME_SZ_MIN;
	}

	fusion->max_sge_in_main_msg =
		(MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE
			- offsetof(struct MPI2_RAID_SCSI_IO_REQUEST, SGL))/16;

	fusion->max_sge_in_chain =
		instance->max_chain_frame_sz
			/ sizeof(union MPI2_SGE_IO_UNION);

	instance->max_num_sge =
		rounddown_pow_of_two(fusion->max_sge_in_main_msg
			+ fusion->max_sge_in_chain - 2);

	/* Used for pass thru MFI frame (DCMD) */
	fusion->chain_offset_mfi_pthru =
		offsetof(struct MPI2_RAID_SCSI_IO_REQUEST, SGL)/16;

	fusion->chain_offset_io_request =
		(MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE -
		 sizeof(union MPI2_SGE_IO_UNION))/16;

	count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;
	for (i = 0 ; i < count; i++)
		fusion->last_reply_idx[i] = 0;

	/*
	 * For fusion adapters, 3 commands for IOCTL and 8 commands
	 * for driver's internal DCMDs.
	 */
	instance->max_scsi_cmds = instance->max_fw_cmds -
				(MEGASAS_FUSION_INTERNAL_CMDS +
				MEGASAS_FUSION_IOCTL_CMDS);
	sema_init(&instance->ioctl_sem, MEGASAS_FUSION_IOCTL_CMDS);

	if (megasas_alloc_ioc_init_frame(instance))
		return 1;

	/*
	 * Allocate memory for descriptors
	 * Create a pool of commands
	 */
	if (megasas_alloc_cmds(instance))
		goto fail_alloc_mfi_cmds;
	if (megasas_alloc_cmds_fusion(instance))
		goto fail_alloc_cmds;

	if (megasas_ioc_init_fusion(instance)) {
		status_reg = instance->instancet->read_fw_status_reg(instance);
		if (((status_reg & MFI_STATE_MASK) == MFI_STATE_FAULT) &&
		    (status_reg & MFI_RESET_ADAPTER)) {
			/* Do a chip reset and then retry IOC INIT once */
			if (megasas_adp_reset_wait_for_ready
				(instance, true, 0) == FAILED)
				goto fail_ioc_init;

			if (megasas_ioc_init_fusion(instance))
				goto fail_ioc_init;
		} else {
			goto fail_ioc_init;
		}
	}

	megasas_display_intel_branding(instance);
	if (megasas_get_ctrl_info(instance)) {
		dev_err(&instance->pdev->dev,
			"Could not get controller info. Fail from %s %d\n",
			__func__, __LINE__);
		goto fail_ioc_init;
	}

	instance->flag_ieee = 1;
	instance->r1_ldio_hint_default = MR_R1_LDIO_PIGGYBACK_DEFAULT;
	instance->threshold_reply_count = instance->max_fw_cmds / 4;
	fusion->fast_path_io = 0;

	if (megasas_allocate_raid_maps(instance))
		goto fail_ioc_init;

	if (!megasas_get_map_info(instance))
		megasas_sync_map_info(instance);

	return 0;

fail_ioc_init:
	megasas_free_cmds_fusion(instance);
fail_alloc_cmds:
	megasas_free_cmds(instance);
fail_alloc_mfi_cmds:
	megasas_free_ioc_init_cmd(instance);
	return 1;
}
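
/*
 * Illustrative SGE math for megasas_init_adapter_fusion() above, assuming a
 * 256-byte I/O frame, a 16-byte MPI2_SGE_IO_UNION, an SGL that begins 128
 * bytes into the request frame, and a 4K extended chain frame (these sizes
 * are assumptions for the example, not read from hardware):
 * max_sge_in_chain = 4096 / 16 = 256, max_sge_in_main_msg =
 * (256 - 128) / 16 = 8, so max_num_sge =
 * rounddown_pow_of_two(8 + 256 - 2) = 256.
 */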

/**
 * megasas_fault_detect_work -	Worker function of
 *				FW fault handling workqueue.
 */
static void
megasas_fault_detect_work(struct work_struct *work)
{
	struct megasas_instance *instance =
		container_of(work, struct megasas_instance,
			     fw_fault_work.work);
	u32 fw_state, dma_state, status;

	/* Check the fw state */
	fw_state = instance->instancet->read_fw_status_reg(instance) &
			MFI_STATE_MASK;

	if (fw_state == MFI_STATE_FAULT) {
		dma_state = instance->instancet->read_fw_status_reg(instance) &
				MFI_STATE_DMADONE;
		/* Start collecting crash, if DMA bit is done */
		if (instance->crash_dump_drv_support &&
		    instance->crash_dump_app_support && dma_state) {
			megasas_fusion_crash_dump(instance);
		} else {
			if (instance->unload == 0) {
				status = megasas_reset_fusion(instance->host, 0);
				if (status != SUCCESS) {
					dev_err(&instance->pdev->dev,
						"Failed from %s %d, do not re-arm timer\n",
						__func__, __LINE__);
					return;
				}
			}
		}
	}

	if (instance->fw_fault_work_q)
		queue_delayed_work(instance->fw_fault_work_q,
			&instance->fw_fault_work,
			msecs_to_jiffies(MEGASAS_WATCHDOG_THREAD_INTERVAL));
}

int
megasas_fusion_start_watchdog(struct megasas_instance *instance)
{
	/* Check if the Fault WQ is already started */
	if (instance->fw_fault_work_q)
		return SUCCESS;

	INIT_DELAYED_WORK(&instance->fw_fault_work, megasas_fault_detect_work);

	snprintf(instance->fault_handler_work_q_name,
		 sizeof(instance->fault_handler_work_q_name),
		 "poll_megasas%d_status", instance->host->host_no);

	instance->fw_fault_work_q =
		create_singlethread_workqueue(instance->fault_handler_work_q_name);
	if (!instance->fw_fault_work_q) {
		dev_err(&instance->pdev->dev, "Failed from %s %d\n",
			__func__, __LINE__);
		return FAILED;
	}

	queue_delayed_work(instance->fw_fault_work_q,
			   &instance->fw_fault_work,
			   msecs_to_jiffies(MEGASAS_WATCHDOG_THREAD_INTERVAL));

	return SUCCESS;
}

void
megasas_fusion_stop_watchdog(struct megasas_instance *instance)
{
	struct workqueue_struct *wq;

	if (instance->fw_fault_work_q) {
		wq = instance->fw_fault_work_q;
		instance->fw_fault_work_q = NULL;
		if (!cancel_delayed_work_sync(&instance->fw_fault_work))
			flush_workqueue(wq);
		destroy_workqueue(wq);
	}
}
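
/*
 * The watchdog above re-arms itself every MEGASAS_WATCHDOG_THREAD_INTERVAL
 * milliseconds (1000 ms at the time of writing), so a FW fault is detected
 * within roughly one second unless the reset path asked not to re-arm.
 */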

/**
 * map_cmd_status -	Maps FW cmd status to OS cmd status
 * @fusion:		fusion context
 * @scmd:		Pointer to SCSI command
 * @status:		status of cmd returned by FW
 * @ext_status:		ext status of cmd returned by FW
 * @data_length:	actual number of bytes transferred
 * @sense:		sense data
 */
static void
map_cmd_status(struct fusion_context *fusion,
		struct scsi_cmnd *scmd, u8 status, u8 ext_status,
		u32 data_length, u8 *sense)
{
	u8 cmd_type;
	int resid;

	cmd_type = megasas_cmd_type(scmd);
	switch (status) {

	case MFI_STAT_OK:
		scmd->result = DID_OK << 16;
		break;

	case MFI_STAT_SCSI_IO_FAILED:
	case MFI_STAT_LD_INIT_IN_PROGRESS:
		scmd->result = (DID_ERROR << 16) | ext_status;
		break;

	case MFI_STAT_SCSI_DONE_WITH_ERROR:

		scmd->result = (DID_OK << 16) | ext_status;
		if (ext_status == SAM_STAT_CHECK_CONDITION) {
			memset(scmd->sense_buffer, 0,
			       SCSI_SENSE_BUFFERSIZE);
			memcpy(scmd->sense_buffer, sense,
			       SCSI_SENSE_BUFFERSIZE);
			scmd->result |= DRIVER_SENSE << 24;
		}

		/*
		 * If the IO request is partially completed, then MR FW will
		 * update "io_request->DataLength" field with actual number of
		 * bytes transferred. Driver will set residual bytes count in
		 * SCSI command structure.
		 */
		resid = (scsi_bufflen(scmd) - data_length);
		scsi_set_resid(scmd, resid);

		if (resid &&
		    ((cmd_type == READ_WRITE_LDIO) ||
		     (cmd_type == READ_WRITE_SYSPDIO)))
			scmd_printk(KERN_INFO, scmd, "BRCM Debug mfi stat 0x%x, data len"
				    " requested/completed 0x%x/0x%x\n",
				    status, scsi_bufflen(scmd), data_length);
		break;

	case MFI_STAT_LD_OFFLINE:
	case MFI_STAT_DEVICE_NOT_FOUND:
		scmd->result = DID_BAD_TARGET << 16;
		break;
	case MFI_STAT_CONFIG_SEQ_MISMATCH:
		scmd->result = DID_IMM_RETRY << 16;
		break;
	default:
		scmd->result = DID_ERROR << 16;
		break;
	}
}
/**
 * megasas_is_prp_possible -
 * Checks if native NVMe PRPs can be built for the IO
 *
 * @instance:		Adapter soft state
 * @scmd:		SCSI command from the mid-layer
 * @sge_count:		scatter gather element count.
 *
 * Returns:		true: PRPs can be built
 *			false: IEEE SGLs need to be built
 */
static bool
megasas_is_prp_possible(struct megasas_instance *instance,
			struct scsi_cmnd *scmd, int sge_count)
{
	int i;
	u32 data_length = 0;
	struct scatterlist *sg_scmd;
	bool build_prp = false;
	u32 mr_nvme_pg_size;

	mr_nvme_pg_size = max_t(u32, instance->nvme_page_size,
				MR_DEFAULT_NVME_PAGE_SIZE);
	data_length = scsi_bufflen(scmd);
	sg_scmd = scsi_sglist(scmd);

	/*
	 * NVMe uses one PRP for each page (or part of a page).
	 * Look at the data length: if 4 pages or less then IEEE is OK;
	 * if > 5 pages then we need to build a native SGL;
	 * if > 4 and <= 5 pages, then check the physical address of the 1st
	 * SG entry: if this first size in the page is >= the residual
	 * beyond 4 pages then use IEEE, otherwise use a native SGL.
	 */

	if (data_length > (mr_nvme_pg_size * 5)) {
		build_prp = true;
	} else if ((data_length > (mr_nvme_pg_size * 4)) &&
		   (data_length <= (mr_nvme_pg_size * 5))) {
		/* check if 1st SG entry size is < residual beyond 4 pages */
		if (sg_dma_len(sg_scmd) < (data_length - (mr_nvme_pg_size * 4)))
			build_prp = true;
	}

	/*
	 * The code below detects gaps/holes in IO data buffers.
	 * What do holes/gaps mean? Any SGE except the first one in a SGL
	 * that starts at a non-NVMe-page-aligned address, OR any SGE except
	 * the last one in a SGL that ends at a non-NVMe-page boundary.
	 *
	 * The driver has already informed the block layer of the boundary
	 * rules for bio merging at the NVMe page size by calling
	 * blk_queue_virt_boundary() inside slave_config. IOs with holes can
	 * still reach the driver because of merging done by the IO
	 * scheduler.
	 *
	 * With SCSI BLK MQ enabled, there will be no IO with holes as there
	 * is no IO scheduling and hence no IO merging.
	 *
	 * With SCSI BLK MQ disabled, the IO scheduler may attempt to merge
	 * IOs and then send IOs with holes.
	 *
	 * Though the driver can ask the block layer to disable IO merging
	 * via blk_queue_flag_set(QUEUE_FLAG_NOMERGES, sdev->request_queue),
	 * the user may tune the nomerges sysfs parameter back to 0 or 1.
	 *
	 * If in the future IO scheduling is enabled with SCSI BLK MQ, this
	 * hole-detection algorithm will be required for that case as well.
	 */
	scsi_for_each_sg(scmd, sg_scmd, sge_count, i) {
		if ((i != 0) && (i != (sge_count - 1))) {
			if (mega_mod64(sg_dma_len(sg_scmd), mr_nvme_pg_size) ||
			    mega_mod64(sg_dma_address(sg_scmd),
				       mr_nvme_pg_size)) {
				build_prp = false;
				break;
			}
		}

		if ((sge_count > 1) && (i == 0)) {
			if ((mega_mod64((sg_dma_address(sg_scmd) +
					sg_dma_len(sg_scmd)),
					mr_nvme_pg_size))) {
				build_prp = false;
				break;
			}
		}

		if ((sge_count > 1) && (i == (sge_count - 1))) {
			if (mega_mod64(sg_dma_address(sg_scmd),
				       mr_nvme_pg_size)) {
				build_prp = false;
				break;
			}
		}
	}

	return build_prp;
}
/**
 * megasas_make_prp_nvme -
 * Prepare PRPs (Physical Region Pages) - SGLs specific to NVMe drives only
 *
 * @instance:		Adapter soft state
 * @scmd:		SCSI command from the mid-layer
 * @sgl_ptr:		SGL to be filled in
 * @cmd:		Fusion command frame
 * @sge_count:		scatter gather element count.
 *
 * Returns:		true: PRPs are built
 *			false: IEEE SGLs need to be built
 */
static bool
megasas_make_prp_nvme(struct megasas_instance *instance, struct scsi_cmnd *scmd,
		      struct MPI25_IEEE_SGE_CHAIN64 *sgl_ptr,
		      struct megasas_cmd_fusion *cmd, int sge_count)
{
	int sge_len, offset, num_prp_in_chain = 0;
	struct MPI25_IEEE_SGE_CHAIN64 *main_chain_element, *ptr_first_sgl;
	u64 *ptr_sgl;
	dma_addr_t ptr_sgl_phys;
	u64 sge_addr;
	u32 page_mask, page_mask_result;
	struct scatterlist *sg_scmd;
	u32 first_prp_len;
	bool build_prp = false;
	int data_len = scsi_bufflen(scmd);
	u32 mr_nvme_pg_size = max_t(u32, instance->nvme_page_size,
					MR_DEFAULT_NVME_PAGE_SIZE);

	build_prp = megasas_is_prp_possible(instance, scmd, sge_count);

	if (!build_prp)
		return false;

	/*
	 * NVMe has a very convoluted PRP format. One PRP is required for
	 * each page or partial page. The driver needs to split up OS
	 * sg_list entries if they are longer than one page or cross a page
	 * boundary. It also has to insert a PRP list pointer entry as the
	 * last entry in each physical page of the PRP list.
	 *
	 * NOTE: The first PRP "entry" is actually placed in the first
	 * SGL entry in the main message as IEEE 64 format. The 2nd
	 * entry in the main message is the chain element, and the rest
	 * of the PRP entries are built in the contiguous PCIe buffer.
	 */
	page_mask = mr_nvme_pg_size - 1;
	ptr_sgl = (u64 *)cmd->sg_frame;
	ptr_sgl_phys = cmd->sg_frame_phys_addr;
	memset(ptr_sgl, 0, instance->max_chain_frame_sz);

	/* Build chain frame element which holds all PRPs except the first */
	main_chain_element = (struct MPI25_IEEE_SGE_CHAIN64 *)
	    ((u8 *)sgl_ptr + sizeof(struct MPI25_IEEE_SGE_CHAIN64));

	main_chain_element->Address = cpu_to_le64(ptr_sgl_phys);
	main_chain_element->NextChainOffset = 0;
	main_chain_element->Flags = IEEE_SGE_FLAGS_CHAIN_ELEMENT |
					IEEE_SGE_FLAGS_SYSTEM_ADDR |
					MPI26_IEEE_SGE_FLAGS_NSF_NVME_PRP;

	/* Build the first PRP; the SGE need not be page aligned */
	ptr_first_sgl = sgl_ptr;
	sg_scmd = scsi_sglist(scmd);
	sge_addr = sg_dma_address(sg_scmd);
	sge_len = sg_dma_len(sg_scmd);

	offset = (u32)(sge_addr & page_mask);
	first_prp_len = mr_nvme_pg_size - offset;

	ptr_first_sgl->Address = cpu_to_le64(sge_addr);
	ptr_first_sgl->Length = cpu_to_le32(first_prp_len);

	data_len -= first_prp_len;

	if (sge_len > first_prp_len) {
		sge_addr += first_prp_len;
		sge_len -= first_prp_len;
	} else if (sge_len == first_prp_len) {
		sg_scmd = sg_next(sg_scmd);
		sge_addr = sg_dma_address(sg_scmd);
		sge_len = sg_dma_len(sg_scmd);
	}
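	/*
	 * Worked example (illustrative, 4 KiB pages): for sge_addr 0x10200
	 * the offset is 0x200, so first_prp_len = 0x1000 - 0x200 = 0xE00
	 * bytes, and the remainder of the SGE continues page aligned at
	 * 0x11000, from where the loop below emits one PRP per page.
	 */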
	for (;;) {
		offset = (u32)(sge_addr & page_mask);

		/* Put PRP pointer due to page boundary */
		page_mask_result = (uintptr_t)(ptr_sgl + 1) & page_mask;
		if (unlikely(!page_mask_result)) {
			scmd_printk(KERN_NOTICE,
				    scmd, "page boundary ptr_sgl: 0x%p\n",
				    ptr_sgl);
			ptr_sgl_phys += 8;
			*ptr_sgl = cpu_to_le64(ptr_sgl_phys);
			ptr_sgl++;
			num_prp_in_chain++;
		}

		*ptr_sgl = cpu_to_le64(sge_addr);
		ptr_sgl++;
		ptr_sgl_phys += 8;
		num_prp_in_chain++;

		sge_addr += mr_nvme_pg_size;
		sge_len -= mr_nvme_pg_size;
		data_len -= mr_nvme_pg_size;

		if (data_len <= 0)
			break;

		if (sge_len > 0)
			continue;

		sg_scmd = sg_next(sg_scmd);
		sge_addr = sg_dma_address(sg_scmd);
		sge_len = sg_dma_len(sg_scmd);
	}
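	/*
	 * Note (editorial reading of the loop above): the page_mask_result
	 * test implements the NVMe rule that the last 8-byte slot of every
	 * PRP-list page must hold the physical address of the next PRP-list
	 * page rather than a data PRP; since the chain frame here is one
	 * contiguous buffer, that address is simply the next slot's own
	 * physical address.
	 */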
	main_chain_element->Length =
			cpu_to_le32(num_prp_in_chain * sizeof(u64));

	return build_prp;
}
/**
 * megasas_make_sgl_fusion -	Prepares 32-bit SGL
 * @instance:		Adapter soft state
 * @scp:		SCSI command from the mid-layer
 * @sgl_ptr:		SGL to be filled in
 * @cmd:		cmd we are working on
 * @sge_count:		sge count
 *
 */
static void
megasas_make_sgl_fusion(struct megasas_instance *instance,
			struct scsi_cmnd *scp,
			struct MPI25_IEEE_SGE_CHAIN64 *sgl_ptr,
			struct megasas_cmd_fusion *cmd, int sge_count)
{
	int i, sg_processed;
	struct scatterlist *os_sgl;
	struct fusion_context *fusion;

	fusion = instance->ctrl_context;

	if (instance->adapter_type >= INVADER_SERIES) {
		struct MPI25_IEEE_SGE_CHAIN64 *sgl_ptr_end = sgl_ptr;

		sgl_ptr_end += fusion->max_sge_in_main_msg - 1;
		sgl_ptr_end->Flags = 0;
	}
	scsi_for_each_sg(scp, os_sgl, sge_count, i) {
		sgl_ptr->Length = cpu_to_le32(sg_dma_len(os_sgl));
		sgl_ptr->Address = cpu_to_le64(sg_dma_address(os_sgl));
		sgl_ptr->Flags = 0;
		if (instance->adapter_type >= INVADER_SERIES)
			if (i == sge_count - 1)
				sgl_ptr->Flags = IEEE_SGE_FLAGS_END_OF_LIST;
		sgl_ptr++;
		sg_processed = i + 1;

		if ((sg_processed == (fusion->max_sge_in_main_msg - 1)) &&
		    (sge_count > fusion->max_sge_in_main_msg)) {

			struct MPI25_IEEE_SGE_CHAIN64 *sg_chain;

			if (instance->adapter_type >= INVADER_SERIES) {
				if ((le16_to_cpu(cmd->io_request->IoFlags) &
					MPI25_SAS_DEVICE0_FLAGS_ENABLED_FAST_PATH) !=
					MPI25_SAS_DEVICE0_FLAGS_ENABLED_FAST_PATH)
					cmd->io_request->ChainOffset =
						fusion->chain_offset_io_request;
				else
					cmd->io_request->ChainOffset = 0;
			} else
				cmd->io_request->ChainOffset =
					fusion->chain_offset_io_request;

			sg_chain = sgl_ptr;
			/* Prepare chain element */
			sg_chain->NextChainOffset = 0;
			if (instance->adapter_type >= INVADER_SERIES)
				sg_chain->Flags = IEEE_SGE_FLAGS_CHAIN_ELEMENT;
			else
				sg_chain->Flags =
					(IEEE_SGE_FLAGS_CHAIN_ELEMENT |
					 MPI2_IEEE_SGE_FLAGS_IOCPLBNTA_ADDR);
			sg_chain->Length = cpu_to_le32((sizeof(union MPI2_SGE_IO_UNION) * (sge_count - sg_processed)));
			sg_chain->Address = cpu_to_le64(cmd->sg_frame_phys_addr);

			sgl_ptr =
			  (struct MPI25_IEEE_SGE_CHAIN64 *)cmd->sg_frame;
			memset(sgl_ptr, 0, instance->max_chain_frame_sz);
		}
	}
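	/*
	 * Illustrative example: with max_sge_in_main_msg = 8 and
	 * sge_count = 20, the main message carries 7 data SGEs plus the
	 * chain element, and the remaining 13 SGEs land in the chain
	 * frame; sg_chain->Length above is sized for exactly those 13
	 * entries.
	 */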
}
/**
 * megasas_make_sgl -	Build Scatter Gather List (SGLs)
 * @scp:		SCSI command pointer
 * @instance:		Soft instance of controller
 * @cmd:		Fusion command pointer
 *
 * This function will build SGLs based on device type.
 * For NVMe drives, there is a different way of building SGLs: the NVMe
 * native format - PRPs (Physical Region Pages).
 *
 * Returns the number of sg lists actually used, zero if the sg list
 * is NULL, or -ENOMEM if the mapping failed.
 */
static
int megasas_make_sgl(struct megasas_instance *instance, struct scsi_cmnd *scp,
		     struct megasas_cmd_fusion *cmd)
{
	int sge_count;
	bool build_prp = false;
	struct MPI25_IEEE_SGE_CHAIN64 *sgl_chain64;

	sge_count = scsi_dma_map(scp);

	if ((sge_count > instance->max_num_sge) || (sge_count <= 0))
		return sge_count;

	sgl_chain64 = (struct MPI25_IEEE_SGE_CHAIN64 *)&cmd->io_request->SGL;
	if ((le16_to_cpu(cmd->io_request->IoFlags) &
	     MPI25_SAS_DEVICE0_FLAGS_ENABLED_FAST_PATH) &&
	    (cmd->pd_interface == NVME_PD))
		build_prp = megasas_make_prp_nvme(instance, scp, sgl_chain64,
						  cmd, sge_count);

	if (!build_prp)
		megasas_make_sgl_fusion(instance, scp, sgl_chain64,
					cmd, sge_count);

	return sge_count;
}
/**
 * megasas_set_pd_lba -	Sets PD LBA
 * @io_request:		IO request
 * @cdb_len:		cdb length
 * @io_info:		IO information
 * @scp:		SCSI command
 * @local_map_ptr:	Raid map
 * @ref_tag:		Primary reference tag
 *
 * Used to set the PD LBA in CDB for FP IOs
 */
static void
megasas_set_pd_lba(struct MPI2_RAID_SCSI_IO_REQUEST *io_request, u8 cdb_len,
		   struct IO_REQUEST_INFO *io_info, struct scsi_cmnd *scp,
		   struct MR_DRV_RAID_MAP_ALL *local_map_ptr, u32 ref_tag)
{
	struct MR_LD_RAID *raid;
	u16 ld;
	u64 start_blk = io_info->pdBlock;
	u8 *cdb = io_request->CDB.CDB32;
	u32 num_blocks = io_info->numBlocks;
	u8 opcode = 0, flagvals = 0, groupnum = 0, control = 0;

	/* Check if T10 PI (DIF) is enabled for this LD */
	ld = MR_TargetIdToLdGet(io_info->ldTgtId, local_map_ptr);
	raid = MR_LdRaidGet(ld, local_map_ptr);
	if (raid->capability.ldPiMode == MR_PROT_INFO_TYPE_CONTROLLER) {
		memset(cdb, 0, sizeof(io_request->CDB.CDB32));
		cdb[0] = MEGASAS_SCSI_VARIABLE_LENGTH_CMD;
		cdb[7] = MEGASAS_SCSI_ADDL_CDB_LEN;

		if (scp->sc_data_direction == DMA_FROM_DEVICE)
			cdb[9] = MEGASAS_SCSI_SERVICE_ACTION_READ32;
		else
			cdb[9] = MEGASAS_SCSI_SERVICE_ACTION_WRITE32;
		cdb[10] = MEGASAS_RD_WR_PROTECT_CHECK_ALL;

		/* LBA */
		cdb[12] = (u8)((start_blk >> 56) & 0xff);
		cdb[13] = (u8)((start_blk >> 48) & 0xff);
		cdb[14] = (u8)((start_blk >> 40) & 0xff);
		cdb[15] = (u8)((start_blk >> 32) & 0xff);
		cdb[16] = (u8)((start_blk >> 24) & 0xff);
		cdb[17] = (u8)((start_blk >> 16) & 0xff);
		cdb[18] = (u8)((start_blk >> 8) & 0xff);
		cdb[19] = (u8)(start_blk & 0xff);

		/* Logical block reference tag */
		io_request->CDB.EEDP32.PrimaryReferenceTag =
			cpu_to_be32(ref_tag);
		io_request->CDB.EEDP32.PrimaryApplicationTagMask = cpu_to_be16(0xffff);
		io_request->IoFlags = cpu_to_le16(32); /* Specify 32-byte cdb */

		/* Transfer length */
		cdb[28] = (u8)((num_blocks >> 24) & 0xff);
		cdb[29] = (u8)((num_blocks >> 16) & 0xff);
		cdb[30] = (u8)((num_blocks >> 8) & 0xff);
		cdb[31] = (u8)(num_blocks & 0xff);

		/* set SCSI IO EEDPFlags */
		if (scp->sc_data_direction == DMA_FROM_DEVICE) {
			io_request->EEDPFlags = cpu_to_le16(
				MPI2_SCSIIO_EEDPFLAGS_INC_PRI_REFTAG |
				MPI2_SCSIIO_EEDPFLAGS_CHECK_REFTAG |
				MPI2_SCSIIO_EEDPFLAGS_CHECK_REMOVE_OP |
				MPI2_SCSIIO_EEDPFLAGS_CHECK_APPTAG |
				MPI25_SCSIIO_EEDPFLAGS_DO_NOT_DISABLE_MODE |
				MPI2_SCSIIO_EEDPFLAGS_CHECK_GUARD);
		} else {
			io_request->EEDPFlags = cpu_to_le16(
				MPI2_SCSIIO_EEDPFLAGS_INC_PRI_REFTAG |
				MPI2_SCSIIO_EEDPFLAGS_INSERT_OP);
		}
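		/*
		 * Editorial reading of the EEDP flags above: on reads the
		 * controller is asked to verify the T10 PI fields
		 * (guard/app/ref tag) and strip them before returning data
		 * to the host, while on writes it generates and inserts
		 * them; this is an interpretation of the flag names, not
		 * normative MPI text.
		 */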
		io_request->Control |= cpu_to_le32((0x4 << 26));
		io_request->EEDPBlockSize = cpu_to_le32(scp->device->sector_size);
	} else {
		/* Some drives don't support 16/12 byte CDB's, convert to 10 */
		if (((cdb_len == 12) || (cdb_len == 16)) &&
		    (start_blk <= 0xffffffff)) {
			if (cdb_len == 16) {
				opcode = cdb[0] == READ_16 ? READ_10 : WRITE_10;
				flagvals = cdb[1];
				groupnum = cdb[14];
				control = cdb[15];
			} else {
				opcode = cdb[0] == READ_12 ? READ_10 : WRITE_10;
				flagvals = cdb[1];
				groupnum = cdb[10];
				control = cdb[11];
			}

			memset(cdb, 0, sizeof(io_request->CDB.CDB32));

			cdb[0] = opcode;
			cdb[1] = flagvals;
			cdb[6] = groupnum;
			cdb[9] = control;

			/* Transfer length */
			cdb[8] = (u8)(num_blocks & 0xff);
			cdb[7] = (u8)((num_blocks >> 8) & 0xff);

			io_request->IoFlags = cpu_to_le16(10); /* Specify 10-byte cdb */
			cdb_len = 10;
		} else if ((cdb_len < 16) && (start_blk > 0xffffffff)) {
			/* Convert to 16 byte CDB for large LBA's */
			switch (cdb_len) {
			case 6:
				opcode = cdb[0] == READ_6 ? READ_16 : WRITE_16;
				control = cdb[5];
				break;
			case 10:
				opcode =
					cdb[0] == READ_10 ? READ_16 : WRITE_16;
				flagvals = cdb[1];
				groupnum = cdb[6];
				control = cdb[9];
				break;
			case 12:
				opcode =
					cdb[0] == READ_12 ? READ_16 : WRITE_16;
				flagvals = cdb[1];
				groupnum = cdb[10];
				control = cdb[11];
				break;
			}

			memset(cdb, 0, sizeof(io_request->CDB.CDB32));

			cdb[0] = opcode;
			cdb[1] = flagvals;
			cdb[14] = groupnum;
			cdb[15] = control;

			/* Transfer length */
			cdb[13] = (u8)(num_blocks & 0xff);
			cdb[12] = (u8)((num_blocks >> 8) & 0xff);
			cdb[11] = (u8)((num_blocks >> 16) & 0xff);
			cdb[10] = (u8)((num_blocks >> 24) & 0xff);

			io_request->IoFlags = cpu_to_le16(16); /* Specify 16-byte cdb */
			cdb_len = 16;
		}
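		/*
		 * Example (illustrative): a READ_16 for LBA 0x1000 is
		 * rewritten above as READ_10 carrying the same flag, group
		 * and control bytes; since start_blk fits in 32 bits and
		 * the transfer length fits the 10-byte CDB's 16-bit field,
		 * no information is lost by the conversion.
		 */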
		/* Normal case, just load LBA here */
		switch (cdb_len) {
		case 6:
		{
			u8 val = cdb[1] & 0xE0;

			cdb[3] = (u8)(start_blk & 0xff);
			cdb[2] = (u8)((start_blk >> 8) & 0xff);
			cdb[1] = val | ((u8)(start_blk >> 16) & 0x1f);
			break;
		}
		case 10:
			cdb[5] = (u8)(start_blk & 0xff);
			cdb[4] = (u8)((start_blk >> 8) & 0xff);
			cdb[3] = (u8)((start_blk >> 16) & 0xff);
			cdb[2] = (u8)((start_blk >> 24) & 0xff);
			break;
		case 12:
			cdb[5] = (u8)(start_blk & 0xff);
			cdb[4] = (u8)((start_blk >> 8) & 0xff);
			cdb[3] = (u8)((start_blk >> 16) & 0xff);
			cdb[2] = (u8)((start_blk >> 24) & 0xff);
			break;
		case 16:
			cdb[9] = (u8)(start_blk & 0xff);
			cdb[8] = (u8)((start_blk >> 8) & 0xff);
			cdb[7] = (u8)((start_blk >> 16) & 0xff);
			cdb[6] = (u8)((start_blk >> 24) & 0xff);
			cdb[5] = (u8)((start_blk >> 32) & 0xff);
			cdb[4] = (u8)((start_blk >> 40) & 0xff);
			cdb[3] = (u8)((start_blk >> 48) & 0xff);
			cdb[2] = (u8)((start_blk >> 56) & 0xff);
			break;
		}
	}
}
/**
 * megasas_stream_detect -	stream detection on read and write IOs
 * @instance:		Adapter soft state
 * @cmd:		Command to be prepared
 * @io_info:		IO Request info
 *
 */
static void megasas_stream_detect(struct megasas_instance *instance,
				  struct megasas_cmd_fusion *cmd,
				  struct IO_REQUEST_INFO *io_info)
{
	struct fusion_context *fusion = instance->ctrl_context;
	u32 device_id = io_info->ldTgtId;
	struct LD_STREAM_DETECT *current_ld_sd
		= fusion->stream_detect_by_ld[device_id];
	u32 *track_stream = &current_ld_sd->mru_bit_map, stream_num;
	u32 shifted_values, unshifted_values;
	u32 index_value_mask, shifted_values_mask;
	int i;
	bool is_read_ahead = false;
	struct STREAM_DETECT *current_sd;

	/* find possible stream */
	for (i = 0; i < MAX_STREAMS_TRACKED; ++i) {
		stream_num = (*track_stream >>
			(i * BITS_PER_INDEX_STREAM)) &
			STREAM_MASK;
		current_sd = &current_ld_sd->stream_track[stream_num];
		/* if we found a stream, update the raid
		 * context and also update the mruBitMap
		 */
		/* boundary condition */
		if ((current_sd->next_seq_lba) &&
		    (io_info->ldStartBlock >= current_sd->next_seq_lba) &&
		    (io_info->ldStartBlock <= (current_sd->next_seq_lba + 32)) &&
		    (current_sd->is_read == io_info->isRead)) {
			if ((io_info->ldStartBlock != current_sd->next_seq_lba) &&
			    ((!io_info->isRead) || (!is_read_ahead)))
				/*
				 * Once the API is available we need to
				 * change this. At this point we are not
				 * allowing any gap.
				 */
				continue;

			SET_STREAM_DETECTED(cmd->io_request->RaidContext.raid_context_g35);
			current_sd->next_seq_lba =
				io_info->ldStartBlock + io_info->numBlocks;
			/*
			 * update the mruBitMap LRU
			 */
			shifted_values_mask =
				(1 << i * BITS_PER_INDEX_STREAM) - 1;
			shifted_values = ((*track_stream & shifted_values_mask)
						<< BITS_PER_INDEX_STREAM);
			index_value_mask =
				STREAM_MASK << i * BITS_PER_INDEX_STREAM;
			unshifted_values =
				*track_stream & ~(shifted_values_mask |
				index_value_mask);
			*track_stream =
				unshifted_values | shifted_values | stream_num;
			return;
		}
	}
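	/*
	 * Worked example (illustrative, 4 bits per tracked stream, showing
	 * only four entries): if *track_stream is 0x3210 (MRU order
	 * 0,1,2,3) and the hit is at position i = 2 (stream_num 2), the
	 * update above yields 0x3102 - stream 2 moves to the MRU slot,
	 * streams 0 and 1 shift back one position, and 3 stays LRU.
	 */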
	/*
	 * if we did not find any stream, create a new one
	 * from the least recently used
	 */
	stream_num = (*track_stream >>
		((MAX_STREAMS_TRACKED - 1) * BITS_PER_INDEX_STREAM)) &
		STREAM_MASK;
	current_sd = &current_ld_sd->stream_track[stream_num];
	current_sd->is_read = io_info->isRead;
	current_sd->next_seq_lba = io_info->ldStartBlock + io_info->numBlocks;
	*track_stream = (((*track_stream & ZERO_LAST_STREAM) << 4) | stream_num);
	return;
}
/**
 * megasas_set_raidflag_cpu_affinity - This function sets the cpu
 * affinity (cpu of the controller) and raid_flags in the raid context
 * based on IO type.
 *
 * @fusion:		Fusion context
 * @praid_context:	IO RAID context
 * @raid:		LD raid map
 * @fp_possible:	Is fast path possible?
 * @is_read:		Is read IO?
 * @scsi_buff_len:	SCSI command buffer length
 *
 */
static void
megasas_set_raidflag_cpu_affinity(struct fusion_context *fusion,
				  union RAID_CONTEXT_UNION *praid_context,
				  struct MR_LD_RAID *raid, bool fp_possible,
				  u8 is_read, u32 scsi_buff_len)
{
	u8 cpu_sel = MR_RAID_CTX_CPUSEL_0;
	struct RAID_CONTEXT_G35 *rctx_g35;

	rctx_g35 = &praid_context->raid_context_g35;
	if (fp_possible) {
		if (is_read) {
			if ((raid->cpuAffinity.pdRead.cpu0) &&
			    (raid->cpuAffinity.pdRead.cpu1))
				cpu_sel = MR_RAID_CTX_CPUSEL_FCFS;
			else if (raid->cpuAffinity.pdRead.cpu1)
				cpu_sel = MR_RAID_CTX_CPUSEL_1;
		} else {
			if ((raid->cpuAffinity.pdWrite.cpu0) &&
			    (raid->cpuAffinity.pdWrite.cpu1))
				cpu_sel = MR_RAID_CTX_CPUSEL_FCFS;
			else if (raid->cpuAffinity.pdWrite.cpu1)
				cpu_sel = MR_RAID_CTX_CPUSEL_1;
			/* Fast path cache by pass capable R0/R1 VD */
			if ((raid->level <= 1) &&
			    (raid->capability.fp_cache_bypass_capable)) {
				rctx_g35->routing_flags |=
					(1 << MR_RAID_CTX_ROUTINGFLAGS_SLD_SHIFT);
				rctx_g35->raid_flags =
					(MR_RAID_FLAGS_IO_SUB_TYPE_CACHE_BYPASS
					<< MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT);
			}
		}
	} else {
		if (is_read) {
			if ((raid->cpuAffinity.ldRead.cpu0) &&
			    (raid->cpuAffinity.ldRead.cpu1))
				cpu_sel = MR_RAID_CTX_CPUSEL_FCFS;
			else if (raid->cpuAffinity.ldRead.cpu1)
				cpu_sel = MR_RAID_CTX_CPUSEL_1;
		} else {
			if ((raid->cpuAffinity.ldWrite.cpu0) &&
			    (raid->cpuAffinity.ldWrite.cpu1))
				cpu_sel = MR_RAID_CTX_CPUSEL_FCFS;
			else if (raid->cpuAffinity.ldWrite.cpu1)
				cpu_sel = MR_RAID_CTX_CPUSEL_1;

			if (is_stream_detected(rctx_g35) &&
			    ((raid->level == 5) || (raid->level == 6)) &&
			    (raid->writeMode == MR_RL_WRITE_THROUGH_MODE) &&
			    (cpu_sel == MR_RAID_CTX_CPUSEL_FCFS))
				cpu_sel = MR_RAID_CTX_CPUSEL_0;
		}
	}

	rctx_g35->routing_flags |=
		(cpu_sel << MR_RAID_CTX_ROUTINGFLAGS_CPUSEL_SHIFT);

	/* Always give priority to MR_RAID_FLAGS_IO_SUB_TYPE_LDIO_BW_LIMIT
	 * vs MR_RAID_FLAGS_IO_SUB_TYPE_CACHE_BYPASS.
	 * IO Subtype is not bitmap.
	 */
	if ((fusion->pcie_bw_limitation) && (raid->level == 1) && (!is_read) &&
	    (scsi_buff_len > MR_LARGE_IO_MIN_SIZE)) {
		praid_context->raid_context_g35.raid_flags =
			(MR_RAID_FLAGS_IO_SUB_TYPE_LDIO_BW_LIMIT
			<< MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT);
	}
}
/**
 * megasas_build_ldio_fusion -	Prepares IOs to devices
 * @instance:		Adapter soft state
 * @scp:		SCSI command
 * @cmd:		Command to be prepared
 *
 * Prepares the io_request and chain elements (sg_frame) for IO
 * The IO can be for PD (Fast Path) or LD
 */
static void
megasas_build_ldio_fusion(struct megasas_instance *instance,
			  struct scsi_cmnd *scp,
			  struct megasas_cmd_fusion *cmd)
{
	bool fp_possible;
	u16 ld;
	u32 start_lba_lo, start_lba_hi, device_id, datalength = 0;
	u32 scsi_buff_len;
	struct MPI2_RAID_SCSI_IO_REQUEST *io_request;
	struct IO_REQUEST_INFO io_info;
	struct fusion_context *fusion;
	struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
	u8 *raidLUN;
	unsigned long spinlock_flags;
	struct MR_LD_RAID *raid = NULL;
	struct MR_PRIV_DEVICE *mrdev_priv;
	struct RAID_CONTEXT *rctx;
	struct RAID_CONTEXT_G35 *rctx_g35;

	device_id = MEGASAS_DEV_INDEX(scp);

	fusion = instance->ctrl_context;

	io_request = cmd->io_request;
	rctx = &io_request->RaidContext.raid_context;
	rctx_g35 = &io_request->RaidContext.raid_context_g35;

	rctx->virtual_disk_tgt_id = cpu_to_le16(device_id);
	rctx->status = 0;
	rctx->ex_status = 0;

	start_lba_lo = 0;
	start_lba_hi = 0;
	fp_possible = false;

	/*
	 * 6-byte READ(0x08) or WRITE(0x0A) cdb
	 */
	if (scp->cmd_len == 6) {
		datalength = (u32) scp->cmnd[4];
		start_lba_lo = ((u32) scp->cmnd[1] << 16) |
			((u32) scp->cmnd[2] << 8) | (u32) scp->cmnd[3];

		start_lba_lo &= 0x1FFFFF;
	}

	/*
	 * 10-byte READ(0x28) or WRITE(0x2A) cdb
	 */
	else if (scp->cmd_len == 10) {
		datalength = (u32) scp->cmnd[8] |
			((u32) scp->cmnd[7] << 8);
		start_lba_lo = ((u32) scp->cmnd[2] << 24) |
			((u32) scp->cmnd[3] << 16) |
			((u32) scp->cmnd[4] << 8) | (u32) scp->cmnd[5];
	}

	/*
	 * 12-byte READ(0xA8) or WRITE(0xAA) cdb
	 */
	else if (scp->cmd_len == 12) {
		datalength = ((u32) scp->cmnd[6] << 24) |
			((u32) scp->cmnd[7] << 16) |
			((u32) scp->cmnd[8] << 8) | (u32) scp->cmnd[9];
		start_lba_lo = ((u32) scp->cmnd[2] << 24) |
			((u32) scp->cmnd[3] << 16) |
			((u32) scp->cmnd[4] << 8) | (u32) scp->cmnd[5];
	}

	/*
	 * 16-byte READ(0x88) or WRITE(0x8A) cdb
	 */
	else if (scp->cmd_len == 16) {
		datalength = ((u32) scp->cmnd[10] << 24) |
			((u32) scp->cmnd[11] << 16) |
			((u32) scp->cmnd[12] << 8) | (u32) scp->cmnd[13];
		start_lba_lo = ((u32) scp->cmnd[6] << 24) |
			((u32) scp->cmnd[7] << 16) |
			((u32) scp->cmnd[8] << 8) | (u32) scp->cmnd[9];

		start_lba_hi = ((u32) scp->cmnd[2] << 24) |
			((u32) scp->cmnd[3] << 16) |
			((u32) scp->cmnd[4] << 8) | (u32) scp->cmnd[5];
	}
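	/*
	 * Example (illustrative): the READ_16 CDB
	 * 88 00 00 00 00 01 00 00 00 00 00 00 00 08 00 00 parses above to
	 * start_lba_hi = 0x1, start_lba_lo = 0x0 and datalength = 8,
	 * i.e. an 8-block read at LBA 0x100000000.
	 */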
	memset(&io_info, 0, sizeof(struct IO_REQUEST_INFO));
	io_info.ldStartBlock = ((u64)start_lba_hi << 32) | start_lba_lo;
	io_info.numBlocks = datalength;
	io_info.ldTgtId = device_id;
	io_info.r1_alt_dev_handle = MR_DEVHANDLE_INVALID;
	scsi_buff_len = scsi_bufflen(scp);
	io_request->DataLength = cpu_to_le32(scsi_buff_len);
	io_info.data_arms = 1;

	if (scp->sc_data_direction == DMA_FROM_DEVICE)
		io_info.isRead = 1;

	local_map_ptr = fusion->ld_drv_map[(instance->map_id & 1)];
	ld = MR_TargetIdToLdGet(device_id, local_map_ptr);

	if (ld < instance->fw_supported_vd_count)
		raid = MR_LdRaidGet(ld, local_map_ptr);

	if (!raid || (!fusion->fast_path_io)) {
		rctx->reg_lock_flags = 0;
		fp_possible = false;
	} else {
		if (MR_BuildRaidContext(instance, &io_info, rctx,
					local_map_ptr, &raidLUN))
			fp_possible = (io_info.fpOkForIo > 0) ? true : false;
	}

	megasas_get_msix_index(instance, scp, cmd, io_info.data_arms);

	if (instance->adapter_type >= VENTURA_SERIES) {
		/* FP for Optimal raid level 1.
		 * All large RAID-1 writes (> 32 KiB, both WT and WB modes)
		 * are built by the driver as LD I/Os.
		 * All small RAID-1 WT writes (<= 32 KiB) are built as FP I/Os
		 * (there is never a reason to process these as buffered writes).
		 * All small RAID-1 WB writes (<= 32 KiB) are built as FP I/Os
		 * with the SLD bit asserted.
		 */
		if (io_info.r1_alt_dev_handle != MR_DEVHANDLE_INVALID) {
			mrdev_priv = scp->device->hostdata;

			if (atomic_inc_return(&instance->fw_outstanding) >
					(instance->host->can_queue)) {
				fp_possible = false;
				atomic_dec(&instance->fw_outstanding);
			} else if (fusion->pcie_bw_limitation &&
				((scsi_buff_len > MR_LARGE_IO_MIN_SIZE) ||
				 (atomic_dec_if_positive(&mrdev_priv->r1_ldio_hint) > 0))) {
				fp_possible = false;
				atomic_dec(&instance->fw_outstanding);
				if (scsi_buff_len > MR_LARGE_IO_MIN_SIZE)
					atomic_set(&mrdev_priv->r1_ldio_hint,
						   instance->r1_ldio_hint_default);
			}
		}
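		/*
		 * Net effect (editorial reading): on PCIe bandwidth-limited
		 * controllers a large (> 32 KiB) RAID-1 write goes as LD IO
		 * and re-arms r1_ldio_hint, which then steers the following
		 * r1_ldio_hint_default writes to LD IO as well; fast path
		 * is also skipped whenever fw_outstanding would exceed
		 * can_queue.
		 */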
		if (!fp_possible ||
		    (io_info.isRead && io_info.ra_capable)) {
			spin_lock_irqsave(&instance->stream_lock,
					  spinlock_flags);
			megasas_stream_detect(instance, cmd, &io_info);
			spin_unlock_irqrestore(&instance->stream_lock,
					       spinlock_flags);
			/* In ventura if stream detected for a read and it is
			 * read ahead capable make this IO as LDIO
			 */
			if (is_stream_detected(rctx_g35))
				fp_possible = false;
		}

		/* If raid is NULL, set CPU affinity to default CPU0 */
		if (raid)
			megasas_set_raidflag_cpu_affinity(fusion, &io_request->RaidContext,
				raid, fp_possible, io_info.isRead,
				scsi_buff_len);
		else
			rctx_g35->routing_flags |=
				(MR_RAID_CTX_CPUSEL_0 << MR_RAID_CTX_ROUTINGFLAGS_CPUSEL_SHIFT);
	}

	if (fp_possible) {
		megasas_set_pd_lba(io_request, scp->cmd_len, &io_info, scp,
				   local_map_ptr, start_lba_lo);
		io_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST;
		cmd->request_desc->SCSIIO.RequestFlags =
			(MPI2_REQ_DESCRIPT_FLAGS_FP_IO
			 << MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
		if (instance->adapter_type == INVADER_SERIES) {
			rctx->type = MPI2_TYPE_CUDA;
			rctx->nseg = 0x1;
			io_request->IoFlags |= cpu_to_le16(MPI25_SAS_DEVICE0_FLAGS_ENABLED_FAST_PATH);
			rctx->reg_lock_flags |=
				(MR_RL_FLAGS_GRANT_DESTINATION_CUDA |
				 MR_RL_FLAGS_SEQ_NUM_ENABLE);
		} else if (instance->adapter_type >= VENTURA_SERIES) {
			rctx_g35->nseg_type |= (1 << RAID_CONTEXT_NSEG_SHIFT);
			rctx_g35->nseg_type |= (MPI2_TYPE_CUDA << RAID_CONTEXT_TYPE_SHIFT);
			rctx_g35->routing_flags |= (1 << MR_RAID_CTX_ROUTINGFLAGS_SQN_SHIFT);
			io_request->IoFlags |=
				cpu_to_le16(MPI25_SAS_DEVICE0_FLAGS_ENABLED_FAST_PATH);
		}
		if (fusion->load_balance_info &&
		    (fusion->load_balance_info[device_id].loadBalanceFlag) &&
		    (io_info.isRead)) {
			io_info.devHandle =
				get_updated_dev_handle(instance,
						       &fusion->load_balance_info[device_id],
						       &io_info, local_map_ptr);
			scp->SCp.Status |= MEGASAS_LOAD_BALANCE_FLAG;
			cmd->pd_r1_lb = io_info.pd_after_lb;
			if (instance->adapter_type >= VENTURA_SERIES)
				rctx_g35->span_arm = io_info.span_arm;
			else
				rctx->span_arm = io_info.span_arm;

		} else
			scp->SCp.Status &= ~MEGASAS_LOAD_BALANCE_FLAG;

		if (instance->adapter_type >= VENTURA_SERIES)
			cmd->r1_alt_dev_handle = io_info.r1_alt_dev_handle;
		else
			cmd->r1_alt_dev_handle = MR_DEVHANDLE_INVALID;

		if ((raidLUN[0] == 1) &&
		    (local_map_ptr->raidMap.devHndlInfo[io_info.pd_after_lb].validHandles > 1)) {
			instance->dev_handle = !(instance->dev_handle);
			io_info.devHandle =
				local_map_ptr->raidMap.devHndlInfo[io_info.pd_after_lb].devHandle[instance->dev_handle];
		}

		cmd->request_desc->SCSIIO.DevHandle = io_info.devHandle;
		io_request->DevHandle = io_info.devHandle;
		cmd->pd_interface = io_info.pd_interface;
		/* populate the LUN field */
		memcpy(io_request->LUN, raidLUN, 8);
	} else {
		rctx->timeout_value =
			cpu_to_le16(local_map_ptr->raidMap.fpPdIoTimeoutSec);
		cmd->request_desc->SCSIIO.RequestFlags =
			(MEGASAS_REQ_DESCRIPT_FLAGS_LD_IO
			 << MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
		if (instance->adapter_type == INVADER_SERIES) {
			if (io_info.do_fp_rlbypass ||
			    (rctx->reg_lock_flags == REGION_TYPE_UNUSED))
				cmd->request_desc->SCSIIO.RequestFlags =
					(MEGASAS_REQ_DESCRIPT_FLAGS_NO_LOCK <<
					 MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
			rctx->type = MPI2_TYPE_CUDA;
			rctx->reg_lock_flags |=
				(MR_RL_FLAGS_GRANT_DESTINATION_CPU0 |
				 MR_RL_FLAGS_SEQ_NUM_ENABLE);
			rctx->nseg = 0x1;
		} else if (instance->adapter_type >= VENTURA_SERIES) {
			rctx_g35->routing_flags |= (1 << MR_RAID_CTX_ROUTINGFLAGS_SQN_SHIFT);
			rctx_g35->nseg_type |= (1 << RAID_CONTEXT_NSEG_SHIFT);
			rctx_g35->nseg_type |= (MPI2_TYPE_CUDA << RAID_CONTEXT_TYPE_SHIFT);
		}
		io_request->Function = MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST;
		io_request->DevHandle = cpu_to_le16(device_id);

	} /* Not FP */
}
/**
 * megasas_build_ld_nonrw_fusion - prepares non rw ios for virtual disk
 * @instance:		Adapter soft state
 * @scmd:		SCSI command
 * @cmd:		Command to be prepared
 *
 * Prepares the io_request frame for non-rw io cmds for vd.
 */
static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
			  struct scsi_cmnd *scmd, struct megasas_cmd_fusion *cmd)
{
	u32 device_id;
	struct MPI2_RAID_SCSI_IO_REQUEST *io_request;
	u16 ld;
	struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
	struct fusion_context *fusion = instance->ctrl_context;
	u8 span, physArm;
	__le16 devHandle;
	u32 arRef, pd;
	struct MR_LD_RAID *raid;
	struct RAID_CONTEXT *pRAID_Context;
	u8 fp_possible = 1;

	io_request = cmd->io_request;
	device_id = MEGASAS_DEV_INDEX(scmd);
	local_map_ptr = fusion->ld_drv_map[(instance->map_id & 1)];
	io_request->DataLength = cpu_to_le32(scsi_bufflen(scmd));
	/* get RAID_Context pointer */
	pRAID_Context = &io_request->RaidContext.raid_context;
	/* Check with FW team */
	pRAID_Context->virtual_disk_tgt_id = cpu_to_le16(device_id);
	pRAID_Context->reg_lock_row_lba = 0;
	pRAID_Context->reg_lock_length = 0;

	if (fusion->fast_path_io && (
		device_id < instance->fw_supported_vd_count)) {

		ld = MR_TargetIdToLdGet(device_id, local_map_ptr);
		/*
		 * Non-existing VDs have their ldTgtIdToLd entries set to
		 * 0xff in the driver map so they can be skipped; off-by-one
		 * checks here once let such 0xff values reach
		 * MR_LdRaidGet() and triggered UBSAN out-of-bounds reports
		 * on MR_LD_SPAN_MAP (fix relative to commit 51087a8617fe
		 * "megaraid_sas : Extended VD support"), hence the
		 * >= (fw_supported_vd_count - 1) check below.
		 */
		if (ld >= instance->fw_supported_vd_count - 1)
			fp_possible = 0;
		else {
			raid = MR_LdRaidGet(ld, local_map_ptr);
			if (!(raid->capability.fpNonRWCapable))
				fp_possible = 0;
		}
	} else
		fp_possible = 0;

	if (!fp_possible) {
		io_request->Function = MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST;
		io_request->DevHandle = cpu_to_le16(device_id);
		io_request->LUN[1] = scmd->device->lun;
		pRAID_Context->timeout_value =
			cpu_to_le16(scmd->request->timeout / HZ);
		cmd->request_desc->SCSIIO.RequestFlags =
			(MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO <<
			 MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
	} else {

		/* set RAID context values */
		pRAID_Context->config_seq_num = raid->seqNum;
		if (instance->adapter_type < VENTURA_SERIES)
			pRAID_Context->reg_lock_flags = REGION_TYPE_SHARED_READ;
		pRAID_Context->timeout_value =
			cpu_to_le16(raid->fpIoTimeoutForLd);

		/* get the DevHandle for the PD (since this is
		   fpNonRWCapable, this is a single disk RAID0) */
		span = physArm = 0;
		arRef = MR_LdSpanArrayGet(ld, span, local_map_ptr);
		pd = MR_ArPdGet(arRef, physArm, local_map_ptr);
		devHandle = MR_PdDevHandleGet(pd, local_map_ptr);

		/* build request descriptor */
		cmd->request_desc->SCSIIO.RequestFlags =
			(MPI2_REQ_DESCRIPT_FLAGS_FP_IO <<
			 MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
		cmd->request_desc->SCSIIO.DevHandle = devHandle;

		/* populate the LUN field */
		memcpy(io_request->LUN, raid->LUN, 8);

		/* build the raidScsiIO structure */
		io_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST;
		io_request->DevHandle = devHandle;
	}
}
/**
 * megasas_build_syspd_fusion - prepares rw/non-rw ios for syspd
 * @instance:		Adapter soft state
 * @scmd:		SCSI command
 * @cmd:		Command to be prepared
 * @fp_possible:	parameter to detect fast path or firmware path io.
 *
 * Prepares the io_request frame for rw/non-rw io cmds for syspds
 */
static void
megasas_build_syspd_fusion(struct megasas_instance *instance,
			   struct scsi_cmnd *scmd, struct megasas_cmd_fusion *cmd,
			   bool fp_possible)
{
	u32 device_id;
	struct MPI2_RAID_SCSI_IO_REQUEST *io_request;
	u16 pd_index = 0;
	u16 os_timeout_value;
	u16 timeout_limit;
	struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
	struct RAID_CONTEXT *pRAID_Context;
	struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
	struct MR_PRIV_DEVICE *mr_device_priv_data;
	struct fusion_context *fusion = instance->ctrl_context;
	pd_sync = (void *)fusion->pd_seq_sync[(instance->pd_seq_map_id - 1) & 1];

	device_id = MEGASAS_DEV_INDEX(scmd);
	pd_index = MEGASAS_PD_INDEX(scmd);
	os_timeout_value = scmd->request->timeout / HZ;
	mr_device_priv_data = scmd->device->hostdata;
	cmd->pd_interface = mr_device_priv_data->interface_type;

	io_request = cmd->io_request;
	/* get RAID_Context pointer */
	pRAID_Context = &io_request->RaidContext.raid_context;
	pRAID_Context->reg_lock_flags = 0;
	pRAID_Context->reg_lock_row_lba = 0;
	pRAID_Context->reg_lock_length = 0;
	io_request->DataLength = cpu_to_le32(scsi_bufflen(scmd));
	io_request->LUN[1] = scmd->device->lun;
	pRAID_Context->raid_flags = MR_RAID_FLAGS_IO_SUB_TYPE_SYSTEM_PD
		<< MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT;

	/* If FW supports PD sequence number */
	if (instance->support_seqnum_jbod_fp) {
		if (instance->use_seqnum_jbod_fp &&
			instance->pd_list[pd_index].driveType == TYPE_DISK) {

			/* More than 256 PD/JBOD support for Ventura */
			if (instance->support_morethan256jbod)
				pRAID_Context->virtual_disk_tgt_id =
					pd_sync->seq[pd_index].pd_target_id;
			else
				pRAID_Context->virtual_disk_tgt_id =
					cpu_to_le16(device_id +
					(MAX_PHYSICAL_DEVICES - 1));
			pRAID_Context->config_seq_num =
				pd_sync->seq[pd_index].seqNum;
			io_request->DevHandle =
				pd_sync->seq[pd_index].devHandle;
			if (instance->adapter_type >= VENTURA_SERIES) {
				io_request->RaidContext.raid_context_g35.routing_flags |=
					(1 << MR_RAID_CTX_ROUTINGFLAGS_SQN_SHIFT);
				io_request->RaidContext.raid_context_g35.nseg_type |=
					(1 << RAID_CONTEXT_NSEG_SHIFT);
				io_request->RaidContext.raid_context_g35.nseg_type |=
					(MPI2_TYPE_CUDA << RAID_CONTEXT_TYPE_SHIFT);
			} else {
				pRAID_Context->type = MPI2_TYPE_CUDA;
				pRAID_Context->nseg = 0x1;
				pRAID_Context->reg_lock_flags |=
					(MR_RL_FLAGS_SEQ_NUM_ENABLE |
					 MR_RL_FLAGS_GRANT_DESTINATION_CUDA);
			}
		} else {
			pRAID_Context->virtual_disk_tgt_id =
				cpu_to_le16(device_id +
				(MAX_PHYSICAL_DEVICES - 1));
			pRAID_Context->config_seq_num = 0;
			io_request->DevHandle = cpu_to_le16(0xFFFF);
		}
	} else {
		pRAID_Context->virtual_disk_tgt_id = cpu_to_le16(device_id);
		pRAID_Context->config_seq_num = 0;

		if (fusion->fast_path_io) {
			local_map_ptr =
				fusion->ld_drv_map[(instance->map_id & 1)];
			io_request->DevHandle =
				local_map_ptr->raidMap.devHndlInfo[device_id].curDevHdl;
		} else {
			io_request->DevHandle = cpu_to_le16(0xFFFF);
		}
	}
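	/*
	 * Summary (editorial): the fast-path device handle for a system PD
	 * comes from the PD sequence-number map when the firmware supports
	 * it, or from the RAID map's device-handle table otherwise; a
	 * DevHandle of 0xFFFF means no fast-path handle was available.
	 */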
	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;

	megasas_get_msix_index(instance, scmd, cmd, 1);

	if (!fp_possible) {
		/* system pd firmware path */
		io_request->Function = MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST;
		cmd->request_desc->SCSIIO.RequestFlags =
			(MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO <<
			 MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
		pRAID_Context->timeout_value = cpu_to_le16(os_timeout_value);
		pRAID_Context->virtual_disk_tgt_id = cpu_to_le16(device_id);
	} else {
		if (os_timeout_value)
			os_timeout_value++;

		/* system pd Fast Path */
		io_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST;
		timeout_limit = (scmd->device->type == TYPE_DISK) ?
				255 : 0xFFFF;
		pRAID_Context->timeout_value =
			cpu_to_le16((os_timeout_value > timeout_limit) ?
			timeout_limit : os_timeout_value);
		if (instance->adapter_type >= INVADER_SERIES)
			io_request->IoFlags |=
				cpu_to_le16(MPI25_SAS_DEVICE0_FLAGS_ENABLED_FAST_PATH);

		cmd->request_desc->SCSIIO.RequestFlags =
			(MPI2_REQ_DESCRIPT_FLAGS_FP_IO <<
			 MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
	}
}
/**
 * megasas_build_io_fusion -	Prepares IOs to devices
 * @instance:		Adapter soft state
 * @scp:		SCSI command
 * @cmd:		Command to be prepared
 *
 * Invokes helper functions to prepare request frames
 * and sets flags appropriate for IO/Non-IO cmd
 */
static int
megasas_build_io_fusion(struct megasas_instance *instance,
			struct scsi_cmnd *scp,
			struct megasas_cmd_fusion *cmd)
{
	int sge_count;
	u8 cmd_type;
	struct MPI2_RAID_SCSI_IO_REQUEST *io_request = cmd->io_request;
	struct MR_PRIV_DEVICE *mr_device_priv_data;

	mr_device_priv_data = scp->device->hostdata;

	/* Zero out some fields so they don't get reused */
	memset(io_request->LUN, 0x0, 8);
	io_request->CDB.EEDP32.PrimaryReferenceTag = 0;
	io_request->CDB.EEDP32.PrimaryApplicationTagMask = 0;
	io_request->EEDPFlags = 0;
	io_request->Control = 0;
	io_request->EEDPBlockSize = 0;
	io_request->ChainOffset = 0;
	io_request->RaidContext.raid_context.raid_flags = 0;
	io_request->RaidContext.raid_context.type = 0;
	io_request->RaidContext.raid_context.nseg = 0;

	memcpy(io_request->CDB.CDB32, scp->cmnd, scp->cmd_len);
	/*
	 * Just the CDB length, rest of the Flags are zero
	 * This will be modified for FP in build_ldio_fusion
	 */
	io_request->IoFlags = cpu_to_le16(scp->cmd_len);

	switch (cmd_type = megasas_cmd_type(scp)) {
	case READ_WRITE_LDIO:
		megasas_build_ldio_fusion(instance, scp, cmd);
		break;
	case NON_READ_WRITE_LDIO:
		megasas_build_ld_nonrw_fusion(instance, scp, cmd);
		break;
	case READ_WRITE_SYSPDIO:
		megasas_build_syspd_fusion(instance, scp, cmd, true);
		break;
	case NON_READ_WRITE_SYSPDIO:
		if (instance->secure_jbod_support ||
		    mr_device_priv_data->is_tm_capable)
			megasas_build_syspd_fusion(instance, scp, cmd, false);
		else
			megasas_build_syspd_fusion(instance, scp, cmd, true);
		break;
	default:
		break;
	}

	/*
	 * Construct SGL
	 */

	sge_count = megasas_make_sgl(instance, scp, cmd);

	if (sge_count > instance->max_num_sge || (sge_count < 0)) {
		dev_err(&instance->pdev->dev,
			"%s %d sge_count (%d) is out of range. Range is: 0-%d\n",
			__func__, __LINE__, sge_count, instance->max_num_sge);
		return 1;
	}

	if (instance->adapter_type >= VENTURA_SERIES) {
		set_num_sge(&io_request->RaidContext.raid_context_g35, sge_count);
		cpu_to_le16s(&io_request->RaidContext.raid_context_g35.routing_flags);
		cpu_to_le16s(&io_request->RaidContext.raid_context_g35.nseg_type);
	} else {
		/* numSGE store lower 8 bit of sge_count.
		 * numSGEExt store higher 8 bit of sge_count
		 */
		io_request->RaidContext.raid_context.num_sge = sge_count;
		io_request->RaidContext.raid_context.num_sge_ext =
			(u8)(sge_count >> 8);
	}

	io_request->SGLFlags = cpu_to_le16(MPI2_SGE_FLAGS_64_BIT_ADDRESSING);

	if (scp->sc_data_direction == DMA_TO_DEVICE)
		io_request->Control |= cpu_to_le32(MPI2_SCSIIO_CONTROL_WRITE);
	else if (scp->sc_data_direction == DMA_FROM_DEVICE)
		io_request->Control |= cpu_to_le32(MPI2_SCSIIO_CONTROL_READ);

	io_request->SGLOffset0 =
		offsetof(struct MPI2_RAID_SCSI_IO_REQUEST, SGL) / 4;

	io_request->SenseBufferLowAddress =
		cpu_to_le32(lower_32_bits(cmd->sense_phys_addr));
	io_request->SenseBufferLength = SCSI_SENSE_BUFFERSIZE;

	cmd->scmd = scp;
	scp->SCp.ptr = (char *)cmd;

	return 0;
}
|
|
|
|
|
2017-02-10 16:59:08 +08:00
|
|
|
static union MEGASAS_REQUEST_DESCRIPTOR_UNION *
|
2010-12-22 05:34:31 +08:00
|
|
|
megasas_get_request_descriptor(struct megasas_instance *instance, u16 index)
|
|
|
|
{
|
|
|
|
u8 *p;
|
|
|
|
struct fusion_context *fusion;
|
|
|
|
|
|
|
|
fusion = instance->ctrl_context;
|
2017-01-11 07:20:47 +08:00
|
|
|
p = fusion->req_frames_desc +
|
|
|
|
sizeof(union MEGASAS_REQUEST_DESCRIPTOR_UNION) * index;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
return (union MEGASAS_REQUEST_DESCRIPTOR_UNION *)p;
|
|
|
|
}
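/*
 * Note: SMIDs are 1-based while the request descriptor array is
 * 0-based, so callers index with "index - 1", e.g.
 * megasas_get_request_descriptor(instance, cmd->index - 1).
 */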
|
|
|
|
|
2017-01-11 07:20:47 +08:00
|
|
|
|
|
|
|
/* megasas_prepare_secondRaid1_IO
|
|
|
|
* It prepares the second I/O of a RAID 1 write.
|
|
|
|
*/
|
2019-07-23 22:34:50 +08:00
|
|
|
static void megasas_prepare_secondRaid1_IO(struct megasas_instance *instance,
|
|
|
|
struct megasas_cmd_fusion *cmd,
|
|
|
|
struct megasas_cmd_fusion *r1_cmd)
|
2017-01-11 07:20:47 +08:00
|
|
|
{
|
|
|
|
union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc, *req_desc2 = NULL;
|
|
|
|
struct fusion_context *fusion;
|
|
|
|
fusion = instance->ctrl_context;
|
|
|
|
req_desc = cmd->request_desc;
|
2017-02-10 16:59:03 +08:00
|
|
|
/* Copy the IO request frame as well as 8 SGEs of data for the r1 command. */
|
|
|
|
memcpy(r1_cmd->io_request, cmd->io_request,
|
|
|
|
(sizeof(struct MPI2_RAID_SCSI_IO_REQUEST)));
|
|
|
|
memcpy(&r1_cmd->io_request->SGL, &cmd->io_request->SGL,
|
|
|
|
(fusion->max_sge_in_main_msg * sizeof(union MPI2_SGE_IO_UNION)));
|
|
|
|
/* The sense buffer is different for the r1 command. */
|
|
|
|
r1_cmd->io_request->SenseBufferLowAddress =
|
2017-10-19 17:49:05 +08:00
|
|
|
cpu_to_le32(lower_32_bits(r1_cmd->sense_phys_addr));
|
2017-02-10 16:59:03 +08:00
|
|
|
r1_cmd->scmd = cmd->scmd;
|
|
|
|
req_desc2 = megasas_get_request_descriptor(instance,
|
|
|
|
(r1_cmd->index - 1));
|
|
|
|
req_desc2->Words = 0;
|
|
|
|
r1_cmd->request_desc = req_desc2;
|
|
|
|
req_desc2->SCSIIO.SMID = cpu_to_le16(r1_cmd->index);
|
|
|
|
req_desc2->SCSIIO.RequestFlags = req_desc->SCSIIO.RequestFlags;
|
|
|
|
r1_cmd->request_desc->SCSIIO.DevHandle = cmd->r1_alt_dev_handle;
|
|
|
|
r1_cmd->io_request->DevHandle = cmd->r1_alt_dev_handle;
|
|
|
|
r1_cmd->r1_alt_dev_handle = cmd->io_request->DevHandle;
|
2019-06-25 19:04:29 +08:00
|
|
|
cmd->io_request->RaidContext.raid_context_g35.flow_specific.peer_smid =
|
2017-02-10 16:59:03 +08:00
|
|
|
cpu_to_le16(r1_cmd->index);
|
2019-06-25 19:04:29 +08:00
|
|
|
r1_cmd->io_request->RaidContext.raid_context_g35.flow_specific.peer_smid =
|
2017-02-10 16:59:03 +08:00
|
|
|
cpu_to_le16(cmd->index);
|
|
|
|
/* MSIxIndex of both commands' request descriptors must be the same. */
|
|
|
|
r1_cmd->request_desc->SCSIIO.MSIxIndex =
|
|
|
|
cmd->request_desc->SCSIIO.MSIxIndex;
|
|
|
|
/* The span arm is different for the r1 cmd. */
|
|
|
|
r1_cmd->io_request->RaidContext.raid_context_g35.span_arm =
|
2017-01-11 07:20:47 +08:00
|
|
|
cmd->io_request->RaidContext.raid_context_g35.span_arm + 1;
|
|
|
|
}
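/*
 * After this call the two commands are cross-linked through
 * peer_smid, so whichever arm of the RAID 1 write completes last
 * can look up its partner (fusion->cmd_list[peer_smid - 1]) and
 * finish the SCSI command exactly once.
 */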
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/**
|
|
|
|
* megasas_build_and_issue_cmd_fusion - Main routine for building and
|
|
|
|
* issuing a non-IOCTL command
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
* @scmd: pointer to scsi cmd from OS
|
|
|
|
*/
|
|
|
|
static u32
|
|
|
|
megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
|
|
|
|
struct scsi_cmnd *scmd)
|
|
|
|
{
|
2017-01-11 07:20:47 +08:00
|
|
|
struct megasas_cmd_fusion *cmd, *r1_cmd = NULL;
|
2010-12-22 05:34:31 +08:00
|
|
|
union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
|
|
|
|
u32 index;
|
|
|
|
|
2016-01-28 23:34:30 +08:00
|
|
|
if ((megasas_cmd_type(scmd) == READ_WRITE_LDIO) &&
|
|
|
|
instance->ldio_threshold &&
|
|
|
|
(atomic_inc_return(&instance->ldio_outstanding) >
|
|
|
|
instance->ldio_threshold)) {
|
|
|
|
atomic_dec(&instance->ldio_outstanding);
|
|
|
|
return SCSI_MLQUEUE_DEVICE_BUSY;
|
|
|
|
}
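/*
 * Two levels of throttling: the LD I/O check above returns
 * SCSI_MLQUEUE_DEVICE_BUSY once outstanding LD I/Os exceed
 * ldio_threshold, while the fw_outstanding check below returns
 * SCSI_MLQUEUE_HOST_BUSY when the whole host queue is full.
 */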
|
|
|
|
|
2017-01-11 07:20:47 +08:00
|
|
|
if (atomic_inc_return(&instance->fw_outstanding) >
|
|
|
|
instance->host->can_queue) {
|
|
|
|
atomic_dec(&instance->fw_outstanding);
|
|
|
|
return SCSI_MLQUEUE_HOST_BUSY;
|
|
|
|
}
|
|
|
|
|
2015-04-23 19:01:24 +08:00
|
|
|
cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2017-01-11 07:20:47 +08:00
|
|
|
if (!cmd) {
|
|
|
|
atomic_dec(&instance->fw_outstanding);
|
|
|
|
return SCSI_MLQUEUE_HOST_BUSY;
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
index = cmd->index;
|
|
|
|
|
|
|
|
req_desc = megasas_get_request_descriptor(instance, index - 1);
|
|
|
|
|
|
|
|
req_desc->Words = 0;
|
|
|
|
cmd->request_desc = req_desc;
|
|
|
|
|
|
|
|
if (megasas_build_io_fusion(instance, scmd, cmd)) {
|
|
|
|
megasas_return_cmd_fusion(instance, cmd);
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_err(&instance->pdev->dev, "Error building command\n");
|
2010-12-22 05:34:31 +08:00
|
|
|
cmd->request_desc = NULL;
|
2017-01-11 07:20:47 +08:00
|
|
|
atomic_dec(&instance->fw_outstanding);
|
2016-01-28 23:34:29 +08:00
|
|
|
return SCSI_MLQUEUE_HOST_BUSY;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
req_desc = cmd->request_desc;
|
2013-09-06 18:20:52 +08:00
|
|
|
req_desc->SCSIIO.SMID = cpu_to_le16(index);
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
if (cmd->io_request->ChainOffset != 0 &&
|
|
|
|
cmd->io_request->ChainOffset != 0xF)
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_err(&instance->pdev->dev,
|
2010-12-22 05:34:31 +08:00
|
|
|
"correct : %x\n", cmd->io_request->ChainOffset);
|
2017-01-11 07:20:47 +08:00
|
|
|
/*
|
|
|
|
* If the command is RAID 1/10 fast-path write capable,
|
|
|
|
* try to get a second command from the pool and construct it.
|
|
|
|
* FW has confirmed that the LBA values of the two PDs
|
|
|
|
* corresponding to a single R1/10 LD are always the same.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
/* The driver-side count must always be less than max_fw_cmds
|
|
|
|
* in order to get a new command.
|
|
|
|
*/
|
2017-02-10 16:59:03 +08:00
|
|
|
if (cmd->r1_alt_dev_handle != MR_DEVHANDLE_INVALID) {
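/*
 * The R1 peer command lives in the second half of the fusion
 * command pool: tag maps to the primary command and
 * tag + max_fw_cmds to its peer.
 */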
|
2017-01-11 07:20:47 +08:00
|
|
|
r1_cmd = megasas_get_cmd_fusion(instance,
|
|
|
|
(scmd->request->tag + instance->max_fw_cmds));
|
|
|
|
megasas_prepare_secondRaid1_IO(instance, cmd, r1_cmd);
|
|
|
|
}
|
|
|
|
|
2017-02-10 16:59:01 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/*
|
|
|
|
* Issue the command to the FW
|
|
|
|
*/
|
|
|
|
|
2017-02-10 16:59:04 +08:00
|
|
|
megasas_fire_cmd_fusion(instance, req_desc);
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2017-02-10 16:59:01 +08:00
|
|
|
if (r1_cmd)
|
2017-02-10 16:59:04 +08:00
|
|
|
megasas_fire_cmd_fusion(instance, r1_cmd->request_desc);
|
2017-02-10 16:59:01 +08:00
|
|
|
|
2017-01-11 07:20:47 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-02-10 16:59:03 +08:00
|
|
|
/**
|
|
|
|
* megasas_complete_r1_command -
|
|
|
|
* completes R1 FP write commands which have a valid peer SMID
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
* @cmd_fusion: MPT command frame
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
static inline void
|
|
|
|
megasas_complete_r1_command(struct megasas_instance *instance,
|
|
|
|
struct megasas_cmd_fusion *cmd)
|
|
|
|
{
|
|
|
|
u8 *sense, status, ex_status;
|
|
|
|
u32 data_length;
|
|
|
|
u16 peer_smid;
|
|
|
|
struct fusion_context *fusion;
|
|
|
|
struct megasas_cmd_fusion *r1_cmd = NULL;
|
|
|
|
struct scsi_cmnd *scmd_local = NULL;
|
|
|
|
struct RAID_CONTEXT_G35 *rctx_g35;
|
|
|
|
|
|
|
|
rctx_g35 = &cmd->io_request->RaidContext.raid_context_g35;
|
|
|
|
fusion = instance->ctrl_context;
|
2019-06-25 19:04:29 +08:00
|
|
|
peer_smid = le16_to_cpu(rctx_g35->flow_specific.peer_smid);
|
2017-02-10 16:59:03 +08:00
|
|
|
|
|
|
|
r1_cmd = fusion->cmd_list[peer_smid - 1];
|
|
|
|
scmd_local = cmd->scmd;
|
|
|
|
status = rctx_g35->status;
|
|
|
|
ex_status = rctx_g35->ex_status;
|
|
|
|
data_length = cmd->io_request->DataLength;
|
|
|
|
sense = cmd->sense;
|
|
|
|
|
|
|
|
cmd->cmd_completed = true;
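/*
 * Both arms of the R1 write race to complete; each marks itself
 * done, and only the arm that finds its peer already completed
 * (below) maps the final status and finishes the SCSI command.
 */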
|
|
|
|
|
|
|
|
/* Check if peer command is completed or not*/
|
|
|
|
if (r1_cmd->cmd_completed) {
|
|
|
|
rctx_g35 = &r1_cmd->io_request->RaidContext.raid_context_g35;
|
|
|
|
if (rctx_g35->status != MFI_STAT_OK) {
|
|
|
|
status = rctx_g35->status;
|
|
|
|
ex_status = rctx_g35->ex_status;
|
|
|
|
data_length = r1_cmd->io_request->DataLength;
|
|
|
|
sense = r1_cmd->sense;
|
|
|
|
}
|
|
|
|
|
|
|
|
megasas_return_cmd_fusion(instance, r1_cmd);
|
|
|
|
map_cmd_status(fusion, scmd_local, status, ex_status,
|
|
|
|
le32_to_cpu(data_length), sense);
|
|
|
|
if (instance->ldio_threshold &&
|
|
|
|
megasas_cmd_type(scmd_local) == READ_WRITE_LDIO)
|
|
|
|
atomic_dec(&instance->ldio_outstanding);
|
|
|
|
scmd_local->SCp.ptr = NULL;
|
|
|
|
megasas_return_cmd_fusion(instance, cmd);
|
|
|
|
scsi_dma_unmap(scmd_local);
|
|
|
|
scmd_local->scsi_done(scmd_local);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/**
|
|
|
|
* complete_cmd_fusion - Completes command
|
|
|
|
* @instance: Adapter soft state
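* @MSIxIndex: index of the reply queue being drained
* @irq_context: IRQ context for this vector (may be NULL)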
|
|
|
|
* Completes all commands that are in the reply descriptor queue
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static int
|
2019-05-08 01:05:35 +08:00
|
|
|
complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex,
|
|
|
|
struct megasas_irq_context *irq_context)
|
2010-12-22 05:34:31 +08:00
|
|
|
{
|
|
|
|
union MPI2_REPLY_DESCRIPTORS_UNION *desc;
|
|
|
|
struct MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR *reply_desc;
|
|
|
|
struct MPI2_RAID_SCSI_IO_REQUEST *scsi_io_req;
|
|
|
|
struct fusion_context *fusion;
|
|
|
|
struct megasas_cmd *cmd_mfi;
|
2017-02-10 16:59:03 +08:00
|
|
|
struct megasas_cmd_fusion *cmd_fusion;
|
2010-12-22 05:34:31 +08:00
|
|
|
u16 smid, num_completed;
|
2017-02-10 16:59:03 +08:00
|
|
|
u8 reply_descript_type, *sense, status, extStatus;
|
|
|
|
u32 device_id, data_length;
|
2010-12-22 05:34:31 +08:00
|
|
|
union desc_value d_val;
|
|
|
|
struct LD_LOAD_BALANCE_INFO *lbinfo;
|
2014-09-12 21:27:23 +08:00
|
|
|
int threshold_reply_count = 0;
|
2015-04-23 19:01:24 +08:00
|
|
|
struct scsi_cmnd *scmd_local = NULL;
|
2016-01-28 23:34:25 +08:00
|
|
|
struct MR_TASK_MANAGE_REQUEST *mr_tm_req;
|
|
|
|
struct MPI2_SCSI_TASK_MANAGE_REQUEST *mpi_tm_req;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
fusion = instance->ctrl_context;
|
|
|
|
|
2016-01-28 23:34:32 +08:00
|
|
|
if (atomic_read(&instance->adprecovery) == MEGASAS_HW_CRITICAL_ERROR)
|
2010-12-22 05:34:31 +08:00
|
|
|
return IRQ_HANDLED;
|
|
|
|
|
2016-01-28 23:34:28 +08:00
|
|
|
desc = fusion->reply_frames_desc[MSIxIndex] +
|
|
|
|
fusion->last_reply_idx[MSIxIndex];
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
reply_desc = (struct MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR *)desc;
|
|
|
|
|
|
|
|
d_val.word = desc->Words;
|
|
|
|
|
|
|
|
reply_descript_type = reply_desc->ReplyFlags &
|
|
|
|
MPI2_RPY_DESCRIPT_FLAGS_TYPE_MASK;
|
|
|
|
|
|
|
|
if (reply_descript_type == MPI2_RPY_DESCRIPT_FLAGS_UNUSED)
|
|
|
|
return IRQ_NONE;
|
|
|
|
|
|
|
|
num_completed = 0;
|
|
|
|
|
2015-04-23 19:03:09 +08:00
|
|
|
while (d_val.u.low != cpu_to_le32(UINT_MAX) &&
|
|
|
|
d_val.u.high != cpu_to_le32(UINT_MAX)) {
|
2017-01-11 07:20:47 +08:00
|
|
|
|
2013-09-06 18:20:52 +08:00
|
|
|
smid = le16_to_cpu(reply_desc->SMID);
|
2010-12-22 05:34:31 +08:00
|
|
|
cmd_fusion = fusion->cmd_list[smid - 1];
|
2017-02-10 16:59:03 +08:00
|
|
|
scsi_io_req = (struct MPI2_RAID_SCSI_IO_REQUEST *)
|
|
|
|
cmd_fusion->io_request;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2015-04-23 19:01:24 +08:00
|
|
|
scmd_local = cmd_fusion->scmd;
|
2017-01-11 07:20:46 +08:00
|
|
|
status = scsi_io_req->RaidContext.raid_context.status;
|
2017-01-11 07:20:48 +08:00
|
|
|
extStatus = scsi_io_req->RaidContext.raid_context.ex_status;
|
2017-01-11 07:20:47 +08:00
|
|
|
sense = cmd_fusion->sense;
|
|
|
|
data_length = scsi_io_req->DataLength;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
switch (scsi_io_req->Function) {
|
2016-01-28 23:34:25 +08:00
|
|
|
case MPI2_FUNCTION_SCSI_TASK_MGMT:
|
|
|
|
mr_tm_req = (struct MR_TASK_MANAGE_REQUEST *)
|
|
|
|
cmd_fusion->io_request;
|
|
|
|
mpi_tm_req = (struct MPI2_SCSI_TASK_MANAGE_REQUEST *)
|
|
|
|
&mr_tm_req->TmRequest;
|
|
|
|
dev_dbg(&instance->pdev->dev, "TM completion: "
|
|
|
|
"type: 0x%x TaskMID: 0x%x\n",
|
|
|
|
mpi_tm_req->TaskType, mpi_tm_req->TaskMID);
|
|
|
|
complete(&cmd_fusion->done);
|
|
|
|
break;
|
2010-12-22 05:34:31 +08:00
|
|
|
case MPI2_FUNCTION_SCSI_IO_REQUEST: /*Fast Path IO.*/
|
|
|
|
/* Update load balancing info */
|
2017-02-10 16:59:03 +08:00
|
|
|
if (fusion->load_balance_info &&
|
|
|
|
(cmd_fusion->scmd->SCp.Status &
|
|
|
|
MEGASAS_LOAD_BALANCE_FLAG)) {
|
|
|
|
device_id = MEGASAS_DEV_INDEX(scmd_local);
|
|
|
|
lbinfo = &fusion->load_balance_info[device_id];
|
2014-09-12 21:27:53 +08:00
|
|
|
atomic_dec(&lbinfo->scsi_pending_cmds[cmd_fusion->pd_r1_lb]);
|
2017-02-10 16:59:03 +08:00
|
|
|
cmd_fusion->scmd->SCp.Status &= ~MEGASAS_LOAD_BALANCE_FLAG;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
2018-11-28 12:32:34 +08:00
|
|
|
/* Fall through - and complete IO */
|
2010-12-22 05:34:31 +08:00
|
|
|
case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
|
2017-02-10 16:59:03 +08:00
|
|
|
atomic_dec(&instance->fw_outstanding);
|
|
|
|
if (cmd_fusion->r1_alt_dev_handle == MR_DEVHANDLE_INVALID) {
|
2017-01-11 07:20:47 +08:00
|
|
|
map_cmd_status(fusion, scmd_local, status,
|
2017-02-10 16:59:03 +08:00
|
|
|
extStatus, le32_to_cpu(data_length),
|
|
|
|
sense);
|
|
|
|
if (instance->ldio_threshold &&
|
|
|
|
(megasas_cmd_type(scmd_local) == READ_WRITE_LDIO))
|
2017-01-11 07:20:51 +08:00
|
|
|
atomic_dec(&instance->ldio_outstanding);
|
2017-02-10 16:59:03 +08:00
|
|
|
scmd_local->SCp.ptr = NULL;
|
2017-01-11 07:20:47 +08:00
|
|
|
megasas_return_cmd_fusion(instance, cmd_fusion);
|
|
|
|
scsi_dma_unmap(scmd_local);
|
|
|
|
scmd_local->scsi_done(scmd_local);
|
2017-02-10 16:59:03 +08:00
|
|
|
} else /* Optimal VD - R1 FP command completion. */
|
|
|
|
megasas_complete_r1_command(instance, cmd_fusion);
|
2010-12-22 05:34:31 +08:00
|
|
|
break;
|
|
|
|
case MEGASAS_MPI2_FUNCTION_PASSTHRU_IO_REQUEST: /*MFI command */
|
|
|
|
cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
|
2015-04-23 19:01:24 +08:00
|
|
|
/* Poll mode: dummy free here.
|
|
|
|
* In interrupt mode, the caller performs the reverse check.
|
|
|
|
*/
|
|
|
|
if (cmd_mfi->flags & DRV_DCMD_POLLED_MODE) {
|
|
|
|
cmd_mfi->flags &= ~DRV_DCMD_POLLED_MODE;
|
|
|
|
megasas_return_cmd(instance, cmd_mfi);
|
|
|
|
} else
|
|
|
|
megasas_complete_cmd(instance, cmd_mfi, DID_OK);
|
2010-12-22 05:34:31 +08:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2011-10-09 09:15:13 +08:00
|
|
|
fusion->last_reply_idx[MSIxIndex]++;
|
|
|
|
if (fusion->last_reply_idx[MSIxIndex] >=
|
|
|
|
fusion->reply_q_depth)
|
|
|
|
fusion->last_reply_idx[MSIxIndex] = 0;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2015-04-23 19:03:09 +08:00
|
|
|
desc->Words = cpu_to_le64(ULLONG_MAX);
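/* Mark the descriptor unused (all ones) so the loop condition
 * above can detect when FW posts a fresh reply at this slot.
 */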
|
2010-12-22 05:34:31 +08:00
|
|
|
num_completed++;
|
2014-09-12 21:27:23 +08:00
|
|
|
threshold_reply_count++;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
/* Get the next reply descriptor */
|
2011-10-09 09:15:13 +08:00
|
|
|
if (!fusion->last_reply_idx[MSIxIndex])
|
2016-01-28 23:34:28 +08:00
|
|
|
desc = fusion->reply_frames_desc[MSIxIndex];
|
2010-12-22 05:34:31 +08:00
|
|
|
else
|
|
|
|
desc++;
|
|
|
|
|
|
|
|
reply_desc =
|
|
|
|
(struct MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR *)desc;
|
|
|
|
|
|
|
|
d_val.word = desc->Words;
|
|
|
|
|
|
|
|
reply_descript_type = reply_desc->ReplyFlags &
|
|
|
|
MPI2_RPY_DESCRIPT_FLAGS_TYPE_MASK;
|
|
|
|
|
|
|
|
if (reply_descript_type == MPI2_RPY_DESCRIPT_FLAGS_UNUSED)
|
|
|
|
break;
|
2014-09-12 21:27:23 +08:00
|
|
|
/*
|
|
|
|
* Write to the reply post host index register after the threshold
|
|
|
|
* number of replies has been completed while more replies are still
|
|
|
|
* pending in the reply queue.
|
|
|
|
*/
|
2019-05-08 01:05:35 +08:00
|
|
|
if (threshold_reply_count >= instance->threshold_reply_count) {
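/*
 * A burst of replies was handled in this pass: update the reply
 * post host index register so that FW can reuse the consumed
 * descriptors and, when called from interrupt context, hand the
 * rest of the queue to irq_poll so time in hard IRQ stays bounded.
 */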
|
2017-01-11 07:20:44 +08:00
|
|
|
if (instance->msix_combined)
|
2014-09-12 21:27:23 +08:00
|
|
|
writel(((MSIxIndex & 0x7) << 24) |
|
|
|
|
fusion->last_reply_idx[MSIxIndex],
|
|
|
|
instance->reply_post_host_index_addr[MSIxIndex/8]);
|
|
|
|
else
|
|
|
|
writel((MSIxIndex << 24) |
|
|
|
|
fusion->last_reply_idx[MSIxIndex],
|
|
|
|
instance->reply_post_host_index_addr[0]);
|
|
|
|
threshold_reply_count = 0;
|
2019-05-08 01:05:35 +08:00
|
|
|
if (irq_context) {
|
|
|
|
if (!irq_context->irq_poll_scheduled) {
|
|
|
|
irq_context->irq_poll_scheduled = true;
|
2019-06-25 19:04:22 +08:00
|
|
|
irq_context->irq_line_enable = true;
|
2019-05-08 01:05:35 +08:00
|
|
|
irq_poll_sched(&irq_context->irqpoll);
|
|
|
|
}
|
|
|
|
return num_completed;
|
|
|
|
}
|
2014-09-12 21:27:23 +08:00
|
|
|
}
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
2019-05-08 01:05:35 +08:00
|
|
|
if (num_completed) {
|
|
|
|
wmb();
|
|
|
|
if (instance->msix_combined)
|
|
|
|
writel(((MSIxIndex & 0x7) << 24) |
|
|
|
|
fusion->last_reply_idx[MSIxIndex],
|
|
|
|
instance->reply_post_host_index_addr[MSIxIndex/8]);
|
|
|
|
else
|
|
|
|
writel((MSIxIndex << 24) |
|
|
|
|
fusion->last_reply_idx[MSIxIndex],
|
|
|
|
instance->reply_post_host_index_addr[0]);
|
|
|
|
megasas_check_and_restore_queue_depth(instance);
|
|
|
|
}
|
|
|
|
return num_completed;
|
|
|
|
}
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2019-05-08 01:05:35 +08:00
|
|
|
/**
|
|
|
|
* megasas_enable_irq_poll() - enable irqpoll
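* @instance: Adapter soft state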
|
|
|
|
*/
|
|
|
|
static void megasas_enable_irq_poll(struct megasas_instance *instance)
|
|
|
|
{
|
|
|
|
u32 count, i;
|
|
|
|
struct megasas_irq_context *irq_ctx;
|
|
|
|
|
|
|
|
count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;
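/* One irq_context exists per MSI-X vector; fall back to a single
 * context when MSI-X is not enabled.
 */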
|
|
|
|
|
|
|
|
for (i = 0; i < count; i++) {
|
|
|
|
irq_ctx = &instance->irq_context[i];
|
|
|
|
irq_poll_enable(&irq_ctx->irqpoll);
|
|
|
|
}
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
2017-02-10 16:59:34 +08:00
|
|
|
/**
|
|
|
|
* megasas_sync_irqs - Synchronizes all IRQs owned by adapter
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static void megasas_sync_irqs(unsigned long instance_addr)
|
2017-02-10 16:59:34 +08:00
|
|
|
{
|
|
|
|
u32 count, i;
|
|
|
|
struct megasas_instance *instance =
|
|
|
|
(struct megasas_instance *)instance_addr;
|
2019-05-08 01:05:35 +08:00
|
|
|
struct megasas_irq_context *irq_ctx;
|
2017-02-10 16:59:34 +08:00
|
|
|
|
|
|
|
count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;
|
|
|
|
|
2019-05-08 01:05:35 +08:00
|
|
|
for (i = 0; i < count; i++) {
|
2017-02-10 16:59:34 +08:00
|
|
|
synchronize_irq(pci_irq_vector(instance->pdev, i));
|
2019-05-08 01:05:35 +08:00
|
|
|
irq_ctx = &instance->irq_context[i];
|
|
|
|
irq_poll_disable(&irq_ctx->irqpoll);
|
|
|
|
if (irq_ctx->irq_poll_scheduled) {
|
|
|
|
irq_ctx->irq_poll_scheduled = false;
|
|
|
|
enable_irq(irq_ctx->os_irq);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_irqpoll() - process a queue for completed reply descriptors
|
|
|
|
* @irqpoll: IRQ poll structure associated with queue to poll.
|
|
|
|
* @budget: Maximum number of reply descriptors to process per poll.
|
|
|
|
*
|
|
|
|
* Return: The number of entries processed.
|
|
|
|
*/
|
|
|
|
|
|
|
|
int megasas_irqpoll(struct irq_poll *irqpoll, int budget)
|
|
|
|
{
|
|
|
|
struct megasas_irq_context *irq_ctx;
|
|
|
|
struct megasas_instance *instance;
|
|
|
|
int num_entries;
|
|
|
|
|
|
|
|
irq_ctx = container_of(irqpoll, struct megasas_irq_context, irqpoll);
|
|
|
|
instance = irq_ctx->instance;
|
|
|
|
|
2019-06-25 19:04:22 +08:00
|
|
|
if (irq_ctx->irq_line_enable) {
|
|
|
|
disable_irq(irq_ctx->os_irq);
|
|
|
|
irq_ctx->irq_line_enable = false;
|
|
|
|
}
|
|
|
|
|
2019-05-08 01:05:35 +08:00
|
|
|
num_entries = complete_cmd_fusion(instance, irq_ctx->MSIxIndex, irq_ctx);
|
|
|
|
if (num_entries < budget) {
|
|
|
|
irq_poll_complete(irqpoll);
|
|
|
|
irq_ctx->irq_poll_scheduled = false;
|
|
|
|
enable_irq(irq_ctx->os_irq);
|
|
|
|
}
|
|
|
|
|
|
|
|
return num_entries;
|
2017-02-10 16:59:34 +08:00
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/**
|
|
|
|
* megasas_complete_cmd_dpc_fusion - Completes command
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
*
|
|
|
|
* Tasklet to complete cmds
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static void
|
2010-12-22 05:34:31 +08:00
|
|
|
megasas_complete_cmd_dpc_fusion(unsigned long instance_addr)
|
|
|
|
{
|
|
|
|
struct megasas_instance *instance =
|
|
|
|
(struct megasas_instance *)instance_addr;
|
2011-10-09 09:15:13 +08:00
|
|
|
u32 count, MSIxIndex;
|
|
|
|
|
|
|
|
count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
/* If we have already declared adapter dead, do not complete cmds */
|
2018-10-17 14:37:50 +08:00
|
|
|
if (atomic_read(&instance->adprecovery) == MEGASAS_HW_CRITICAL_ERROR)
|
2010-12-22 05:34:31 +08:00
|
|
|
return;
|
|
|
|
|
2011-10-09 09:15:13 +08:00
|
|
|
for (MSIxIndex = 0 ; MSIxIndex < count; MSIxIndex++)
|
2019-05-08 01:05:35 +08:00
|
|
|
complete_cmd_fusion(instance, MSIxIndex, NULL);
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_isr_fusion - ISR entry point
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static irqreturn_t megasas_isr_fusion(int irq, void *devp)
|
2010-12-22 05:34:31 +08:00
|
|
|
{
|
2011-10-09 09:15:13 +08:00
|
|
|
struct megasas_irq_context *irq_context = devp;
|
|
|
|
struct megasas_instance *instance = irq_context->instance;
|
2018-10-17 14:37:39 +08:00
|
|
|
u32 mfiStatus;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2013-05-22 15:04:14 +08:00
|
|
|
if (instance->mask_interrupts)
|
|
|
|
return IRQ_NONE;
|
|
|
|
|
2019-06-25 19:04:22 +08:00
|
|
|
|
|
|
|
if (irq_context->irq_poll_scheduled)
|
|
|
|
return IRQ_HANDLED;
|
|
|
|
|
|
|
|
|
2011-10-09 09:15:13 +08:00
|
|
|
if (!instance->msix_vectors) {
|
2018-12-17 16:47:39 +08:00
|
|
|
mfiStatus = instance->instancet->clear_intr(instance);
|
2010-12-22 05:34:31 +08:00
|
|
|
if (!mfiStatus)
|
|
|
|
return IRQ_NONE;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* If we are resetting, bail */
|
2011-10-09 09:14:50 +08:00
|
|
|
if (test_bit(MEGASAS_FUSION_IN_RESET, &instance->reset_flags)) {
|
2018-12-17 16:47:39 +08:00
|
|
|
instance->instancet->clear_intr(instance);
|
2010-12-22 05:34:31 +08:00
|
|
|
return IRQ_HANDLED;
|
2011-10-09 09:14:50 +08:00
|
|
|
}
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2019-05-08 01:05:35 +08:00
|
|
|
return complete_cmd_fusion(instance, irq_context->MSIxIndex, irq_context)
|
|
|
|
? IRQ_HANDLED : IRQ_NONE;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* build_mpt_mfi_pass_thru - builds a cmd for MFI Pass thru
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
* @mfi_cmd: megasas_cmd pointer
|
|
|
|
*
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static void
|
2010-12-22 05:34:31 +08:00
|
|
|
build_mpt_mfi_pass_thru(struct megasas_instance *instance,
|
|
|
|
struct megasas_cmd *mfi_cmd)
|
|
|
|
{
|
|
|
|
struct MPI25_IEEE_SGE_CHAIN64 *mpi25_ieee_chain;
|
|
|
|
struct MPI2_RAID_SCSI_IO_REQUEST *io_req;
|
|
|
|
struct megasas_cmd_fusion *cmd;
|
|
|
|
struct fusion_context *fusion;
|
|
|
|
struct megasas_header *frame_hdr = &mfi_cmd->frame->hdr;
|
|
|
|
|
2015-04-23 19:01:24 +08:00
|
|
|
fusion = instance->ctrl_context;
|
|
|
|
|
|
|
|
cmd = megasas_get_cmd_fusion(instance,
|
|
|
|
instance->max_scsi_cmds + mfi_cmd->index);
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
/* Save the smid. To be used for returning the cmd */
|
|
|
|
mfi_cmd->context.smid = cmd->index;
|
2014-09-12 21:27:58 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/*
|
|
|
|
* For cmds where the flag is set, store the flag and check
|
|
|
|
* on completion. For cmds with this flag, don't call
|
|
|
|
* megasas_complete_cmd
|
|
|
|
*/
|
|
|
|
|
2014-11-17 17:54:28 +08:00
|
|
|
if (frame_hdr->flags & cpu_to_le16(MFI_FRAME_DONT_POST_IN_REPLY_QUEUE))
|
2015-04-23 19:01:24 +08:00
|
|
|
mfi_cmd->flags |= DRV_DCMD_POLLED_MODE;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
io_req = cmd->io_request;
|
2011-10-09 09:15:06 +08:00
|
|
|
|
2017-10-19 17:48:48 +08:00
|
|
|
if (instance->adapter_type >= INVADER_SERIES) {
|
2011-10-09 09:15:06 +08:00
|
|
|
struct MPI25_IEEE_SGE_CHAIN64 *sgl_ptr_end =
|
|
|
|
(struct MPI25_IEEE_SGE_CHAIN64 *)&io_req->SGL;
|
|
|
|
sgl_ptr_end += fusion->max_sge_in_main_msg - 1;
|
|
|
|
sgl_ptr_end->Flags = 0;
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
mpi25_ieee_chain =
|
|
|
|
(struct MPI25_IEEE_SGE_CHAIN64 *)&io_req->SGL.IeeeChain;
|
|
|
|
|
|
|
|
io_req->Function = MEGASAS_MPI2_FUNCTION_PASSTHRU_IO_REQUEST;
|
|
|
|
io_req->SGLOffset0 = offsetof(struct MPI2_RAID_SCSI_IO_REQUEST,
|
|
|
|
SGL) / 4;
|
|
|
|
io_req->ChainOffset = fusion->chain_offset_mfi_pthru;
|
|
|
|
|
2013-09-06 18:20:52 +08:00
|
|
|
mpi25_ieee_chain->Address = cpu_to_le64(mfi_cmd->frame_phys_addr);
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
mpi25_ieee_chain->Flags = IEEE_SGE_FLAGS_CHAIN_ELEMENT |
|
|
|
|
MPI2_IEEE_SGE_FLAGS_IOCPLBNTA_ADDR;
|
|
|
|
|
2017-08-23 19:46:55 +08:00
|
|
|
mpi25_ieee_chain->Length = cpu_to_le32(instance->mfi_frame_size);
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* build_mpt_cmd - Calls a helper function to build an MFI Pass thru cmd
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
* @cmd: mfi cmd to build
|
|
|
|
*
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static union MEGASAS_REQUEST_DESCRIPTOR_UNION *
|
2010-12-22 05:34:31 +08:00
|
|
|
build_mpt_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
|
|
|
|
{
|
2017-02-10 16:59:08 +08:00
|
|
|
union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc = NULL;
|
2010-12-22 05:34:31 +08:00
|
|
|
u16 index;
|
|
|
|
|
2017-02-10 16:59:32 +08:00
|
|
|
build_mpt_mfi_pass_thru(instance, cmd);
|
2010-12-22 05:34:31 +08:00
|
|
|
index = cmd->context.smid;
|
|
|
|
|
|
|
|
req_desc = megasas_get_request_descriptor(instance, index - 1);
|
|
|
|
|
|
|
|
req_desc->Words = 0;
|
|
|
|
req_desc->SCSIIO.RequestFlags = (MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO <<
|
|
|
|
MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
|
|
|
|
|
2013-09-06 18:20:52 +08:00
|
|
|
req_desc->SCSIIO.SMID = cpu_to_le16(index);
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
return req_desc;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_issue_dcmd_fusion - Issues an MFI Pass thru cmd
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
* @cmd: mfi cmd pointer
|
|
|
|
*
|
|
|
|
*/
|
2019-07-26 21:55:40 +08:00
|
|
|
static void
|
2010-12-22 05:34:31 +08:00
|
|
|
megasas_issue_dcmd_fusion(struct megasas_instance *instance,
|
|
|
|
struct megasas_cmd *cmd)
|
|
|
|
{
|
|
|
|
union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
|
|
|
|
|
|
|
|
req_desc = build_mpt_cmd(instance, cmd);
|
2016-01-28 23:34:23 +08:00
|
|
|
|
2017-02-10 16:59:04 +08:00
|
|
|
megasas_fire_cmd_fusion(instance, req_desc);
|
2017-02-10 16:59:09 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_release_fusion - Reverses the FW initialization
|
2015-03-03 18:58:07 +08:00
|
|
|
* @instance: Adapter soft state
|
2010-12-22 05:34:31 +08:00
|
|
|
*/
|
|
|
|
void
|
|
|
|
megasas_release_fusion(struct megasas_instance *instance)
|
|
|
|
{
|
2017-10-19 17:49:01 +08:00
|
|
|
megasas_free_ioc_init_cmd(instance);
|
2010-12-22 05:34:31 +08:00
|
|
|
megasas_free_cmds(instance);
|
|
|
|
megasas_free_cmds_fusion(instance);
|
|
|
|
|
|
|
|
iounmap(instance->reg_set);
|
|
|
|
|
2016-08-06 14:37:34 +08:00
|
|
|
pci_release_selected_regions(instance->pdev, 1 << instance->bar);
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_read_fw_status_reg_fusion - returns the current FW status value
|
|
|
|
* @instance: Adapter soft state
|
|
|
|
*/
|
|
|
|
static u32
|
2018-12-17 16:47:39 +08:00
|
|
|
megasas_read_fw_status_reg_fusion(struct megasas_instance *instance)
|
2010-12-22 05:34:31 +08:00
|
|
|
{
|
2018-12-17 16:47:40 +08:00
|
|
|
return megasas_readl(instance, &instance->reg_set->outbound_scratch_pad_0);
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
megaraid_sas : Firmware crash dump feature support
Resending the patch. Addressed the review comments from Tomas Henzl.
Moved buff_offset inside the spinlock, corrected the loop at crash dump buffer free,
and added a reset_devices check to disable the fw crash dump feature in the kdump kernel.
This feature provides an interface similar to the kernel crash dump feature.
When the megaraid firmware encounters a crash, the driver will collect the firmware raw image and
dump it into a pre-configured location.
The driver allocates two different segments of memory.
#1 A non-DMA-able large buffer (allocated on demand) to capture the actual FW crash dump.
#2 A DMA buffer (persistent allocation) that just acts as an arbitrator.
Firmware keeps writing crash dump data in chunks of the DMA buffer size into #2,
which is copied back by the driver to the host memory described in #1.
Driver-Firmware interface:
==================
A.) The host driver can allocate a maximum of 512MB host memory to store crash dump data.
This memory is internal to the host and is not exposed to the firmware.
The driver may not be able to allocate 512MB. In that case, the driver allocates whatever memory
is available at run time to store crash dump data.
Let's call this buffer the Host Crash Buffer.
The Host Crash Buffer will not be contiguous as a whole, but it will consist of multiple chunks of contiguous memory.
This is internal to the driver; firmware and applications are unaware of it.
A partial allocation of the Host Crash Buffer may still hold information useful for debugging, depending upon
what was collected in that buffer and the nature of the failure.
A complete crash dump is the best case, but we do want to capture a partial buffer to grab something rather than nothing.
The Host Crash Buffer is allocated only when FW crash dump data is available,
and is deallocated once the application copies the Host Crash Buffer to a file.
The Host Crash Buffer size can be anything between 1MB and 512MB (in multiples of 1MB).
B.) Irrespective of the underlying firmware's crash dump capability,
the driver allocates a DMA buffer at start of day for each MR controller.
Let's call this buffer the "DMA Crash Buffer".
For this feature, the size of the DMA Crash Buffer is 1MB.
(We would not gain much even if the DMA buffer size were increased.)
C.) The driver reads the controller info by sending the existing dcmd "MR_DCMD_CTRL_GET_INFO".
The driver extracts the information from the ctrl info provided by the firmware and
figures out whether the firmware supports the crash dump feature or not.
The driver enables the crash dump feature only if
"Firmware supports crash dump" +
"Driver was able to create the DMA Crash Buffer".
If either of the above is not set, the crash dump feature is disabled in the driver.
Firmware enables the crash dump feature only if the driver sends DCMD MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON.
A helper application/script should use the sysfs parameters fw_crash_xxx to actually copy data from
host memory to the filesystem.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
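As a hedged illustration of the arbitrated copy described above (all
demo_* names and DEMO_* constants are hypothetical, not the driver's real
DCMD plumbing): the firmware fills one persistent 1MB DMA chunk at a time,
and the driver copies each chunk into vzalloc'd host memory, keeping
whatever partial dump it managed to capture when host memory runs out.

#include <linux/string.h>
#include <linux/types.h>
#include <linux/vmalloc.h>

#define DEMO_CHUNK_SZ	(1024 * 1024)	/* 1MB DMA crash buffer */
#define DEMO_MAX_CHUNKS	512		/* caps the host buffer at 512MB */

struct demo_crash_state {
	void *dma_buf;				/* persistent DMA-able chunk */
	u8 *host_buf[DEMO_MAX_CHUNKS];		/* on-demand host crash buffer */
	unsigned int buf_alloc;			/* chunks allocated so far */
	unsigned int buf_index;			/* next chunk to fill */
};

/* Copy one firmware-produced chunk out of the DMA buffer into host
 * memory; returns false once host memory is exhausted, in which case
 * the partial dump collected so far is kept. */
static bool demo_copy_crash_chunk(struct demo_crash_state *cs)
{
	if (cs->buf_index >= cs->buf_alloc) {
		if (cs->buf_index >= DEMO_MAX_CHUNKS)
			return false;
		cs->host_buf[cs->buf_index] = vzalloc(DEMO_CHUNK_SZ);
		if (!cs->host_buf[cs->buf_index])
			return false;
		cs->buf_alloc++;
	}
	memcpy(cs->host_buf[cs->buf_index], cs->dma_buf, DEMO_CHUNK_SZ);
	cs->buf_index++;
	return true;
}

The split mirrors the design choice in the message: the single DMA chunk
is the only memory the firmware ever sees, while the large non-contiguous
host buffer stays entirely on the driver side until a helper copies it
out through the fw_crash_xxx sysfs parameters.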
2014-09-12 21:27:28 +08:00
|
|
|
/**
|
|
|
|
* megasas_alloc_host_crash_buffer - Allocate host buffers for crash dump collection from firmware
|
|
|
|
* @instance: Controller's soft instance
|
|
|
|
* The number of successfully allocated buffers is stored in instance->drv_buf_alloc.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
megasas_alloc_host_crash_buffer(struct megasas_instance *instance)
|
|
|
|
{
|
|
|
|
unsigned int i;
|
|
|
|
|
|
|
|
for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
|
2017-08-23 19:47:03 +08:00
|
|
|
instance->crash_buf[i] = vzalloc(CRASH_DMA_BUF_SIZE);
|
2014-09-12 21:27:28 +08:00
|
|
|
if (!instance->crash_buf[i]) {
|
|
|
|
dev_info(&instance->pdev->dev, "Firmware crash dump "
|
|
|
|
"memory allocation failed at index %d\n", i);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
instance->drv_buf_alloc = i;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_free_host_crash_buffer - Free the host buffers allocated for the firmware crash dump
|
|
|
|
* @instance: Controller's soft instance
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
megasas_free_host_crash_buffer(struct megasas_instance *instance)
|
|
|
|
{
|
2017-08-23 19:47:03 +08:00
|
|
|
unsigned int i;
|
2014-09-12 21:27:28 +08:00
|
|
|
for (i = 0; i < instance->drv_buf_alloc; i++) {
|
|
|
|
|
2017-08-23 19:47:03 +08:00
|
|
|
vfree(instance->crash_buf[i]);
|
2014-09-12 21:27:28 +08:00
|
|
|
}
|
|
|
|
instance->drv_buf_index = 0;
|
|
|
|
instance->drv_buf_alloc = 0;
|
|
|
|
instance->fw_crash_state = UNAVAILABLE;
|
|
|
|
instance->fw_crash_buffer_size = 0;
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/**
|
|
|
|
* megasas_adp_reset_fusion - For controller reset
|
|
|
|
* @regs: MFI register set
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
megasas_adp_reset_fusion(struct megasas_instance *instance,
|
|
|
|
struct megasas_register_set __iomem *regs)
|
|
|
|
{
|
2015-08-31 19:53:41 +08:00
|
|
|
u32 host_diag, abs_state, retry;
|
|
|
|
|
|
|
|
/* Now try to reset the chip */
|
|
|
|
writel(MPI2_WRSEQ_FLUSH_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
writel(MPI2_WRSEQ_1ST_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
writel(MPI2_WRSEQ_2ND_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
writel(MPI2_WRSEQ_3RD_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
writel(MPI2_WRSEQ_4TH_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
writel(MPI2_WRSEQ_5TH_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
writel(MPI2_WRSEQ_6TH_KEY_VALUE, &instance->reg_set->fusion_seq_offset);
|
|
|
|
|
|
|
|
/* Check that the diag write enable (DRWE) bit is on */
|
2018-12-17 16:47:40 +08:00
|
|
|
host_diag = megasas_readl(instance, &instance->reg_set->fusion_host_diag);
|
2015-08-31 19:53:41 +08:00
|
|
|
retry = 0;
|
|
|
|
while (!(host_diag & HOST_DIAG_WRITE_ENABLE)) {
|
|
|
|
msleep(100);
|
2018-12-17 16:47:40 +08:00
|
|
|
host_diag = megasas_readl(instance,
|
|
|
|
&instance->reg_set->fusion_host_diag);
|
2015-08-31 19:53:41 +08:00
|
|
|
if (retry++ == 100) {
|
|
|
|
dev_warn(&instance->pdev->dev,
|
|
|
|
"Host diag unlock failed from %s %d\n",
|
|
|
|
__func__, __LINE__);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (!(host_diag & HOST_DIAG_WRITE_ENABLE))
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
/* Send chip reset command */
|
|
|
|
writel(host_diag | HOST_DIAG_RESET_ADAPTER,
|
|
|
|
&instance->reg_set->fusion_host_diag);
|
|
|
|
msleep(3000);
|
|
|
|
|
|
|
|
/* Make sure reset adapter bit is cleared */
|
2018-12-17 16:47:40 +08:00
|
|
|
host_diag = megasas_readl(instance, &instance->reg_set->fusion_host_diag);
|
2015-08-31 19:53:41 +08:00
|
|
|
retry = 0;
|
|
|
|
while (host_diag & HOST_DIAG_RESET_ADAPTER) {
|
|
|
|
msleep(100);
|
2018-12-17 16:47:40 +08:00
|
|
|
host_diag = megasas_readl(instance,
|
|
|
|
&instance->reg_set->fusion_host_diag);
|
2015-08-31 19:53:41 +08:00
|
|
|
if (retry++ == 1000) {
|
|
|
|
dev_warn(&instance->pdev->dev,
|
|
|
|
"Diag reset adapter never cleared %s %d\n",
|
|
|
|
__func__, __LINE__);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (host_diag & HOST_DIAG_RESET_ADAPTER)
|
|
|
|
return -1;
|
|
|
|
|
2018-12-17 16:47:39 +08:00
|
|
|
abs_state = instance->instancet->read_fw_status_reg(instance)
|
2015-08-31 19:53:41 +08:00
|
|
|
& MFI_STATE_MASK;
|
|
|
|
retry = 0;
|
|
|
|
|
|
|
|
while ((abs_state <= MFI_STATE_FW_INIT) && (retry++ < 1000)) {
|
|
|
|
msleep(100);
|
|
|
|
abs_state = instance->instancet->
|
2018-12-17 16:47:39 +08:00
|
|
|
read_fw_status_reg(instance) & MFI_STATE_MASK;
|
2015-08-31 19:53:41 +08:00
|
|
|
}
|
|
|
|
if (abs_state <= MFI_STATE_FW_INIT) {
|
|
|
|
dev_warn(&instance->pdev->dev,
|
|
|
|
"fw state < MFI_STATE_FW_INIT, state = 0x%x %s %d\n",
|
|
|
|
abs_state, __func__, __LINE__);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* megasas_check_reset_fusion - For controller reset check
|
|
|
|
* @regs: MFI register set
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
megasas_check_reset_fusion(struct megasas_instance *instance,
|
|
|
|
struct megasas_register_set __iomem *regs)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-10-17 14:37:40 +08:00
|
|
|
/**
|
|
|
|
* megasas_trigger_snap_dump - Trigger snap dump in FW
|
|
|
|
* @instance: Soft instance of adapter
|
|
|
|
*/
|
|
|
|
static inline void megasas_trigger_snap_dump(struct megasas_instance *instance)
|
|
|
|
{
|
|
|
|
int j;
|
2019-05-08 01:05:44 +08:00
|
|
|
u32 fw_state, abs_state;
|
2018-10-17 14:37:40 +08:00
|
|
|
|
|
|
|
if (!instance->disableOnlineCtrlReset) {
|
|
|
|
dev_info(&instance->pdev->dev, "Trigger snap dump\n");
|
|
|
|
writel(MFI_ADP_TRIGGER_SNAP_DUMP,
|
|
|
|
&instance->reg_set->doorbell);
|
|
|
|
readl(&instance->reg_set->doorbell);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (j = 0; j < instance->snapdump_wait_time; j++) {
|
2019-05-08 01:05:44 +08:00
|
|
|
abs_state = instance->instancet->read_fw_status_reg(instance);
|
|
|
|
fw_state = abs_state & MFI_STATE_MASK;
|
2018-10-17 14:37:40 +08:00
|
|
|
if (fw_state == MFI_STATE_FAULT) {
|
2019-05-08 01:05:44 +08:00
|
|
|
dev_printk(KERN_ERR, &instance->pdev->dev,
|
|
|
|
"FW in FAULT state Fault code:0x%x subcode:0x%x func:%s\n",
|
|
|
|
abs_state & MFI_STATE_FAULT_CODE,
|
|
|
|
abs_state & MFI_STATE_FAULT_SUBCODE, __func__);
|
2018-10-17 14:37:40 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
msleep(1000);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/* This function waits for outstanding commands on fusion to complete */
|
2019-07-26 21:55:40 +08:00
|
|
|
static int
|
|
|
|
megasas_wait_for_outstanding_fusion(struct megasas_instance *instance,
|
|
|
|
int reason, int *convert)
|
2010-12-22 05:34:31 +08:00
|
|
|
{
|
2014-03-10 17:51:56 +08:00
|
|
|
int i, outstanding, retval = 0, hb_seconds_missed = 0;
|
2019-05-08 01:05:44 +08:00
|
|
|
u32 fw_state, abs_state;
|
2018-10-17 14:37:40 +08:00
|
|
|
u32 waittime_for_io_completion;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2018-10-17 14:37:40 +08:00
|
|
|
waittime_for_io_completion =
|
|
|
|
min_t(u32, resetwaittime,
|
|
|
|
(resetwaittime - instance->snapdump_wait_time));
|
|
|
|
|
|
|
|
if (reason == MFI_IO_TIMEOUT_OCR) {
|
|
|
|
dev_info(&instance->pdev->dev,
|
|
|
|
"MFI command is timed out\n");
|
|
|
|
megasas_complete_cmd_dpc_fusion((unsigned long)instance);
|
|
|
|
if (instance->snapdump_wait_time)
|
|
|
|
megasas_trigger_snap_dump(instance);
|
|
|
|
retval = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < waittime_for_io_completion; i++) {
|
2010-12-22 05:34:31 +08:00
|
|
|
/* Check if firmware is in fault state */
|
2019-05-08 01:05:44 +08:00
|
|
|
abs_state = instance->instancet->read_fw_status_reg(instance);
|
|
|
|
fw_state = abs_state & MFI_STATE_MASK;
|
2010-12-22 05:34:31 +08:00
|
|
|
if (fw_state == MFI_STATE_FAULT) {
|
2019-05-08 01:05:44 +08:00
|
|
|
dev_printk(KERN_ERR, &instance->pdev->dev,
|
|
|
|
"FW in FAULT state Fault code:0x%x subcode:0x%x func:%s\n",
|
|
|
|
abs_state & MFI_STATE_FAULT_CODE,
|
|
|
|
abs_state & MFI_STATE_FAULT_SUBCODE, __func__);
|
2016-04-15 15:23:31 +08:00
|
|
|
megasas_complete_cmd_dpc_fusion((unsigned long)instance);
|
2017-02-10 16:59:15 +08:00
|
|
|
if (instance->requestorId && reason) {
|
|
|
|
dev_warn(&instance->pdev->dev, "SR-IOV Found FW in FAULT"
|
|
|
|
" state while polling during"
|
|
|
|
" I/O timeout handling for %d\n",
|
|
|
|
instance->host->host_no);
|
|
|
|
*convert = 1;
|
|
|
|
}
|
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
retval = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
2016-01-28 23:34:23 +08:00
|
|
|
|
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
/* If SR-IOV VF mode & heartbeat timeout, don't wait */
|
2016-01-28 23:34:23 +08:00
|
|
|
if (instance->requestorId && !reason) {
|
2010-12-22 05:34:31 +08:00
|
|
|
retval = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
/* If SR-IOV VF mode & I/O timeout, check for HB timeout */
|
2017-02-10 16:59:15 +08:00
|
|
|
if (instance->requestorId && (reason == SCSIIO_TIMEOUT_OCR)) {
|
2014-03-10 17:51:56 +08:00
|
|
|
if (instance->hb_host_mem->HB.fwCounter !=
|
|
|
|
instance->hb_host_mem->HB.driverCounter) {
|
|
|
|
instance->hb_host_mem->HB.driverCounter =
|
|
|
|
instance->hb_host_mem->HB.fwCounter;
|
|
|
|
hb_seconds_missed = 0;
|
|
|
|
} else {
|
|
|
|
hb_seconds_missed++;
|
|
|
|
if (hb_seconds_missed ==
|
|
|
|
(MEGASAS_SRIOV_HEARTBEAT_INTERVAL_VF/HZ)) {
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_warn(&instance->pdev->dev, "SR-IOV:"
|
2014-03-10 17:51:56 +08:00
|
|
|
" Heartbeat never completed "
|
|
|
|
" while polling during I/O "
|
|
|
|
" timeout handling for "
|
|
|
|
"scsi%d.\n",
|
|
|
|
instance->host->host_no);
|
|
|
|
*convert = 1;
|
|
|
|
retval = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-08-23 19:46:58 +08:00
|
|
|
megasas_complete_cmd_dpc_fusion((unsigned long)instance);
|
2010-12-22 05:34:31 +08:00
|
|
|
outstanding = atomic_read(&instance->fw_outstanding);
|
|
|
|
if (!outstanding)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
if (!(i % MEGASAS_RESET_NOTICE_INTERVAL)) {
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_notice(&instance->pdev->dev, "[%2d]waiting for %d "
|
2014-03-10 17:51:56 +08:00
|
|
|
"commands to complete for scsi%d\n", i,
|
|
|
|
outstanding, instance->host->host_no);
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
msleep(1000);
|
|
|
|
}
|
|
|
|
|
2018-10-17 14:37:40 +08:00
|
|
|
if (instance->snapdump_wait_time) {
|
|
|
|
megasas_trigger_snap_dump(instance);
|
|
|
|
retval = 1;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
if (atomic_read(&instance->fw_outstanding)) {
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_err(&instance->pdev->dev, "pending commands remain after waiting, "
|
2014-03-10 17:51:56 +08:00
|
|
|
"will reset adapter scsi%d.\n",
|
|
|
|
instance->host->host_no);
|
2016-10-21 21:33:29 +08:00
|
|
|
*convert = 1;
|
2010-12-22 05:34:31 +08:00
|
|
|
retval = 1;
|
|
|
|
}
|
2018-10-17 14:37:40 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
out:
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
|
|
|
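
/*
 * Illustrative arithmetic (example only, not driver code): with the
 * default resetwaittime of 180s and a snapdump_wait_time of 15s, the
 * completion-poll loop above runs min_t(u32, 180, 180 - 15) = 165
 * one-second iterations, reserving the remaining 15s so a firmware
 * snap dump can still be triggered before the reset deadline.
 */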

void megasas_reset_reply_desc(struct megasas_instance *instance)
{
	int i, j, count;
	struct fusion_context *fusion;
	union MPI2_REPLY_DESCRIPTORS_UNION *reply_desc;

	fusion = instance->ctrl_context;
	count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;
	for (i = 0 ; i < count ; i++) {
		fusion->last_reply_idx[i] = 0;
		reply_desc = fusion->reply_frames_desc[i];
		for (j = 0 ; j < fusion->reply_q_depth; j++, reply_desc++)
			reply_desc->Words = cpu_to_le64(ULLONG_MAX);
	}
}
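
/*
 * Illustrative sketch (example only): a descriptor whose Words field
 * still holds the all-ones pattern written above has not been posted
 * by firmware; this is how completion processing recognizes an empty
 * reply-queue slot and stops reaping.
 */
#if 0	/* example only */
	if (reply_desc->Words == cpu_to_le64(ULLONG_MAX))
		return;	/* slot unused - queue is drained */
#endif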

/*
 * megasas_refire_mgmt_cmd : Re-fire management commands
 * @instance:		Controller's soft instance
 * @return_ioctl:	when set, return pending application (sync) commands
 *			with DCMD_BUSY instead of re-firing them
 */
void megasas_refire_mgmt_cmd(struct megasas_instance *instance,
			     bool return_ioctl)
{
	int j;
	struct megasas_cmd_fusion *cmd_fusion;
	struct fusion_context *fusion;
	struct megasas_cmd *cmd_mfi;
	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
	u16 smid;
	bool refire_cmd = false;
	u8 result;
	u32 opcode = 0;

	fusion = instance->ctrl_context;

	/* Re-fire management commands.
	 * Do not traverse the complete MPT frame pool. Start from max_scsi_cmds.
	 */
	for (j = instance->max_scsi_cmds ; j < instance->max_fw_cmds; j++) {
		cmd_fusion = fusion->cmd_list[j];
		cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
		smid = le16_to_cpu(cmd_mfi->context.smid);
		result = REFIRE_CMD;

		if (!smid)
			continue;

		req_desc = megasas_get_request_descriptor(instance, smid - 1);

		switch (cmd_mfi->frame->hdr.cmd) {
		case MFI_CMD_DCMD:
			opcode = le32_to_cpu(cmd_mfi->frame->dcmd.opcode);
			/* Do not refire shutdown command */
			if (opcode == MR_DCMD_CTRL_SHUTDOWN) {
				cmd_mfi->frame->dcmd.cmd_status = MFI_STAT_OK;
				result = COMPLETE_CMD;
				break;
			}

			refire_cmd = (opcode != MR_DCMD_LD_MAP_GET_INFO) &&
				     (opcode != MR_DCMD_SYSTEM_PD_MAP_GET_INFO) &&
				     !(cmd_mfi->flags & DRV_DCMD_SKIP_REFIRE);

			if (!refire_cmd)
				result = RETURN_CMD;

			break;
		case MFI_CMD_NVME:
			if (!instance->support_nvme_passthru) {
				cmd_mfi->frame->hdr.cmd_status = MFI_STAT_INVALID_CMD;
				result = COMPLETE_CMD;
			}

			break;
		case MFI_CMD_TOOLBOX:
			if (!instance->support_pci_lane_margining) {
				cmd_mfi->frame->hdr.cmd_status = MFI_STAT_INVALID_CMD;
				result = COMPLETE_CMD;
			}

			break;
		default:
			break;
		}

		if (return_ioctl && cmd_mfi->sync_cmd &&
		    cmd_mfi->frame->hdr.cmd != MFI_CMD_ABORT) {
			dev_err(&instance->pdev->dev,
				"return -EBUSY from %s %d cmd 0x%x opcode 0x%x\n",
				__func__, __LINE__, cmd_mfi->frame->hdr.cmd,
				le32_to_cpu(cmd_mfi->frame->dcmd.opcode));
			cmd_mfi->cmd_status_drv = DCMD_BUSY;
			result = COMPLETE_CMD;
		}

		switch (result) {
		case REFIRE_CMD:
			megasas_fire_cmd_fusion(instance, req_desc);
			break;
		case RETURN_CMD:
			megasas_return_cmd(instance, cmd_mfi);
			break;
		case COMPLETE_CMD:
			megasas_complete_cmd(instance, cmd_mfi, DID_OK);
			break;
		}
	}
}
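
/*
 * Disposition summary for the switch above (derived from the code, for
 * reference): REFIRE_CMD resubmits the saved request descriptor to
 * firmware, RETURN_CMD gives the MFI frame back to the pool without
 * resubmitting, and COMPLETE_CMD completes the command to its waiter
 * immediately (shutdown DCMDs, unsupported NVMe/toolbox passthrough,
 * and, when return_ioctl is set, pending application commands).
 */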

/*
 * megasas_return_polled_cmds: Return polled mode commands back to the
 *			       command pool before initiating an OCR.
 * @instance:		       Controller's soft instance
 */
static void
megasas_return_polled_cmds(struct megasas_instance *instance)
{
	int i;
	struct megasas_cmd_fusion *cmd_fusion;
	struct fusion_context *fusion;
	struct megasas_cmd *cmd_mfi;

	fusion = instance->ctrl_context;

	for (i = instance->max_scsi_cmds; i < instance->max_fw_cmds; i++) {
		cmd_fusion = fusion->cmd_list[i];
		cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];

		if (cmd_mfi->flags & DRV_DCMD_POLLED_MODE) {
			if (megasas_dbg_lvl & OCR_DEBUG)
				dev_info(&instance->pdev->dev,
					 "%s %d return cmd 0x%x opcode 0x%x\n",
					 __func__, __LINE__, cmd_mfi->frame->hdr.cmd,
					 le32_to_cpu(cmd_mfi->frame->dcmd.opcode));
			cmd_mfi->flags &= ~DRV_DCMD_POLLED_MODE;
			megasas_return_cmd(instance, cmd_mfi);
		}
	}
}
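
/*
 * Illustrative sketch (example only; the issue helper shown is
 * hypothetical): a DCMD submitted in polled mode is tagged with
 * DRV_DCMD_POLLED_MODE, which is exactly the flag the helper above
 * keys on when draining such commands before an OCR.
 */
#if 0	/* example only */
	cmd_mfi->flags |= DRV_DCMD_POLLED_MODE;
	issue_dcmd_polled(instance, cmd_mfi);	/* hypothetical helper */
#endif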

/*
 * megasas_track_scsiio : Track SCSI IOs outstanding to a SCSI device
 * @instance: per adapter struct
 * @id:	      the id assigned by the OS
 * @channel:  the channel assigned by the OS
 *
 * Returns SUCCESS if no IOs are pending to the SCSI device, else FAILED
 */

static int megasas_track_scsiio(struct megasas_instance *instance,
				int id, int channel)
{
	int i, found = 0;
	struct megasas_cmd_fusion *cmd_fusion;
	struct fusion_context *fusion;
	fusion = instance->ctrl_context;

	for (i = 0 ; i < instance->max_scsi_cmds; i++) {
		cmd_fusion = fusion->cmd_list[i];
		if (cmd_fusion->scmd &&
		    (cmd_fusion->scmd->device->id == id &&
		     cmd_fusion->scmd->device->channel == channel)) {
			dev_info(&instance->pdev->dev,
				 "SCSI command pending to target channel %d id %d \tSMID: 0x%x\n",
				 channel, id, cmd_fusion->index);
			scsi_print_command(cmd_fusion->scmd);
			found = 1;
			break;
		}
	}

	return found ? FAILED : SUCCESS;
}
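
/*
 * Illustrative sketch (example only): megasas_track_scsiio() is the
 * final check of the target-reset path below; SUCCESS means no SCSI
 * command addressed to that <channel, id> pair is still held by the
 * driver.
 */
#if 0	/* example only */
	if (megasas_track_scsiio(instance, sdev->id, sdev->channel) == FAILED)
		sdev_printk(KERN_WARNING, sdev,
			    "I/O still pending after target reset\n");
#endif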

/**
 * megasas_tm_response_code - translation of device response code
 * @instance:	per adapter object
 * @mpi_reply:	MPI reply returned by firmware
 *
 * Return nothing.
 */
static void
megasas_tm_response_code(struct megasas_instance *instance,
			 struct MPI2_SCSI_TASK_MANAGE_REPLY *mpi_reply)
{
	char *desc;

	switch (mpi_reply->ResponseCode) {
	case MPI2_SCSITASKMGMT_RSP_TM_COMPLETE:
		desc = "task management request completed";
		break;
	case MPI2_SCSITASKMGMT_RSP_INVALID_FRAME:
		desc = "invalid frame";
		break;
	case MPI2_SCSITASKMGMT_RSP_TM_NOT_SUPPORTED:
		desc = "task management request not supported";
		break;
	case MPI2_SCSITASKMGMT_RSP_TM_FAILED:
		desc = "task management request failed";
		break;
	case MPI2_SCSITASKMGMT_RSP_TM_SUCCEEDED:
		desc = "task management request succeeded";
		break;
	case MPI2_SCSITASKMGMT_RSP_TM_INVALID_LUN:
		desc = "invalid lun";
		break;
	case 0xA:
		desc = "overlapped tag attempted";
		break;
	case MPI2_SCSITASKMGMT_RSP_IO_QUEUED_ON_IOC:
		desc = "task queued, however not sent to target";
		break;
	default:
		desc = "unknown";
		break;
	}
	dev_dbg(&instance->pdev->dev, "response_code(%01x): %s\n",
		mpi_reply->ResponseCode, desc);
	dev_dbg(&instance->pdev->dev,
		"TerminationCount/DevHandle/Function/TaskType/IOCStat/IOCLoginfo 0x%x/0x%x/0x%x/0x%x/0x%x/0x%x\n",
		mpi_reply->TerminationCount, mpi_reply->DevHandle,
		mpi_reply->Function, mpi_reply->TaskType,
		mpi_reply->IOCStatus, mpi_reply->IOCLogInfo);
}

/**
 * megasas_issue_tm - main routine for sending tm requests
 * @instance:		per adapter struct
 * @device_handle:	device handle
 * @channel:		the channel assigned by the OS
 * @id:			the id assigned by the OS
 * @smid_task:		smid assigned to the task
 * @type:		MPI2_SCSITASKMGMT_TASKTYPE__XXX (defined in megaraid_sas_fusion.h)
 * @mr_device_priv_data: private data of the target device
 * Context: user
 *
 * MegaRAID uses the MPT interface for task management requests.
 * A generic API for sending task management requests to firmware.
 *
 * Return SUCCESS or FAILED.
 */
static int
megasas_issue_tm(struct megasas_instance *instance, u16 device_handle,
	uint channel, uint id, u16 smid_task, u8 type,
	struct MR_PRIV_DEVICE *mr_device_priv_data)
{
	struct MR_TASK_MANAGE_REQUEST *mr_request;
	struct MPI2_SCSI_TASK_MANAGE_REQUEST *mpi_request;
	unsigned long timeleft;
	struct megasas_cmd_fusion *cmd_fusion;
	struct megasas_cmd *cmd_mfi;
	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
	struct fusion_context *fusion = NULL;
	struct megasas_cmd_fusion *scsi_lookup;
	int rc;
	int timeout = MEGASAS_DEFAULT_TM_TIMEOUT;
	struct MPI2_SCSI_TASK_MANAGE_REPLY *mpi_reply;

	fusion = instance->ctrl_context;

	cmd_mfi = megasas_get_cmd(instance);

	if (!cmd_mfi) {
		dev_err(&instance->pdev->dev, "Failed from %s %d\n",
			__func__, __LINE__);
		return -ENOMEM;
	}

	cmd_fusion = megasas_get_cmd_fusion(instance,
			instance->max_scsi_cmds + cmd_mfi->index);

	/* Save the smid. To be used for returning the cmd */
	cmd_mfi->context.smid = cmd_fusion->index;

	req_desc = megasas_get_request_descriptor(instance,
			(cmd_fusion->index - 1));

	cmd_fusion->request_desc = req_desc;
	req_desc->Words = 0;

	mr_request = (struct MR_TASK_MANAGE_REQUEST *) cmd_fusion->io_request;
	memset(mr_request, 0, sizeof(struct MR_TASK_MANAGE_REQUEST));
	mpi_request = (struct MPI2_SCSI_TASK_MANAGE_REQUEST *) &mr_request->TmRequest;
	mpi_request->Function = MPI2_FUNCTION_SCSI_TASK_MGMT;
	mpi_request->DevHandle = cpu_to_le16(device_handle);
	mpi_request->TaskType = type;
	mpi_request->TaskMID = cpu_to_le16(smid_task);
	mpi_request->LUN[1] = 0;

	/* Task management requests are sent as high-priority MPT frames */
	req_desc = cmd_fusion->request_desc;
	req_desc->HighPriority.SMID = cpu_to_le16(cmd_fusion->index);
	req_desc->HighPriority.RequestFlags =
		(MPI2_REQ_DESCRIPT_FLAGS_HIGH_PRIORITY <<
		 MEGASAS_REQ_DESCRIPT_FLAGS_TYPE_SHIFT);
	req_desc->HighPriority.MSIxIndex = 0;
	req_desc->HighPriority.LMID = 0;
	req_desc->HighPriority.Reserved1 = 0;

	if (channel < MEGASAS_MAX_PD_CHANNELS)
		mr_request->tmReqFlags.isTMForPD = 1;
	else
		mr_request->tmReqFlags.isTMForLD = 1;

	init_completion(&cmd_fusion->done);
	megasas_fire_cmd_fusion(instance, req_desc);

	switch (type) {
	case MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK:
		timeout = mr_device_priv_data->task_abort_tmo;
		break;
	case MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET:
		timeout = mr_device_priv_data->target_reset_tmo;
		break;
	}

	timeleft = wait_for_completion_timeout(&cmd_fusion->done, timeout * HZ);

	if (!timeleft) {
		dev_err(&instance->pdev->dev,
			"task mgmt type 0x%x timed out\n", type);
		cmd_mfi->flags |= DRV_DCMD_SKIP_REFIRE;
		mutex_unlock(&instance->reset_mutex);
		rc = megasas_reset_fusion(instance->host, MFI_IO_TIMEOUT_OCR);
		mutex_lock(&instance->reset_mutex);
		return rc;
	}

	mpi_reply = (struct MPI2_SCSI_TASK_MANAGE_REPLY *) &mr_request->TMReply;
	megasas_tm_response_code(instance, mpi_reply);

	megasas_return_cmd(instance, cmd_mfi);
	rc = SUCCESS;
	switch (type) {
	case MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK:
		scsi_lookup = fusion->cmd_list[smid_task - 1];

		if (scsi_lookup->scmd == NULL)
			break;
		else {
			instance->instancet->disable_intr(instance);
			megasas_sync_irqs((unsigned long)instance);
			instance->instancet->enable_intr(instance);
			megasas_enable_irq_poll(instance);
			if (scsi_lookup->scmd == NULL)
				break;
		}
		rc = FAILED;
		break;

	case MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET:
		if ((channel == 0xFFFFFFFF) && (id == 0xFFFFFFFF))
			break;
		instance->instancet->disable_intr(instance);
		megasas_sync_irqs((unsigned long)instance);
		rc = megasas_track_scsiio(instance, id, channel);
		instance->instancet->enable_intr(instance);
		megasas_enable_irq_poll(instance);

		break;
	case MPI2_SCSITASKMGMT_TASKTYPE_ABRT_TASK_SET:
	case MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK:
		break;
	default:
		rc = FAILED;
		break;
	}

	return rc;
}
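
/*
 * Illustrative sketch (example only, mirroring the task-abort caller
 * below): megasas_issue_tm() expects reset_mutex to be held and the
 * device marked TM-busy around the call.
 */
#if 0	/* example only */
	mutex_lock(&instance->reset_mutex);
	mr_device_priv_data->tm_busy = 1;
	ret = megasas_issue_tm(instance, devhandle, channel, id, smid,
			       MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK,
			       mr_device_priv_data);
	mr_device_priv_data->tm_busy = 0;
	mutex_unlock(&instance->reset_mutex);
#endif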

/*
 * megasas_fusion_smid_lookup : Look up the fusion command corresponding
 *				to a SCSI command
 * @scmd: SCSI command pointer
 *
 * Returns a non-zero SMID if the command is found outstanding, else 0
 */
static u16 megasas_fusion_smid_lookup(struct scsi_cmnd *scmd)
{
	int i, ret = 0;
	struct megasas_instance *instance;
	struct megasas_cmd_fusion *cmd_fusion;
	struct fusion_context *fusion;

	instance = (struct megasas_instance *)scmd->device->host->hostdata;

	fusion = instance->ctrl_context;

	for (i = 0; i < instance->max_scsi_cmds; i++) {
		cmd_fusion = fusion->cmd_list[i];
		if (cmd_fusion->scmd && (cmd_fusion->scmd == scmd)) {
			scmd_printk(KERN_NOTICE, scmd, "Abort request is for"
				" SMID: %d\n", cmd_fusion->index);
			ret = cmd_fusion->index;
			break;
		}
	}

	return ret;
}

/*
 * megasas_get_tm_devhandle - Get devhandle for TM request
 * @sdev: OS provided scsi device
 *
 * Returns the devhandle or targetID of the SCSI device
 */
static u16 megasas_get_tm_devhandle(struct scsi_device *sdev)
{
	u16 pd_index = 0;
	u32 device_id;
	struct megasas_instance *instance;
	struct fusion_context *fusion;
	struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
	u16 devhandle = (u16)ULONG_MAX;

	instance = (struct megasas_instance *)sdev->host->hostdata;
	fusion = instance->ctrl_context;

	if (!MEGASAS_IS_LOGICAL(sdev)) {
		if (instance->use_seqnum_jbod_fp) {
			pd_index = (sdev->channel * MEGASAS_MAX_DEV_PER_CHANNEL)
				    + sdev->id;
			pd_sync = (void *)fusion->pd_seq_sync
					[(instance->pd_seq_map_id - 1) & 1];
			devhandle = pd_sync->seq[pd_index].devHandle;
		} else
			sdev_printk(KERN_ERR, sdev, "Firmware exposes tmCapable"
				" without JBOD MAP support from %s %d\n", __func__, __LINE__);
	} else {
		device_id = ((sdev->channel % 2) * MEGASAS_MAX_DEV_PER_CHANNEL)
				+ sdev->id;
		devhandle = device_id;
	}

	return devhandle;
}
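
/*
 * Illustrative arithmetic (example only, assuming
 * MEGASAS_MAX_DEV_PER_CHANNEL is 128): a JBOD PD at channel 1, id 5
 * maps to sequence-map index 1 * 128 + 5 = 133, while a logical device
 * at channel 2, id 3 yields target id (2 % 2) * 128 + 3 = 3, which is
 * returned directly as the devhandle.
 */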

/*
 * megasas_task_abort_fusion : SCSI task abort function for fusion adapters
 * @scmd: pointer to scsi command object
 *
 * Returns SUCCESS if the command was aborted, else FAILED
 */

int megasas_task_abort_fusion(struct scsi_cmnd *scmd)
{
	struct megasas_instance *instance;
	u16 smid, devhandle;
	int ret;
	struct MR_PRIV_DEVICE *mr_device_priv_data;
	mr_device_priv_data = scmd->device->hostdata;

	instance = (struct megasas_instance *)scmd->device->host->hostdata;

	if (atomic_read(&instance->adprecovery) != MEGASAS_HBA_OPERATIONAL) {
		dev_err(&instance->pdev->dev, "Controller is not OPERATIONAL, "
			"SCSI host:%d\n", instance->host->host_no);
		ret = FAILED;
		return ret;
	}

	if (!mr_device_priv_data) {
		sdev_printk(KERN_INFO, scmd->device, "device has been deleted! "
			"scmd(%p)\n", scmd);
		scmd->result = DID_NO_CONNECT << 16;
		ret = SUCCESS;
		goto out;
	}

	if (!mr_device_priv_data->is_tm_capable) {
		ret = FAILED;
		goto out;
	}

	mutex_lock(&instance->reset_mutex);

	smid = megasas_fusion_smid_lookup(scmd);

	if (!smid) {
		ret = SUCCESS;
		scmd_printk(KERN_NOTICE, scmd, "Command for which abort is"
			" issued is not found in outstanding commands\n");
		mutex_unlock(&instance->reset_mutex);
		goto out;
	}

	devhandle = megasas_get_tm_devhandle(scmd->device);

	if (devhandle == (u16)ULONG_MAX) {
		ret = SUCCESS;
		sdev_printk(KERN_INFO, scmd->device,
			"task abort issued for invalid devhandle\n");
		mutex_unlock(&instance->reset_mutex);
		goto out;
	}
	sdev_printk(KERN_INFO, scmd->device,
		"attempting task abort! scmd(0x%p) tm_dev_handle 0x%x\n",
		scmd, devhandle);

	mr_device_priv_data->tm_busy = 1;
	ret = megasas_issue_tm(instance, devhandle,
			scmd->device->channel, scmd->device->id, smid,
			MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK,
			mr_device_priv_data);
	mr_device_priv_data->tm_busy = 0;

	mutex_unlock(&instance->reset_mutex);
	scmd_printk(KERN_INFO, scmd, "task abort %s!! scmd(0x%p)\n",
			((ret == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
out:
	scsi_print_command(scmd);
	if (megasas_dbg_lvl & TM_DEBUG)
		megasas_dump_fusion_io(scmd);

	return ret;
}

/*
 * megasas_reset_target_fusion : target reset function for fusion adapters
 * scmd: SCSI command pointer
 *
 * Returns SUCCESS if all commands associated with the target were aborted,
 * else FAILED
 */

int megasas_reset_target_fusion(struct scsi_cmnd *scmd)
{

	struct megasas_instance *instance;
	int ret = FAILED;
	u16 devhandle;
	struct MR_PRIV_DEVICE *mr_device_priv_data;
	mr_device_priv_data = scmd->device->hostdata;

	instance = (struct megasas_instance *)scmd->device->host->hostdata;

	if (atomic_read(&instance->adprecovery) != MEGASAS_HBA_OPERATIONAL) {
		dev_err(&instance->pdev->dev, "Controller is not OPERATIONAL, "
			"SCSI host:%d\n", instance->host->host_no);
		ret = FAILED;
		return ret;
	}

	if (!mr_device_priv_data) {
		sdev_printk(KERN_INFO, scmd->device,
			    "device has been deleted! scmd: (0x%p)\n", scmd);
		scmd->result = DID_NO_CONNECT << 16;
		ret = SUCCESS;
		goto out;
	}

	if (!mr_device_priv_data->is_tm_capable) {
		ret = FAILED;
		goto out;
	}

	mutex_lock(&instance->reset_mutex);
	devhandle = megasas_get_tm_devhandle(scmd->device);

	if (devhandle == (u16)ULONG_MAX) {
		ret = SUCCESS;
		sdev_printk(KERN_INFO, scmd->device,
			    "target reset issued for invalid devhandle\n");
		mutex_unlock(&instance->reset_mutex);
		goto out;
	}

	sdev_printk(KERN_INFO, scmd->device,
		    "attempting target reset! scmd(0x%p) tm_dev_handle: 0x%x\n",
		    scmd, devhandle);
	mr_device_priv_data->tm_busy = 1;
	ret = megasas_issue_tm(instance, devhandle,
			scmd->device->channel, scmd->device->id, 0,
			MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET,
			mr_device_priv_data);
	mr_device_priv_data->tm_busy = 0;
	mutex_unlock(&instance->reset_mutex);
	scmd_printk(KERN_NOTICE, scmd, "target reset %s!!\n",
		(ret == SUCCESS) ? "SUCCESS" : "FAILED");

out:
	return ret;
}

/* SR-IOV: get the other instance in the cluster, if any */
static struct
megasas_instance *megasas_get_peer_instance(struct megasas_instance *instance)
{
	int i;

	for (i = 0; i < MAX_MGMT_ADAPTERS; i++) {
		if (megasas_mgmt_info.instance[i] &&
		    (megasas_mgmt_info.instance[i] != instance) &&
		     megasas_mgmt_info.instance[i]->requestorId &&
		     megasas_mgmt_info.instance[i]->peerIsPresent &&
		    (memcmp((megasas_mgmt_info.instance[i]->clusterId),
		    instance->clusterId, MEGASAS_CLUSTER_ID_SIZE) == 0))
			return megasas_mgmt_info.instance[i];
	}
	return NULL;
}

/* Check for a second path that is currently UP */
int megasas_check_mpio_paths(struct megasas_instance *instance,
	struct scsi_cmnd *scmd)
{
	struct megasas_instance *peer_instance = NULL;
	int retval = (DID_REQUEUE << 16);

	if (instance->peerIsPresent) {
		peer_instance = megasas_get_peer_instance(instance);
		if ((peer_instance) &&
			(atomic_read(&peer_instance->adprecovery) ==
			MEGASAS_HBA_OPERATIONAL))
			retval = (DID_NO_CONNECT << 16);
	}
	return retval;
}
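
/*
 * Usage note (derived from the reset path below): each outstanding
 * command is completed with the value returned here, so I/O is retried
 * on the same path (DID_REQUEUE) unless an operational peer exists, in
 * which case DID_NO_CONNECT lets the multipath layer fail over.
 */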
2010-12-22 05:34:31 +08:00
|
|
|
/* Core fusion reset function */
|
2016-01-28 23:34:25 +08:00
|
|
|
int megasas_reset_fusion(struct Scsi_Host *shost, int reason)
|
2010-12-22 05:34:31 +08:00
|
|
|
{
|
2017-01-11 07:20:46 +08:00
|
|
|
int retval = SUCCESS, i, j, convert = 0;
|
2010-12-22 05:34:31 +08:00
|
|
|
struct megasas_instance *instance;
|
2017-02-10 16:59:03 +08:00
|
|
|
struct megasas_cmd_fusion *cmd_fusion, *r1_cmd;
|
2010-12-22 05:34:31 +08:00
|
|
|
struct fusion_context *fusion;
|
2019-05-08 01:05:37 +08:00
|
|
|
u32 abs_state, status_reg, reset_adapter, fpio_count = 0;
|
megaraid_sas : Firmware crash dump feature support
Resending the patch. Addressed the review comments from Tomas Henzl.
Move buff_offset inside spinlock, corrected loop at crash dump buffer free,
reset_devices check is added to disable fw crash dump feature in kdump kernel.
This feature will provide similar interface as kernel crash dump feature.
When megaraid firmware encounter any crash, driver will collect the firmware raw image and
dump it into pre-configured location.
Driver will allocate two different segment of memory.
#1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
#2 DMA buffer (persistence allocation) just to do a arbitrator job.
Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2,
which will be copy back by driver to the host memory as described in #1.
Driver-Firmware interface:
==================
A.) Host driver can allocate maximum 512MB Host memory to store crash dump data.
This memory will be internal to the host and will not be exposed to the Firmware.
Driver may not be able to allocate 512 MB. In that case, driver will do possible memory
(available at run time) allocation to store crash dump data.
Let’s call this buffer as Host Crash Buffer.
Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory.
This will be internal to driver and firmware/application are unaware of it.
Partial allocation of Host Crash buffer may have valid information to debug depending upon
what was collected in that buffer and depending on nature of failure.
Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
Host Crash buffer will be allocated only when FW Crash dump data is available,
and will be deallocated once application copy Host Crash buffer to the file.
Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
B.) Irrespective of underlying Firmware capability of crash dump support,
driver will allocate DMA buffer at start of the day for each MR controllers.
Let’s call this buffer as “DMA Crash Buffer”.
For this feature, size of DMA crash buffer will be 1MB.
(We will not gain much even if DMA buffer size is increased.)
C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”.
Driver should extract the information from ctrl info provided by firmware and
figure out if firmware support crash dump feature or not.
Driver will enable crash dump feature only if
“Firmware support Crash dump” +
“Driver was able to create DMA Crash Buffer”.
If either one from above is not set, Crash dump feature should be disable in driver.
Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
host memory to the filesystem.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-12 21:27:28 +08:00
|
|
|
u32 io_timeout_in_crash_mode = 0;
|
2015-04-23 19:01:24 +08:00
|
|
|
struct scsi_cmnd *scmd_local = NULL;
|
2016-01-28 23:34:25 +08:00
|
|
|
struct scsi_device *sdev;
|
2018-06-04 18:45:12 +08:00
|
|
|
int ret_target_prop = DCMD_FAILED;
|
|
|
|
bool is_target_prop = false;
|
2019-05-08 01:05:33 +08:00
|
|
|
bool do_adp_reset = true;
|
|
|
|
int max_reset_tries = MEGASAS_FUSION_MAX_RESET_TRIES;
|
2010-12-22 05:34:31 +08:00
|
|
|
|
|
|
|
instance = (struct megasas_instance *)shost->hostdata;
|
|
|
|
fusion = instance->ctrl_context;
|
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
mutex_lock(&instance->reset_mutex);
|
|
|
|
|
2016-01-28 23:34:32 +08:00
|
|
|
if (atomic_read(&instance->adprecovery) == MEGASAS_HW_CRITICAL_ERROR) {
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_warn(&instance->pdev->dev, "Hardware critical error, "
|
2014-03-10 17:51:56 +08:00
|
|
|
"returning FAILED for scsi%d.\n",
|
|
|
|
instance->host->host_no);
|
2014-07-10 06:17:54 +08:00
|
|
|
mutex_unlock(&instance->reset_mutex);
|
2011-10-09 09:14:39 +08:00
|
|
|
return FAILED;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
2018-12-17 16:47:39 +08:00
|
|
|
status_reg = instance->instancet->read_fw_status_reg(instance);
|
megaraid_sas : Firmware crash dump feature support
Resending the patch. Addressed the review comments from Tomas Henzl.
Move buff_offset inside spinlock, corrected loop at crash dump buffer free,
reset_devices check is added to disable fw crash dump feature in kdump kernel.
This feature will provide similar interface as kernel crash dump feature.
When megaraid firmware encounter any crash, driver will collect the firmware raw image and
dump it into pre-configured location.
Driver will allocate two different segment of memory.
#1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
#2 DMA buffer (persistence allocation) just to do a arbitrator job.
Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2,
which will be copy back by driver to the host memory as described in #1.
Driver-Firmware interface:
==================
A.) Host driver can allocate maximum 512MB Host memory to store crash dump data.
This memory will be internal to the host and will not be exposed to the Firmware.
Driver may not be able to allocate 512 MB. In that case, driver will do possible memory
(available at run time) allocation to store crash dump data.
Let’s call this buffer as Host Crash Buffer.
Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory.
This will be internal to driver and firmware/application are unaware of it.
Partial allocation of Host Crash buffer may have valid information to debug depending upon
what was collected in that buffer and depending on nature of failure.
Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
Host Crash buffer will be allocated only when FW Crash dump data is available,
and will be deallocated once application copy Host Crash buffer to the file.
Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
B.) Irrespective of underlying Firmware capability of crash dump support,
driver will allocate DMA buffer at start of the day for each MR controllers.
Let’s call this buffer as “DMA Crash Buffer”.
For this feature, size of DMA crash buffer will be 1MB.
(We will not gain much even if DMA buffer size is increased.)
C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”.
Driver should extract the information from ctrl info provided by firmware and
figure out if firmware support crash dump feature or not.
Driver will enable crash dump feature only if
“Firmware support Crash dump” +
“Driver was able to create DMA Crash Buffer”.
If either one from above is not set, Crash dump feature should be disable in driver.
Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
host memory to the filesystem.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-12 21:27:28 +08:00
|
|
|
abs_state = status_reg & MFI_STATE_MASK;
|
|
|
|
|
|
|
|
/* IO timeout detected, forcibly put FW in FAULT state */
|
|
|
|
if (abs_state != MFI_STATE_FAULT && instance->crash_dump_buf &&
|
2016-01-28 23:34:25 +08:00
|
|
|
instance->crash_dump_app_support && reason) {
|
|
|
|
dev_info(&instance->pdev->dev, "IO/DCMD timeout is detected, "
|
megaraid_sas : Firmware crash dump feature support
Resending the patch. Addressed the review comments from Tomas Henzl.
Move buff_offset inside spinlock, corrected loop at crash dump buffer free,
reset_devices check is added to disable fw crash dump feature in kdump kernel.
This feature will provide similar interface as kernel crash dump feature.
When megaraid firmware encounter any crash, driver will collect the firmware raw image and
dump it into pre-configured location.
Driver will allocate two different segment of memory.
#1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
#2 DMA buffer (persistence allocation) just to do a arbitrator job.
Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2,
which will be copy back by driver to the host memory as described in #1.
Driver-Firmware interface:
==================
A.) Host driver can allocate maximum 512MB Host memory to store crash dump data.
This memory will be internal to the host and will not be exposed to the Firmware.
Driver may not be able to allocate 512 MB. In that case, driver will do possible memory
(available at run time) allocation to store crash dump data.
Let’s call this buffer as Host Crash Buffer.
Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory.
This will be internal to driver and firmware/application are unaware of it.
Partial allocation of Host Crash buffer may have valid information to debug depending upon
what was collected in that buffer and depending on nature of failure.
Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
Host Crash buffer will be allocated only when FW Crash dump data is available,
and will be deallocated once application copy Host Crash buffer to the file.
Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
B.) Irrespective of underlying Firmware capability of crash dump support,
driver will allocate DMA buffer at start of the day for each MR controllers.
Let’s call this buffer as “DMA Crash Buffer”.
For this feature, size of DMA crash buffer will be 1MB.
(We will not gain much even if DMA buffer size is increased.)
C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”.
Driver should extract the information from ctrl info provided by firmware and
figure out if firmware support crash dump feature or not.
Driver will enable crash dump feature only if
“Firmware support Crash dump” +
“Driver was able to create DMA Crash Buffer”.
If either one from above is not set, Crash dump feature should be disable in driver.
Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
host memory to the filesystem.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-12 21:27:28 +08:00
|
|
|
"forcibly FAULT Firmware\n");
|
2016-01-28 23:34:32 +08:00
|
|
|
atomic_set(&instance->adprecovery, MEGASAS_ADPRESET_SM_INFAULT);
|
2018-12-17 16:47:40 +08:00
|
|
|
status_reg = megasas_readl(instance, &instance->reg_set->doorbell);
|
megaraid_sas : Firmware crash dump feature support
Resending the patch. Addressed the review comments from Tomas Henzl.
Move buff_offset inside spinlock, corrected loop at crash dump buffer free,
reset_devices check is added to disable fw crash dump feature in kdump kernel.
This feature will provide similar interface as kernel crash dump feature.
When megaraid firmware encounter any crash, driver will collect the firmware raw image and
dump it into pre-configured location.
Driver will allocate two different segment of memory.
#1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
#2 DMA buffer (persistence allocation) just to do a arbitrator job.
Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2,
which will be copy back by driver to the host memory as described in #1.
Driver-Firmware interface:
==================
A.) Host driver can allocate maximum 512MB Host memory to store crash dump data.
This memory will be internal to the host and will not be exposed to the Firmware.
Driver may not be able to allocate 512 MB. In that case, driver will do possible memory
(available at run time) allocation to store crash dump data.
Let’s call this buffer as Host Crash Buffer.
Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory.
This will be internal to driver and firmware/application are unaware of it.
Partial allocation of Host Crash buffer may have valid information to debug depending upon
what was collected in that buffer and depending on nature of failure.
Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
Host Crash buffer will be allocated only when FW Crash dump data is available,
and will be deallocated once application copy Host Crash buffer to the file.
Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
B.) Irrespective of underlying Firmware capability of crash dump support,
driver will allocate DMA buffer at start of the day for each MR controllers.
Let’s call this buffer as “DMA Crash Buffer”.
For this feature, size of DMA crash buffer will be 1MB.
(We will not gain much even if DMA buffer size is increased.)
C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”.
Driver should extract the information from ctrl info provided by firmware and
figure out if firmware support crash dump feature or not.
Driver will enable crash dump feature only if
“Firmware support Crash dump” +
“Driver was able to create DMA Crash Buffer”.
If either one from above is not set, Crash dump feature should be disable in driver.
Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
host memory to the filesystem.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-12 21:27:28 +08:00
|
|
|
writel(status_reg | MFI_STATE_FORCE_OCR,
|
|
|
|
&instance->reg_set->doorbell);
|
|
|
|
readl(&instance->reg_set->doorbell);
|
|
|
|
mutex_unlock(&instance->reset_mutex);
|
|
|
|
do {
|
|
|
|
ssleep(3);
|
|
|
|
io_timeout_in_crash_mode++;
|
|
|
|
dev_dbg(&instance->pdev->dev, "waiting for [%d] "
|
|
|
|
"seconds for crash dump collection and OCR "
|
|
|
|
"to be done\n", (io_timeout_in_crash_mode * 3));
|
2016-01-28 23:34:32 +08:00
|
|
|
} while ((atomic_read(&instance->adprecovery) != MEGASAS_HBA_OPERATIONAL) &&
|
megaraid_sas : Firmware crash dump feature support
Resending the patch. Addressed the review comments from Tomas Henzl.
Move buff_offset inside spinlock, corrected loop at crash dump buffer free,
reset_devices check is added to disable fw crash dump feature in kdump kernel.
This feature will provide similar interface as kernel crash dump feature.
When megaraid firmware encounter any crash, driver will collect the firmware raw image and
dump it into pre-configured location.
Driver will allocate two different segment of memory.
#1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
#2 DMA buffer (persistence allocation) just to do a arbitrator job.
Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2,
which will be copy back by driver to the host memory as described in #1.
Driver-Firmware interface:
==================
A.) Host driver can allocate maximum 512MB Host memory to store crash dump data.
This memory will be internal to the host and will not be exposed to the Firmware.
Driver may not be able to allocate 512 MB. In that case, driver will do possible memory
(available at run time) allocation to store crash dump data.
Let’s call this buffer as Host Crash Buffer.
Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory.
This will be internal to driver and firmware/application are unaware of it.
Partial allocation of Host Crash buffer may have valid information to debug depending upon
what was collected in that buffer and depending on nature of failure.
Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
Host Crash buffer will be allocated only when FW Crash dump data is available,
and will be deallocated once application copy Host Crash buffer to the file.
Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
B.) Irrespective of underlying Firmware capability of crash dump support,
driver will allocate DMA buffer at start of the day for each MR controllers.
Let’s call this buffer as “DMA Crash Buffer”.
For this feature, size of DMA crash buffer will be 1MB.
(We will not gain much even if DMA buffer size is increased.)
C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”.
Driver should extract the information from ctrl info provided by firmware and
figure out if firmware support crash dump feature or not.
Driver will enable crash dump feature only if
“Firmware support Crash dump” +
“Driver was able to create DMA Crash Buffer”.
If either one from above is not set, Crash dump feature should be disable in driver.
Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
host memory to the filesystem.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-09-12 21:27:28 +08:00
|
|
|
(io_timeout_in_crash_mode < 80));
|
|
|
|
|
2016-01-28 23:34:32 +08:00
|
|
|
if (atomic_read(&instance->adprecovery) == MEGASAS_HBA_OPERATIONAL) {
|
megaraid_sas : Firmware crash dump feature support
2014-09-12 21:27:28 +08:00
|
|
|
dev_info(&instance->pdev->dev, "OCR done for IO "
|
|
|
|
"timeout case\n");
|
|
|
|
retval = SUCCESS;
|
|
|
|
} else {
|
|
|
|
dev_info(&instance->pdev->dev, "Controller is not "
|
|
|
|
"operational after 240 seconds wait for IO "
|
|
|
|
"timeout case in FW crash dump mode\n do "
|
|
|
|
"OCR/kill adapter\n");
|
|
|
|
retval = megasas_reset_fusion(shost, 0);
|
|
|
|
}
|
|
|
|
return retval;
|
|
|
|
}
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
if (instance->requestorId && !instance->skip_heartbeat_timer_del)
|
|
|
|
del_timer_sync(&instance->sriov_heartbeat_timer);
|
2011-05-12 09:34:08 +08:00
|
|
|
set_bit(MEGASAS_FUSION_IN_RESET, &instance->reset_flags);
|
2020-01-14 19:21:19 +08:00
|
|
|
set_bit(MEGASAS_FUSION_OCR_NOT_POSSIBLE, &instance->reset_flags);
|
2016-01-28 23:34:32 +08:00
|
|
|
atomic_set(&instance->adprecovery, MEGASAS_ADPRESET_SM_POLLING);
|
2013-05-22 15:04:14 +08:00
|
|
|
instance->instancet->disable_intr(instance);
|
2017-02-10 16:59:34 +08:00
|
|
|
megasas_sync_irqs((unsigned long)instance);
|
2011-05-12 09:34:08 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/* First try waiting for commands to complete */
|
2016-01-28 23:34:25 +08:00
|
|
|
if (megasas_wait_for_outstanding_fusion(instance, reason,
|
2014-03-10 17:51:56 +08:00
|
|
|
&convert)) {
|
2016-01-28 23:34:32 +08:00
|
|
|
atomic_set(&instance->adprecovery, MEGASAS_ADPRESET_SM_INFAULT);
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_warn(&instance->pdev->dev, "resetting fusion "
|
2014-03-10 17:51:56 +08:00
|
|
|
"adapter scsi%d.\n", instance->host->host_no);
|
|
|
|
if (convert)
|
2016-01-28 23:34:25 +08:00
|
|
|
reason = 0;
|
2014-03-10 17:51:56 +08:00
|
|
|
|
2019-05-08 01:05:37 +08:00
|
|
|
if (megasas_dbg_lvl & OCR_DEBUG)
|
2017-02-10 16:59:15 +08:00
|
|
|
dev_info(&instance->pdev->dev, "\nPending SCSI commands:\n");
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/* Now return commands back to the OS */
|
2015-04-23 19:01:24 +08:00
|
|
|
for (i = 0 ; i < instance->max_scsi_cmds; i++) {
|
2010-12-22 05:34:31 +08:00
|
|
|
cmd_fusion = fusion->cmd_list[i];
|
2017-01-11 07:20:47 +08:00
|
|
|
/*check for extra commands issued by driver*/
|
2018-12-17 16:47:37 +08:00
|
|
|
if (instance->adapter_type >= VENTURA_SERIES) {
|
2017-02-10 16:59:03 +08:00
|
|
|
r1_cmd = fusion->cmd_list[i + instance->max_fw_cmds];
|
|
|
|
megasas_return_cmd_fusion(instance, r1_cmd);
|
2017-01-11 07:20:47 +08:00
|
|
|
}
|
2015-04-23 19:01:24 +08:00
|
|
|
scmd_local = cmd_fusion->scmd;
|
2010-12-22 05:34:31 +08:00
|
|
|
if (cmd_fusion->scmd) {
|
2019-05-08 01:05:37 +08:00
|
|
|
if (megasas_dbg_lvl & OCR_DEBUG) {
|
2017-02-10 16:59:15 +08:00
|
|
|
sdev_printk(KERN_INFO,
|
|
|
|
cmd_fusion->scmd->device, "SMID: 0x%x\n",
|
|
|
|
cmd_fusion->index);
|
2019-05-08 01:05:37 +08:00
|
|
|
megasas_dump_fusion_io(cmd_fusion->scmd);
|
2017-02-10 16:59:15 +08:00
|
|
|
}
|
|
|
|
|
2019-05-08 01:05:37 +08:00
|
|
|
if (cmd_fusion->io_request->Function ==
|
|
|
|
MPI2_FUNCTION_SCSI_IO_REQUEST)
|
|
|
|
fpio_count++;
|
|
|
|
|
2015-04-23 19:01:24 +08:00
|
|
|
scmd_local->result =
|
2014-03-10 17:51:56 +08:00
|
|
|
megasas_check_mpio_paths(instance,
|
2015-04-23 19:01:24 +08:00
|
|
|
scmd_local);
|
2017-01-11 07:20:51 +08:00
|
|
|
if (instance->ldio_threshold &&
|
|
|
|
megasas_cmd_type(scmd_local) == READ_WRITE_LDIO)
|
2016-01-28 23:34:30 +08:00
|
|
|
atomic_dec(&instance->ldio_outstanding);
|
2010-12-22 05:34:31 +08:00
|
|
|
megasas_return_cmd_fusion(instance, cmd_fusion);
|
2015-04-23 19:01:24 +08:00
|
|
|
scsi_dma_unmap(scmd_local);
|
|
|
|
scmd_local->scsi_done(scmd_local);
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-05-08 01:05:37 +08:00
|
|
|
dev_info(&instance->pdev->dev, "Outstanding fastpath IOs: %d\n",
|
|
|
|
fpio_count);
|
|
|
|
|
2017-01-11 07:20:47 +08:00
|
|
|
atomic_set(&instance->fw_outstanding, 0);
|
|
|
|
|
2018-12-17 16:47:39 +08:00
|
|
|
status_reg = instance->instancet->read_fw_status_reg(instance);
|
2011-05-12 09:34:08 +08:00
|
|
|
abs_state = status_reg & MFI_STATE_MASK;
|
|
|
|
reset_adapter = status_reg & MFI_RESET_ADAPTER;
|
|
|
|
if (instance->disableOnlineCtrlReset ||
|
|
|
|
(abs_state == MFI_STATE_FAULT && !reset_adapter)) {
|
2010-12-22 05:34:31 +08:00
|
|
|
/* Reset not supported, kill adapter */
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_warn(&instance->pdev->dev, "Reset not supported"
|
2014-03-10 17:51:56 +08:00
|
|
|
", killing adapter scsi%d.\n",
|
|
|
|
instance->host->host_no);
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
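The mitigation this commit names — finish light interrupt loads in hard-IRQ context, but hand a busy reply queue to the kernel's irq_poll machinery once more than a threshold (1/4 of the HBA queue depth) of descriptors are seen in one ISR — can be sketched against the generic <linux/irq_poll.h> interface as below. The reply_queue type, rq_process() helper and threshold field are illustrative assumptions, not the driver's actual code.

#include <linux/interrupt.h>
#include <linux/irq_poll.h>
#include <linux/kernel.h>

struct reply_queue {
	struct irq_poll iopoll;		/* registered with irq_poll_init() at setup */
	unsigned int threshold;		/* assumed: HBA queue depth / 4 */
};

/* Assumed helper: completes up to 'limit' reply descriptors, returns the count. */
static int rq_process(struct reply_queue *rq, int limit)
{
	return 0;			/* stub for this sketch */
}

/* irq_poll callback: runs in softirq context, always bounded by 'budget'. */
static int rq_irqpoll(struct irq_poll *iop, int budget)
{
	struct reply_queue *rq = container_of(iop, struct reply_queue, iopoll);
	int done = rq_process(rq, budget);

	if (done < budget)
		irq_poll_complete(iop);	/* queue drained: fall back to IRQ mode */
	return done;
}

static irqreturn_t rq_isr(int irq, void *data)
{
	struct reply_queue *rq = data;

	/* Light traffic is completed right here in hard-IRQ context ... */
	if (rq_process(rq, rq->threshold) < rq->threshold)
		return IRQ_HANDLED;

	/* ... while a flooded queue is deferred to irq_poll, so one CPU can
	 * never be trapped indefinitely in the submit/complete loop. */
	irq_poll_sched(&rq->iopoll);
	return IRQ_HANDLED;
}

Queue setup would pair this with irq_poll_init(&rq->iopoll, budget, rq_irqpoll); the budget is what forces a voluntary exit from the softirq loop, which is exactly the bounded behavior that keeps the watchdog from firing.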
|
|
|
goto kill_hba;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
/* Let SR-IOV VF & PF sync up if there was a HB failure */
|
2016-01-28 23:34:25 +08:00
|
|
|
if (instance->requestorId && !reason) {
|
2014-03-10 17:51:56 +08:00
|
|
|
msleep(MEGASAS_OCR_SETTLE_TIME_VF);
|
2019-05-08 01:05:33 +08:00
|
|
|
do_adp_reset = false;
|
|
|
|
max_reset_tries = MEGASAS_SRIOV_MAX_RESET_TRIES_VF;
|
2014-03-10 17:51:56 +08:00
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/* Now try to reset the chip */
|
2019-05-08 01:05:33 +08:00
|
|
|
for (i = 0; i < max_reset_tries; i++) {
|
2019-05-08 01:05:34 +08:00
|
|
|
/*
|
|
|
|
* Do adp reset and wait for
|
|
|
|
* controller to transition to ready
|
|
|
|
*/
|
|
|
|
if (megasas_adp_reset_wait_for_ready(instance,
|
|
|
|
do_adp_reset, 1) == FAILED)
|
2010-12-22 05:34:31 +08:00
|
|
|
continue;
|
2019-05-08 01:05:33 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/* Wait for FW to become ready */
|
2011-10-09 09:14:27 +08:00
|
|
|
if (megasas_transition_to_ready(instance, 1)) {
|
2016-01-28 23:34:35 +08:00
|
|
|
dev_warn(&instance->pdev->dev,
|
|
|
|
"Failed to transition controller to ready for "
|
|
|
|
"scsi%d.\n", instance->host->host_no);
|
2019-05-08 01:05:33 +08:00
|
|
|
continue;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
megasas_reset_reply_desc(instance);
|
2016-01-28 23:34:30 +08:00
|
|
|
megasas_fusion_update_can_queue(instance, OCR_CONTEXT);
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
if (megasas_ioc_init_fusion(instance)) {
|
2019-05-08 01:05:33 +08:00
|
|
|
continue;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
|
|
|
|
2014-11-17 17:54:13 +08:00
|
|
|
if (megasas_get_ctrl_info(instance)) {
|
|
|
|
dev_info(&instance->pdev->dev,
|
|
|
|
"Failed from %s %d\n",
|
|
|
|
__func__, __LINE__);
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
|
|
|
goto kill_hba;
|
2014-11-17 17:54:13 +08:00
|
|
|
}
|
2018-01-05 21:27:47 +08:00
|
|
|
|
2020-01-14 19:21:20 +08:00
|
|
|
megasas_refire_mgmt_cmd(instance,
|
|
|
|
(i == (MEGASAS_FUSION_MAX_RESET_TRIES - 1)
|
|
|
|
? 1 : 0));
|
2018-01-05 21:27:47 +08:00
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
/* Reset load balance info */
|
2017-02-10 16:59:17 +08:00
|
|
|
if (fusion->load_balance_info)
|
|
|
|
memset(fusion->load_balance_info, 0,
|
|
|
|
(sizeof(struct LD_LOAD_BALANCE_INFO) *
|
|
|
|
MAX_LOGICAL_DRIVES_EXT));
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2020-01-14 19:21:20 +08:00
|
|
|
if (!megasas_get_map_info(instance)) {
|
2010-12-22 05:34:31 +08:00
|
|
|
megasas_sync_map_info(instance);
|
2020-01-14 19:21:20 +08:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* Return pending polled mode cmds before
|
|
|
|
* retrying OCR
|
|
|
|
*/
|
|
|
|
megasas_return_polled_cmds(instance);
|
|
|
|
continue;
|
|
|
|
}
|
2010-12-22 05:34:31 +08:00
|
|
|
|
2015-08-31 19:53:11 +08:00
|
|
|
megasas_setup_jbod_map(instance);
|
|
|
|
|
2017-01-11 07:20:46 +08:00
|
|
|
/* reset stream detection array */
|
2018-12-17 16:47:37 +08:00
|
|
|
if (instance->adapter_type >= VENTURA_SERIES) {
|
2017-01-11 07:20:46 +08:00
|
|
|
for (j = 0; j < MAX_LOGICAL_DRIVES_EXT; ++j) {
|
|
|
|
memset(fusion->stream_detect_by_ld[j],
|
|
|
|
0, sizeof(struct LD_STREAM_DETECT));
|
|
|
|
fusion->stream_detect_by_ld[j]->mru_bit_map
|
|
|
|
= MR_STREAM_BITMAP;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2014-11-17 17:54:18 +08:00
|
|
|
clear_bit(MEGASAS_FUSION_IN_RESET,
|
|
|
|
&instance->reset_flags);
|
|
|
|
instance->instancet->enable_intr(instance);
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
|
|
|
megasas_enable_irq_poll(instance);
|
2018-06-04 18:45:12 +08:00
|
|
|
shost_for_each_device(sdev, shost) {
|
|
|
|
if ((instance->tgt_prop) &&
|
|
|
|
(instance->nvme_page_size))
|
|
|
|
ret_target_prop = megasas_get_target_prop(instance, sdev);
|
|
|
|
|
|
|
|
is_target_prop = (ret_target_prop == DCMD_SUCCESS) ? true : false;
|
|
|
|
megasas_set_dynamic_target_properties(sdev, is_target_prop);
|
|
|
|
}
|
|
|
|
|
2020-01-14 19:21:17 +08:00
|
|
|
status_reg = instance->instancet->read_fw_status_reg
|
|
|
|
(instance);
|
|
|
|
abs_state = status_reg & MFI_STATE_MASK;
|
|
|
|
if (abs_state != MFI_STATE_OPERATIONAL) {
|
|
|
|
dev_info(&instance->pdev->dev,
|
|
|
|
"Adapter is not OPERATIONAL, state 0x%x for scsi:%d\n",
|
|
|
|
abs_state, instance->host->host_no);
|
|
|
|
goto out;
|
|
|
|
}
|
2016-01-28 23:34:32 +08:00
|
|
|
atomic_set(&instance->adprecovery, MEGASAS_HBA_OPERATIONAL);
|
2014-11-17 17:54:18 +08:00
|
|
|
|
2019-05-08 01:05:45 +08:00
|
|
|
dev_info(&instance->pdev->dev,
|
|
|
|
"Adapter is OPERATIONAL for scsi:%d\n",
|
|
|
|
instance->host->host_no);
|
2017-08-23 19:47:06 +08:00
|
|
|
|
2014-03-10 17:51:56 +08:00
|
|
|
/* Restart SR-IOV heartbeat */
|
|
|
|
if (instance->requestorId) {
|
|
|
|
if (!megasas_sriov_start_heartbeat(instance, 0))
|
2017-10-23 06:30:04 +08:00
|
|
|
megasas_start_timer(instance);
|
2014-03-10 17:51:56 +08:00
|
|
|
else
|
|
|
|
instance->skip_heartbeat_timer_del = 1;
|
|
|
|
}
|
|
|
|
|
2014-11-17 17:54:18 +08:00
|
|
|
if (instance->crash_dump_drv_support &&
|
|
|
|
instance->crash_dump_app_support)
|
|
|
|
megasas_set_crash_dump_params(instance,
|
|
|
|
MR_CRASH_BUF_TURN_ON);
|
|
|
|
else
|
|
|
|
megasas_set_crash_dump_params(instance,
|
|
|
|
MR_CRASH_BUF_TURN_OFF);
|
|
|
|
|
2018-10-17 14:37:40 +08:00
|
|
|
if (instance->snapdump_wait_time) {
|
|
|
|
megasas_get_snapdump_properties(instance);
|
|
|
|
dev_info(&instance->pdev->dev,
|
|
|
|
"Snap dump wait time\t: %d\n",
|
|
|
|
instance->snapdump_wait_time);
|
|
|
|
}
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
retval = SUCCESS;
|
2017-08-23 19:47:06 +08:00
|
|
|
|
|
|
|
/* Adapter reset completed successfully */
|
|
|
|
dev_warn(&instance->pdev->dev,
|
|
|
|
"Reset successful for scsi%d.\n",
|
|
|
|
instance->host->host_no);
|
|
|
|
|
2010-12-22 05:34:31 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
/* Reset failed, kill the adapter */
|
2015-07-08 04:52:34 +08:00
|
|
|
dev_warn(&instance->pdev->dev, "Reset failed, killing "
|
2014-03-10 17:51:56 +08:00
|
|
|
"adapter scsi%d.\n", instance->host->host_no);
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
|
|
|
goto kill_hba;
|
2010-12-22 05:34:31 +08:00
|
|
|
} else {
|
2014-03-10 17:51:56 +08:00
|
|
|
/* For VF: Restart HB timer if we didn't OCR */
|
|
|
|
if (instance->requestorId) {
|
2017-10-23 06:30:04 +08:00
|
|
|
megasas_start_timer(instance);
|
2014-03-10 17:51:56 +08:00
|
|
|
}
|
2011-10-09 09:14:59 +08:00
|
|
|
clear_bit(MEGASAS_FUSION_IN_RESET, &instance->reset_flags);
|
2013-05-22 15:04:14 +08:00
|
|
|
instance->instancet->enable_intr(instance);
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
|
|
|
megasas_enable_irq_poll(instance);
|
2016-01-28 23:34:32 +08:00
|
|
|
atomic_set(&instance->adprecovery, MEGASAS_HBA_OPERATIONAL);
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
|
|
|
goto out;
|
2010-12-22 05:34:31 +08:00
|
|
|
}
|
scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
2019-05-08 01:05:35 +08:00
|
|
|
kill_hba:
|
|
|
|
megaraid_sas_kill_hba(instance);
|
|
|
|
megasas_enable_irq_poll(instance);
|
|
|
|
instance->skip_heartbeat_timer_del = 1;
|
|
|
|
retval = FAILED;
|
2010-12-22 05:34:31 +08:00
|
|
|
out:
|
2020-01-14 19:21:19 +08:00
|
|
|
clear_bit(MEGASAS_FUSION_OCR_NOT_POSSIBLE, &instance->reset_flags);
|
2010-12-22 05:34:31 +08:00
|
|
|
mutex_unlock(&instance->reset_mutex);
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
2018-10-17 14:37:39 +08:00
|
|
|
/* Fusion Crash dump collection */
|
2019-07-26 21:55:40 +08:00
|
|
|
static void megasas_fusion_crash_dump(struct megasas_instance *instance)
|
megaraid_sas : Firmware crash dump feature support
2014-09-12 21:27:28 +08:00
|
|
|
{
|
|
|
|
u32 status_reg;
|
|
|
|
u8 partial_copy = 0;
|
2018-10-17 14:37:39 +08:00
|
|
|
int wait = 0;
|
megaraid_sas : Firmware crash dump feature support
2014-09-12 21:27:28 +08:00
|
|
|
|
|
|
|
|
2018-12-17 16:47:39 +08:00
|
|
|
status_reg = instance->instancet->read_fw_status_reg(instance);
|
megaraid_sas : Firmware crash dump feature support
2014-09-12 21:27:28 +08:00
|
|
|
|
|
|
|
	/*
	 * Allocate host crash buffers to copy data from 1 MB DMA crash buffer
	 * to host crash buffers
	 */
	if (instance->drv_buf_index == 0) {
		/* Buffer is already allocated for old Crash dump.
		 * Do OCR and do not wait for crash dump collection
		 */
		if (instance->drv_buf_alloc) {
			dev_info(&instance->pdev->dev, "earlier crash dump is "
				"not yet copied by application, ignoring this "
				"crash dump and initiating OCR\n");
			status_reg |= MFI_STATE_CRASH_DUMP_DONE;
			writel(status_reg,
			       &instance->reg_set->outbound_scratch_pad_0);
			readl(&instance->reg_set->outbound_scratch_pad_0);
			return;
		}
		megasas_alloc_host_crash_buffer(instance);
		dev_info(&instance->pdev->dev, "Number of host crash buffers "
			"allocated: %d\n", instance->drv_buf_alloc);
	}

	while (!(status_reg & MFI_STATE_CRASH_DUMP_DONE) &&
	       (wait < MEGASAS_WATCHDOG_WAIT_COUNT)) {
		if (!(status_reg & MFI_STATE_DMADONE)) {
			/*
			 * Next crash dump buffer is not yet DMA'd by FW
			 * Check after 10ms. Wait for 1 second for FW to
			 * post the next buffer. If not bail out.
			 */
			wait++;
			msleep(MEGASAS_WAIT_FOR_NEXT_DMA_MSECS);
			status_reg = instance->instancet->read_fw_status_reg(
					instance);
			continue;
		}

		wait = 0;
		if (instance->drv_buf_index >= instance->drv_buf_alloc) {
			dev_info(&instance->pdev->dev,
				 "Driver is done copying the buffer: %d\n",
				 instance->drv_buf_alloc);
			status_reg |= MFI_STATE_CRASH_DUMP_DONE;
			partial_copy = 1;
			break;
		} else {
			memcpy(instance->crash_buf[instance->drv_buf_index],
			       instance->crash_dump_buf, CRASH_DMA_BUF_SIZE);
			instance->drv_buf_index++;
			status_reg &= ~MFI_STATE_DMADONE;
		}

		writel(status_reg, &instance->reg_set->outbound_scratch_pad_0);
		readl(&instance->reg_set->outbound_scratch_pad_0);

		msleep(MEGASAS_WAIT_FOR_NEXT_DMA_MSECS);
		status_reg = instance->instancet->read_fw_status_reg(instance);
	}

	if (status_reg & MFI_STATE_CRASH_DUMP_DONE) {
		dev_info(&instance->pdev->dev, "Crash Dump is available,number "
			"of copied buffers: %d\n", instance->drv_buf_index);
		instance->fw_crash_buffer_size = instance->drv_buf_index;
		instance->fw_crash_state = AVAILABLE;
		instance->drv_buf_index = 0;
		writel(status_reg, &instance->reg_set->outbound_scratch_pad_0);
		readl(&instance->reg_set->outbound_scratch_pad_0);
		if (!partial_copy)
			megasas_reset_fusion(instance->host, 0);
	}
}

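The copy loop above indexes crash_buf[drv_buf_index] because megasas_alloc_host_crash_buffer() (defined elsewhere in the driver) backs the Host Crash Buffer with an array of independently allocated 1 MB chunks rather than one large contiguous region. A minimal sketch of that chunked-allocation idea, with hypothetical names (the driver's own helper may differ in detail):

/*
 * Illustrative kernel-side sketch only -- not the driver's helper.
 * All example_* names are hypothetical.
 */
#include <linux/vmalloc.h>

#define EXAMPLE_CHUNK_SZ	(1024 * 1024)	/* 1 MB, like CRASH_DMA_BUF_SIZE */
#define EXAMPLE_MAX_CHUNKS	512		/* 512 x 1 MB = 512 MB cap */

/* Allocate as many 1 MB chunks as the system will give us. */
static unsigned int example_alloc_chunks(void *chunks[EXAMPLE_MAX_CHUNKS])
{
	unsigned int i;

	for (i = 0; i < EXAMPLE_MAX_CHUNKS; i++) {
		chunks[i] = vzalloc(EXAMPLE_CHUNK_SZ);
		if (!chunks[i])
			break;		/* a partial buffer is still useful */
	}
	return i;			/* number of chunks actually allocated */
}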
/* Fusion OCR work queue */
void megasas_fusion_ocr_wq(struct work_struct *work)
{
	struct megasas_instance *instance =
		container_of(work, struct megasas_instance, work_init);

	megasas_reset_fusion(instance->host, 0);
}

/* Allocate fusion context */
int
megasas_alloc_fusion_context(struct megasas_instance *instance)
{
	struct fusion_context *fusion;

	instance->ctrl_context = kzalloc(sizeof(struct fusion_context),
					 GFP_KERNEL);
	if (!instance->ctrl_context) {
		dev_err(&instance->pdev->dev, "Failed from %s %d\n",
			__func__, __LINE__);
		return -ENOMEM;
	}

	fusion = instance->ctrl_context;

	fusion->log_to_span_pages = get_order(MAX_LOGICAL_DRIVES_EXT *
					      sizeof(LD_SPAN_INFO));
	fusion->log_to_span =
		(PLD_SPAN_INFO)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
						fusion->log_to_span_pages);
	if (!fusion->log_to_span) {
treewide: Use array_size() in vzalloc()
The vzalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:
vzalloc(a * b)
with:
vzalloc(array_size(a, b))
as well as handling cases of:
vzalloc(a * b * c)
with:
vzalloc(array3_size(a, b, c))
This does, however, attempt to ignore constant size factors like:
vzalloc(4 * 1024)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
vzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
vzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
vzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
vzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
vzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
vzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
vzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
vzalloc(
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
vzalloc(
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
vzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
vzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
vzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
vzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
vzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
vzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
vzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
vzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
vzalloc(C1 * C2 * C3, ...)
|
vzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2-factor products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@
(
vzalloc(C1 * C2, ...)
|
vzalloc(
- E1 * E2
+ array_size(E1, E2)
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
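The value of the conversion is overflow safety: array_size() (from <linux/overflow.h>) saturates to SIZE_MAX when the multiplication would wrap, so an attacker-influenced count produces a cleanly failing allocation instead of an undersized buffer. A minimal before/after sketch; struct record and the helper below are illustrative, not driver code:

/*
 * Illustrative kernel-side sketch; the record type and helper are
 * hypothetical, the array_size() semantics are from <linux/overflow.h>.
 */
#include <linux/overflow.h>
#include <linux/vmalloc.h>

struct record {
	char data[64];
};

static struct record *example_alloc_records(size_t n)
{
	/*
	 * Before the conversion: vzalloc(n * sizeof(struct record)).
	 * If n is huge, n * 64 can wrap around and silently allocate a
	 * buffer far smaller than intended.
	 *
	 * After: array_size() saturates to SIZE_MAX on overflow, so
	 * vzalloc() fails cleanly instead of returning a short buffer.
	 */
	return vzalloc(array_size(n, sizeof(struct record)));
}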
		fusion->log_to_span =
			vzalloc(array_size(MAX_LOGICAL_DRIVES_EXT,
					   sizeof(LD_SPAN_INFO)));
		if (!fusion->log_to_span) {
			dev_err(&instance->pdev->dev, "Failed from %s %d\n",
				__func__, __LINE__);
			return -ENOMEM;
		}
	}

	fusion->load_balance_info_pages = get_order(MAX_LOGICAL_DRIVES_EXT *
		sizeof(struct LD_LOAD_BALANCE_INFO));
	fusion->load_balance_info =
		(struct LD_LOAD_BALANCE_INFO *)
		__get_free_pages(GFP_KERNEL | __GFP_ZERO,
				 fusion->load_balance_info_pages);
	if (!fusion->load_balance_info) {
		fusion->load_balance_info =
			vzalloc(array_size(MAX_LOGICAL_DRIVES_EXT,
					   sizeof(struct LD_LOAD_BALANCE_INFO)));
		if (!fusion->load_balance_info)
			dev_err(&instance->pdev->dev,
				"Failed to allocate load_balance_info, "
				"continuing without Load Balance support\n");
	}

	return 0;
}
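The __get_free_pages() calls above allocate in power-of-two page counts: get_order() returns the smallest order whose page count covers the requested size, so the allocation may round up. As a worked example under assumed values (4 KiB pages; MAX_LOGICAL_DRIVES_EXT is commonly 256, and a hypothetical 1 KiB per-LD span record): 256 x 1 KiB = 256 KiB = 64 pages, giving order 6. A small user-space model of that arithmetic:

/* Illustrative user-space model of get_order(), assuming 4 KiB pages. */
#include <stdio.h>

static int example_get_order(unsigned long size)
{
	int order = 0;
	unsigned long pages = (size + 4095) / 4096;	/* round up to whole pages */

	while ((1UL << order) < pages)
		order++;
	return order;
}

int main(void)
{
	/* e.g. 256 logical drives x a hypothetical 1 KiB of span info */
	unsigned long size = 256 * 1024;

	printf("%lu bytes -> order %d (%lu pages)\n",
	       size, example_get_order(size),
	       1UL << example_get_order(size));	/* 64 pages -> order 6 */
	return 0;
}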

void
megasas_free_fusion_context(struct megasas_instance *instance)
{
	struct fusion_context *fusion = instance->ctrl_context;

	if (fusion) {
		if (fusion->load_balance_info) {
			if (is_vmalloc_addr(fusion->load_balance_info))
				vfree(fusion->load_balance_info);
			else
				free_pages((ulong)fusion->load_balance_info,
					   fusion->load_balance_info_pages);
		}

		if (fusion->log_to_span) {
			if (is_vmalloc_addr(fusion->log_to_span))
				vfree(fusion->log_to_span);
			else
				free_pages((ulong)fusion->log_to_span,
					   fusion->log_to_span_pages);
		}

		kfree(fusion);
	}
}
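Taken together, megasas_alloc_fusion_context() and megasas_free_fusion_context() show a common kernel pattern: try physically contiguous pages first, fall back to virtually contiguous vzalloc() memory, and let is_vmalloc_addr() pick the matching free routine at teardown. A self-contained sketch of that pattern with hypothetical helper names (not driver functions):

/*
 * Illustrative kernel-side sketch of the alloc/free fallback pattern;
 * the example_buf_* helpers are hypothetical.
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Prefer physically contiguous pages; fall back to vmalloc space. */
static void *example_buf_alloc(size_t size, int *order)
{
	void *buf;

	*order = get_order(size);
	buf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, *order);
	if (!buf)
		buf = vzalloc(size);	/* virtually contiguous fallback */
	return buf;
}

static void example_buf_free(void *buf, int order)
{
	if (!buf)
		return;
	if (is_vmalloc_addr(buf))	/* came from the vzalloc() fallback */
		vfree(buf);
	else
		free_pages((unsigned long)buf, order);
}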

struct megasas_instance_template megasas_instance_template_fusion = {
	.enable_intr = megasas_enable_intr_fusion,
	.disable_intr = megasas_disable_intr_fusion,
	.clear_intr = megasas_clear_intr_fusion,
	.read_fw_status_reg = megasas_read_fw_status_reg_fusion,
	.adp_reset = megasas_adp_reset_fusion,
	.check_reset = megasas_check_reset_fusion,
	.service_isr = megasas_isr_fusion,
	.tasklet = megasas_complete_cmd_dpc_fusion,
	.init_adapter = megasas_init_adapter_fusion,
	.build_and_issue_cmd = megasas_build_and_issue_cmd_fusion,
	.issue_dcmd = megasas_issue_dcmd_fusion,
};
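megasas_instance_template_fusion is the fusion-specific instance of the driver's ops-table pattern: callers such as the crash dump loop above stay adapter-agnostic by dispatching through instance->instancet (e.g. instance->instancet->read_fw_status_reg(instance)). A stripped-down, runnable user-space model of the idiom, with all names hypothetical:

/*
 * Illustrative sketch only -- a miniature of the instance-template
 * dispatch idiom; every example_* name is hypothetical.
 */
#include <stdio.h>

struct example_instance;

struct example_instance_template {
	unsigned int (*read_fw_status_reg)(struct example_instance *instance);
};

struct example_instance {
	struct example_instance_template *instancet;
};

static unsigned int example_read_fw_status_reg_fusion(struct example_instance *i)
{
	return 0x42;	/* a real implementation would read a scratch-pad register */
}

static struct example_instance_template example_template_fusion = {
	.read_fw_status_reg = example_read_fw_status_reg_fusion,
};

int main(void)
{
	struct example_instance inst = { .instancet = &example_template_fusion };

	/* Callers stay adapter-agnostic: */
	printf("status: 0x%x\n", inst.instancet->read_fw_status_reg(&inst));
	return 0;
}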