OpenCloudOS-Kernel


Jeffrey Hugo ac191bcb0f bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state

[ Upstream commit bce3f770684cc1d91ff9edab431b71ac991faf29 ]

When processing a SYSERR, if the device does not respond to the MHI_RESET from the host, the host will be stuck in a difficult-to-recover state. The host will remain in MHI_PM_SYS_ERR_PROCESS and will not clean up the host channels. Clients will not be notified of the SYSERR via the destruction of their channel devices, which means clients may think that the device is still up. Subsequent SYSERR events, such as a device fatal error, will not be processed because the state machine cannot transition from PROCESS back to DETECT. The only ways to recover are to unload the mhi module (wiping the state machine state) or for the mhi controller to initiate SHUTDOWN.

This issue was discovered by stress testing soc_reset events on AIC100 via the sysfs node. soc_reset is processed entirely in hardware: when the register write hits the endpoint hardware, it causes the SoC to reset without firmware involvement. In stress testing, there is a rare race where soc_reset N causes the SoC to reset and PBL to signal SYSERR (fatal error). If soc_reset N+1 is triggered before PBL can process the MHI_RESET from the host, the SoC resets again and re-runs PBL from the beginning, so PBL loses all state. PBL then waits for the host to respond to the new SYSERR, but the host is stuck expecting the previous MHI_RESET to be processed.

Additionally, the AMSS EE firmware (QSM) was modified to synthetically reproduce the issue by simulating a FW hang after QSM issued a SYSERR. In this case, soc_reset would not recover the device.

To recover the device in this failure case, we need a state similar to PROCESS that can also transition to DETECT. No existing state fits. POR has the needed transitions, but it assumes the device is in a good state and could allow the host to attempt to use the device. Allowing PROCESS to transition to DETECT invites the possibility of parallel SYSERR processing, which could get the host and device out of sync.

Thus, introduce a new state: MHI_PM_SYS_ERR_FAIL. This is essentially a holding state. It allows the host to clean up the elements that are based on the old state of the device (channels), but does not allow a direct return to an operational state. It does allow the detection and processing of another SYSERR, which may recover the device, or allows the controller to perform a clean shutdown.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240112180800.536733-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13 13:07:38 +02:00
fsl-mc	ARM: SoC cleanups for 6.6	2023-08-30 16:49:40 -07:00
mhi	bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state	2024-04-13 13:07:38 +02:00
Kconfig	bus: tegra-aconnect: Update dependency to ARCH_TEGRA	2024-03-26 18:19:30 -04:00
Makefile	bus: add driver for initializing the SSC bus on (some) qcom SoCs	2022-04-19 13:03:57 -05:00
arm-cci.c	bus: arm-cci: remove unnecessary unreachable()	2018-05-14 01:22:49 -07:00
arm-integrator-lm.c	bus: arm-integrator-lm: remove MODULE_LICENSE in non-modules	2023-04-13 13:13:50 -07:00
brcmstb_gisb.c	bus: brcmstb_gisb: Use devm_platform_get_and_ioremap_resource()	2023-03-14 14:07:16 -07:00
bt1-apb.c	bus: remove MODULE_LICENSE in non-modules	2023-04-13 13:13:53 -07:00
bt1-axi.c	bus: remove MODULE_LICENSE in non-modules	2023-04-13 13:13:53 -07:00
da8xx-mstpri.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	2019-06-19 17:09:55 +02:00
hisi_lpc.c	bus: Explicitly include correct DT includes	2023-08-12 10:31:01 +02:00
imx-weim.c	bus: imx-weim: fix valid range check	2024-03-01 13:35:05 +01:00
intel-ixp4xx-eb.c	bus: ixp4xx: fix IXP4XX_EXP_T1_MASK	2023-07-05 22:22:55 +02:00
mips_cdmm.c	driver core: make struct bus_type.uevent() take a const *	2023-01-27 13:45:52 +01:00
moxtet.c	bus: moxtet: Add spi device table	2024-01-20 11:51:47 +01:00
mvebu-mbus.c	bus: mvebu-mbus: Remove open coded "ranges" parsing	2023-04-18 11:18:24 -05:00
omap-ocp2scp.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00
omap_l3_noc.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE	2022-06-10 14:51:36 +02:00
omap_l3_noc.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE	2022-06-10 14:51:36 +02:00
omap_l3_smx.c	ARM: SoC cleanups for 6.6	2023-08-30 16:49:40 -07:00
omap_l3_smx.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156	2019-05-30 11:26:35 -07:00
qcom-ebi2.c	bus: qcom: remove MODULE_LICENSE in non-modules	2023-04-13 13:13:50 -07:00
qcom-ssc-block-bus.c	bus: remove MODULE_LICENSE in non-modules	2023-04-13 13:13:50 -07:00
simple-pm-bus.c	bus: Explicitly include correct DT includes	2023-08-12 10:31:01 +02:00
sun50i-de2.c	bus: sun50i-de2: Adjust printing error message	2021-10-13 14:48:48 +02:00
sunxi-rsb.c	ARM: SoC cleanups for 6.6	2023-08-30 16:49:40 -07:00
tegra-aconnect.c	bus: tegra-aconnect: add system sleep callbacks	2019-03-28 17:26:14 +01:00
tegra-gmi.c	bus: tegra-gmi: Convert to devm_platform_ioremap_resource()	2023-07-21 17:27:33 +02:00
ti-pwmss.c	bus: Explicitly include correct DT includes	2023-08-12 10:31:01 +02:00
ti-sysc.c	bus: ti-sysc: Flush posted write only after srst_udelay	2024-01-01 12:42:46 +00:00
ts-nbus.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_56.RULE (part 2)	2022-06-10 14:51:35 +02:00
uniphier-system-bus.c	bus: uniphier-system-bus: Remove open coded "ranges" parsing	2023-03-30 13:37:21 -05:00
vexpress-config.c	bus: vexpress-config: Convert to devm_platform_ioremap_resource()	2023-07-13 14:38:44 +01:00