[ Upstream commit bce3f770684cc1d91ff9edab431b71ac991faf29 ]
When processing a SYSERR, if the device does not respond to the MHI_RESET
from the host, the host will be stuck in a difficult to recover state.
The host will remain in MHI_PM_SYS_ERR_PROCESS and not clean up the host
channels. Clients will not be notified of the SYSERR via the destruction
of their channel devices, which means clients may think that the device is
still up. Subsequent SYSERR events such as a device fatal error will not
be processed as the state machine cannot transition from PROCESS back to
DETECT. The only way to recover from this is to unload the mhi module
(wipe the state machine state) or for the mhi controller to initiate
SHUTDOWN.
This issue was discovered by stress testing soc_reset events on AIC100
via the sysfs node.
soc_reset is processed entirely in hardware. When the register write
hits the endpoint hardware, it causes the soc to reset without firmware
involvement. In stress testing, there is a rare race where soc_reset N
will cause the soc to reset and PBL to signal SYSERR (fatal error). If
soc_reset N+1 is triggered before PBL can process the MHI_RESET from the
host, then the soc will reset again, and re-run PBL from the beginning.
This will cause PBL to lose all state. PBL will be waiting for the host
to respond to the new syserr, but host will be stuck expecting the
previous MHI_RESET to be processed.
Additionally, the AMSS EE firmware (QSM) was hacked to synthetically
reproduce the issue by simulating a FW hang after the QSM issued a
SYSERR. In this case, soc_reset would not recover the device.
For this failure case, to recover the device, we need a state similar to
PROCESS, but can transition to DETECT. There is not a viable existing
state to use. POR has the needed transitions, but assumes the device is
in a good state and could allow the host to attempt to use the device.
Allowing PROCESS to transition to DETECT invites the possibility of
parallel SYSERR processing which could get the host and device out of
sync.
Thus, invent a new state - MHI_PM_SYS_ERR_FAIL
This essentially a holding state. It allows us to clean up the host
elements that are based on the old state of the device (channels), but
does not allow us to directly advance back to an operational state. It
does allow the detection and processing of another SYSERR which may
recover the device, or allows the controller to do a clean shutdown.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240112180800.536733-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
If the BHIOFF or BHIEOFF range checks fail, they return EINVAL. ERANGE
is a better error code since it implies an out of range condition.
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1679674860-28229-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
If the value read from the CHDBOFF and ERDBOFF registers is outside the
range of the MHI register space then an invalid address might be computed
which later causes a kernel panic. Range check the read value to prevent
a crash due to bad data from the device.
Fixes: 6cd330ae76 ("bus: mhi: core: Add support for ringing channel/event ring doorbells")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1679674384-27209-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work falls
into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be moved
into read-only memory (i.e. const) The recent work with Rust has
pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only making
things safer overall. This is the contuation of that work (started
last release with kobject changes) in moving struct bus_type to be
constant. We didn't quite make it for this release, but the
remaining patches will be finished up for the release after this
one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
6oeFOjD3JDju3cQsfGgd
=Su6W
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work
falls into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be
moved into read-only memory (i.e. const) The recent work with Rust
has pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only
making things safer overall. This is the contuation of that work
(started last release with kobject changes) in moving struct
bus_type to be constant. We didn't quite make it for this release,
but the remaining patches will be finished up for the release after
this one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems"
[ Geert Uytterhoeven points out that that last sentence isn't true, and
that there's a pending report that has a fix that is queued up - Linus ]
* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
OPP: fix error checking in opp_migrate_dentry()
debugfs: update comment of debugfs_rename()
i3c: fix device.h kernel-doc warnings
dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
Revert "driver core: add error handling for devtmpfs_create_node()"
Revert "devtmpfs: add debug info to handle()"
Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
driver core: cpu: don't hand-override the uevent bus_type callback.
devtmpfs: remove return value of devtmpfs_delete_node()
devtmpfs: add debug info to handle()
driver core: add error handling for devtmpfs_create_node()
driver core: bus: update my copyright notice
driver core: bus: add bus_get_dev_root() function
driver core: bus: constify bus_unregister()
driver core: bus: constify some internal functions
driver core: bus: constify bus_get_kset()
driver core: bus: constify bus_register/unregister_notifier()
driver core: remove private pointer from struct bus_type
...
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This should be a mistake. MHI contains "Host Interface"
already. So we shall update "MHI" to "Modem" and the full
name shall be "Modem Host Interface".
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20221229011358.15874-1-slark_xiao@163.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
During runtime, the MHI endpoint may be powered up/down several times.
So instead of allocating and destroying the IRQs all the time, let's just
enable/disable IRQs during power up/down.
The IRQs will be allocated during mhi_register_controller() and freed
during mhi_unregister_controller(). This works well for things like PCI
hotplug also as once the PCI device gets removed, the controller will
get unregistered. And once it comes back, it will get registered back
and even if the IRQ configuration changes (MSI), that will get accounted.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1655952183-66792-1-git-send-email-quic_qianyu@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
MHI Host
--------
Support for new modems:
- Foxconn Cinterion MV32-WA/MV32-WB based on SDX62/SDX65
- Telit FN980 v1 based on SDX55
- Telit FN990 based on SDX65
- Foxconn T99W373/T99W368 based on SDX62/SDX65
Core changes:
- During the recycle of event ring elements, compute the ctxt_wp based on the
local cached value instead of reading from shared memory. This is to prevent
the possible corruption of the ctxt_wp as some of the endpoint devices could
modify the value in shared memory.
- Add sysfs support for resetting the endpoint based on the MHI spec. The MHI
spec allows the host to hard reset the device in the case of an unrecoverable
error and all other reset mechanisms have failed.
- During MHI shutdown, wait for the endpoint device to enter the ready state
post reset before proceeding. This is to avoid a possible race where host
would remove the interrupt handler and device will send ready state
interrupt, resulting in IOMMU fault.
- Bail out updating the MHI register if the read has failed during
read/modify/write.
- Use mhi_write_reg() instead of mhi_write_reg_field() for writing the whole
register fields in mhi_init_mmio().
MAINTAINERS change:
- Since Qualcomm has moved the email domain for its employess from codeaurora
domain to quicinc, update the same for Hemant.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEZ6VDKoFIy9ikWCeXVZ8R5v6RzvUFAmJ6AVoACgkQVZ8R5v6R
zvW6mAf+J9D7EWR4W8YTJwN3Pdmu0tkeYQXg3ThOrDFLnljBOle0j3wOdHdMRR5t
vzn66NfPJIJkugff0diasE07qRZk3ZV7xw+1IZMnHVNCTj4lOyO12jcz/WVIybjl
6klnm/YEXdmtK5Owt01yNIA1GZYRdfmZw2E2hR6T05Pe/ov2wff/CJ6UcHFrb3Es
0AdxQ/kpMhQvOzv53wB6ZYL611mMJMrDtcGb5pOH0LWJYN8nbhvh5JfULKBuyWCb
NsZschX0veYA82OfxhejOiarO6/qKi6cdIfaqzbx26WxfGV2A4Iiqj5UzhTX3sWO
zl0HyVPKtZIWOm3WvygvXR/1tvdMgg==
=6jsl
-----END PGP SIGNATURE-----
Merge tag 'mhi-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-work-next
Manivannan writes:
MHI changes for v5.19
MHI Host
--------
Support for new modems:
- Foxconn Cinterion MV32-WA/MV32-WB based on SDX62/SDX65
- Telit FN980 v1 based on SDX55
- Telit FN990 based on SDX65
- Foxconn T99W373/T99W368 based on SDX62/SDX65
Core changes:
- During the recycle of event ring elements, compute the ctxt_wp based on the
local cached value instead of reading from shared memory. This is to prevent
the possible corruption of the ctxt_wp as some of the endpoint devices could
modify the value in shared memory.
- Add sysfs support for resetting the endpoint based on the MHI spec. The MHI
spec allows the host to hard reset the device in the case of an unrecoverable
error and all other reset mechanisms have failed.
- During MHI shutdown, wait for the endpoint device to enter the ready state
post reset before proceeding. This is to avoid a possible race where host
would remove the interrupt handler and device will send ready state
interrupt, resulting in IOMMU fault.
- Bail out updating the MHI register if the read has failed during
read/modify/write.
- Use mhi_write_reg() instead of mhi_write_reg_field() for writing the whole
register fields in mhi_init_mmio().
MAINTAINERS change:
- Since Qualcomm has moved the email domain for its employess from codeaurora
domain to quicinc, update the same for Hemant.
* tag 'mhi-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: (29 commits)
bus: mhi: host: Add support for Foxconn T99W373 and T99W368
bus: mhi: host: pci_generic: add Telit FN990
bus: mhi: host: pci_generic: add Telit FN980 v1 hardware revision
bus: mhi: host: Add support for Cinterion MV32-WA/MV32-WB
bus: mhi: host: Optimize and update MMIO register write method
bus: mhi: host: Bail on writing register fields if read fails
bus: mhi: host: Wait for ready state after reset
bus: mhi: host: Add soc_reset sysfs
bus: mhi: host: pci_generic: Sort mhi_pci_id_table based on the PID
bus: mhi: host: Use cached values for calculating the shared write pointer
MAINTAINERS: Update Hemant's email id
bus: mhi: ep: Add uevent support for module autoloading
bus: mhi: ep: Add support for suspending and resuming channels
bus: mhi: ep: Add support for queueing SKBs to the host
bus: mhi: ep: Add support for processing channel rings
bus: mhi: ep: Add support for reading from the host
bus: mhi: ep: Add support for processing command rings
bus: mhi: ep: Add support for handling SYS_ERR condition
bus: mhi: ep: Add support for handling MHI_RESET
bus: mhi: ep: Add support for powering down the MHI endpoint stack
...
Fix following coccicheck warning:
./drivers/bus/mhi/host/init.c:89:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit and sysfs_emit_at instead of snprintf.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Link: https://lore.kernel.org/r/20220426125902.681258-1-wanjiabing@vivo.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As of now, MMIO writes done after ready state transition use the
mhi_write_reg_field() API even though the whole register is being
written in most cases. Optimize this process by using mhi_write_reg()
API instead for those writes and use the mhi_write_reg_field()
API for MHI config registers only.
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1650304226-11080-3-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Helper API to write register fields relies on successful reads
of the register/address prior to the write. Bail out if a failure
is seen when reading the register before the actual write is
performed.
Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1650304226-11080-2-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
The MHI bus supports a standardized hardware reset, which is known as the
"SoC Reset". This reset is similar to the reset sysfs for PCI devices -
a hardware mechanism to reset the state back to square one.
The MHI SoC Reset is described in the spec as a reset of last resort. If
some unrecoverable error has occurred where other resets have failed, SoC
Reset is the "big hammer" that ungracefully resets the device. This is
effectivly the same as yanking the power on the device, and reapplying it.
However, depending on the nature of the particular issue, the underlying
transport link may remain active and configured. If the link remains up,
the device will flag a MHI system error early in the boot process after
the reset is executed, which allows the MHI bus to process a fatal error
event, and clean up appropiately.
While the SoC Reset is generally intended as a means of recovery when all
else has failed, it can be useful in non-error scenarios. For example,
if the device loads firmware from the host filesystem, the device may need
to be fully rebooted inorder to pick up the new firmware. In this
scenario, the system administrator may use the soc_reset sysfs to cause
the device to pick up the new firmware that the admin placed on the
filesystem.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Bhaumik Bhatt <quic_bbhatt@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1650302327-30439-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_state_str[] array could be used by MHI endpoint stack also. So let's
make the array as "static inline function" and move it inside the
"common.h" header so that the endpoint stack could also make use of it.
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-11-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Structure "struct mhi_tre" is representing a generic MHI ring element and
not specifically a Transfer Ring Element (TRE). Fix the naming.
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-9-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Functions like mhi_read_reg_field(), mhi_poll_reg_field() and
mhi_write_reg_field() could be modified to not depend on the shift value
passed as an argument. Instead, the bitfield operation could be used to
extract the shift value from the mask itself.
This eliminates the need to define _SHIFT (and _SHFT) macros and
simplifies the code a bit. For shift values those cannot be determined
during build time, "__ffs()" helper is used find the shift value during
runtime.
While at it, let's also get rid of 32-bit masks like CHDBOFF_CHDBOFF_MASK
by doing the full 32-bit register read.
Suggested-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-6-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation of the endpoint MHI support, let's move the host MHI code
to its own "host" directory and adjust the toplevel MHI Kconfig & Makefile.
While at it, let's also move the "pci_generic" driver to "host" directory
as it is a host MHI controller driver.
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-5-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>