OpenCloudOS-Kernel/arch/s390/pci
Niklas Schnelle 4cdf2f4e24 s390/pci: implement minimal PCI error recovery
When the platform detects an error on a PCI function, or a service action
has been performed, the function is put in the error state and an error
event notification is provided to the OS.

Currently we treat all error event notifications the same and simply set
pdev->error_state = pci_channel_io_perm_failure, requiring user
intervention such as use of the recover attribute to get the device
usable again. Besides requiring a manual step, this also has the
disadvantage that the device is completely torn down and recreated,
resulting in higher level devices such as a block or network device
being recreated. In the case of a block device this also means that it
may need to be removed from and re-added to a software RAID even if that
could otherwise survive with a temporary degradation.

This is of course not ideal, all the more so since an error notification
with PEC 0x3A indicates that the platform already performed error
recovery successfully or that the error state was caused by a service
action that is now finished.

At least in this case we can assume that the error state can be reset
and the function made usable again. So as not to have the disadvantage
of a full tear down and recreation we need to coordinate this recovery
with the driver. Thankfully there is already a well defined recovery
flow for this described in Documentation/PCI/pci-error-recovery.rst.

The implementation of this is fairly straightforward and simplified
by the fact that our recovery flow is defined per PCI function. As
a reset we use the newly introduced zpci_hot_reset_device() which also
takes the PCI function out of the error state.
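
The flow described above can be sketched as a small user-space model. This
is not the kernel code from this commit: every model_* and demo_* name, the
simplified enums, and the struct layout are assumptions made for
illustration. Only the callback names (error_detected, mmio_enabled,
slot_reset, resume) mirror struct pci_error_handlers, and model_hot_reset()
merely stands in for zpci_hot_reset_device().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for pci_ers_result_t and pci_channel_state_t. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };
enum chan_state { CHAN_NORMAL, CHAN_FROZEN, CHAN_PERM_FAILURE };

struct model_fn;

/* Callback names mirror struct pci_error_handlers; signatures are simplified. */
struct model_err_handlers {
	enum ers_result (*error_detected)(struct model_fn *f, enum chan_state s);
	enum ers_result (*mmio_enabled)(struct model_fn *f);
	enum ers_result (*slot_reset)(struct model_fn *f);
	void (*resume)(struct model_fn *f);
};

/* Hypothetical model of a single PCI function. */
struct model_fn {
	enum chan_state state;	/* what the driver sees */
	int in_error;		/* platform error state */
	int resets;		/* number of resets performed */
	int resumed;		/* set by the demo driver's resume callback */
	const struct model_err_handlers *h;
};

#define MODEL_PEC_RECOVERABLE 0x3A	/* PEC named in the commit message */

/* Stand-in for the reset: it also clears the error state. */
static void model_hot_reset(struct model_fn *f)
{
	f->in_error = 0;
	f->resets++;
}

static enum ers_result model_attempt_recovery(struct model_fn *f)
{
	const struct model_err_handlers *h = f->h;
	enum ers_result r;

	/* Without driver support there is nothing to coordinate with. */
	if (!h || !h->error_detected || !h->slot_reset || !h->resume)
		return ERS_DISCONNECT;

	f->state = CHAN_FROZEN;
	r = h->error_detected(f, CHAN_FROZEN);

	if (r == ERS_CAN_RECOVER) {
		/* Assumption: a driver without mmio_enabled falls back to a reset. */
		r = h->mmio_enabled ? h->mmio_enabled(f) : ERS_NEED_RESET;
		if (r == ERS_RECOVERED)
			f->in_error = 0;
	}
	if (r == ERS_NEED_RESET) {
		model_hot_reset(f);
		r = h->slot_reset(f);
	}
	if (r != ERS_RECOVERED)
		return ERS_DISCONNECT;

	f->state = CHAN_NORMAL;
	h->resume(f);
	return ERS_RECOVERED;
}

/* Only PEC 0x3A triggers a recovery attempt; everything else fails hard. */
static void model_handle_error_event(struct model_fn *f, unsigned int pec)
{
	assert(f != NULL);
	if (pec == MODEL_PEC_RECOVERABLE &&
	    model_attempt_recovery(f) == ERS_RECOVERED)
		return;
	f->state = CHAN_PERM_FAILURE;
}

/* Hypothetical driver that always asks for a reset. */
static enum ers_result demo_error_detected(struct model_fn *f, enum chan_state s)
{
	(void)f;
	return s == CHAN_FROZEN ? ERS_NEED_RESET : ERS_DISCONNECT;
}

static enum ers_result demo_slot_reset(struct model_fn *f)
{
	return f->in_error ? ERS_DISCONNECT : ERS_RECOVERED;
}

static void demo_resume(struct model_fn *f)
{
	f->resumed = 1;
}

static const struct model_err_handlers demo_handlers = {
	.error_detected = demo_error_detected,
	.slot_reset = demo_slot_reset,
	.resume = demo_resume,
};
```

Calling model_handle_error_event(&f, 0x3A) on a function that uses
demo_handlers walks through error_detected, the reset, slot_reset and
resume, leaving the function in CHAN_NORMAL. Any other PEC, or a function
without handlers, ends in CHAN_PERM_FAILURE, which corresponds to the
previous behaviour of requiring manual recovery.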

Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-11-08 14:17:49 +01:00
File         Last commit                                                         Date
Makefile     s390/pci: consolidate SR-IOV specific code                          2020-09-14 11:38:34 +02:00
pci.c        s390/pci: implement minimal PCI error recovery                      2021-11-08 14:17:49 +01:00
pci_bus.c    s390/pci: improve DMA translation init and exit                     2021-08-25 11:03:34 +02:00
pci_bus.h    s390/pci: fix use after free of zpci_dev                            2021-08-18 10:12:42 +02:00
pci_clp.c    s390/pci: read clp_list_pci_req only once                           2021-09-07 13:38:42 +02:00
pci_debug.c  locking/atomic, s390/pci: Remove redundant casts                    2019-06-03 12:32:57 +02:00
pci_dma.c    s390/pci: add s390_iommu_aperture kernel parameter                  2021-10-26 15:21:30 +02:00
pci_event.c  s390/pci: implement minimal PCI error recovery                      2021-11-08 14:17:49 +01:00
pci_insn.c   s390/pci: refresh function handle in iomap                          2021-11-08 14:17:49 +01:00
pci_iov.c    s390/pci: add missing pci_iov.h include                             2020-09-16 14:08:47 +02:00
pci_iov.h    s390/pci: consolidate SR-IOV specific code                          2020-09-14 11:38:34 +02:00
pci_irq.c    s390/pci: implement reset_slot for hotplug slot                     2021-11-08 14:17:49 +01:00
pci_mmio.c   s390/pci_mmio: fully validate the VMA before calling follow_pte()   2021-09-15 14:29:21 +02:00
pci_sysfs.c  s390/pci: tolerate inconsistent handle in recover                   2021-10-04 09:49:36 +02:00