OpenCloudOS-Kernel/drivers/iommu
Ethan Zhao 220ab143d3 iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
commit 4fc82cd907ac075648789cc3a00877778aa1838b upstream.

For endpoint devices connected to the system via hotplug-capable ports,
users can request a hot reset of the device by flapping the device's link
through the slot's link control register. In response to the DLLSC
interrupt, pciehp_ist() unloads the device driver and then powers the
device off. This causes an IOMMU device-TLB invalidation request (Intel
VT-d spec; ATS Invalidation in PCIe spec r6.1) to be sent to a
non-existent target device, and once the ITE fault is triggered the
request is retried in an endless loop in interrupt context.

This causes the following continuous hard lockup warnings and a system hang:

[ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
[ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present
[ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
[ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S         OE    kernel version xxxx
[ 4223.822623] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023
[ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822626] FS:  0000000000000000(0000) GS:ffffa237ae400000(0000) knlGS:0000000000000000
[ 4223.822627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0
[ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 4223.822628] PKRU: 55555554
[ 4223.822628] Call Trace:
[ 4223.822628]  qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822628]  __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822629]  dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822629]  intel_iommu_release_device+0x1f/0x30
[ 4223.822629]  iommu_release_device+0x33/0x60
[ 4223.822629]  iommu_bus_notifier+0x7f/0x90
[ 4223.822630]  blocking_notifier_call_chain+0x60/0x90
[ 4223.822630]  device_del+0x2e5/0x420
[ 4223.822630]  pci_remove_bus_device+0x70/0x110
[ 4223.822630]  pciehp_unconfigure_device+0x7c/0x130
[ 4223.822631]  pciehp_disable_slot+0x6b/0x100
[ 4223.822631]  pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822631]  pciehp_ist+0x176/0x180
[ 4223.822631]  ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822632]  irq_thread_fn+0x19/0x50
[ 4223.822632]  irq_thread+0x104/0x190
[ 4223.822632]  ? irq_forced_thread_fn+0x90/0x90
[ 4223.822632]  ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822633]  kthread+0x114/0x130
[ 4223.822633]  ? __kthread_cancel_work+0x40/0x40
[ 4223.822633]  ret_from_fork+0x1f/0x30
[ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
[ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S         OE     kernel version xxxx
[ 4223.822634] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023
[ 4223.822634] Call Trace:
[ 4223.822634]  <NMI>
[ 4223.822635]  dump_stack+0x6d/0x88
[ 4223.822635]  panic+0x101/0x2d0
[ 4223.822635]  ? ret_from_fork+0x11/0x30
[ 4223.822635]  nmi_panic.cold.14+0xc/0xc
[ 4223.822636]  watchdog_overflow_callback.cold.8+0x6d/0x81
[ 4223.822636]  __perf_event_overflow+0x4f/0xf0
[ 4223.822636]  handle_pmi_common+0x1ef/0x290
[ 4223.822636]  ? __set_pte_vaddr+0x28/0x40
[ 4223.822637]  ? flush_tlb_one_kernel+0xa/0x20
[ 4223.822637]  ? __native_set_fixmap+0x24/0x30
[ 4223.822637]  ? ghes_copy_tofrom_phys+0x70/0x100
[ 4223.822637]  ? __ghes_peek_estatus.isra.16+0x49/0xa0
[ 4223.822637]  intel_pmu_handle_irq+0xba/0x2b0
[ 4223.822638]  perf_event_nmi_handler+0x24/0x40
[ 4223.822638]  nmi_handle+0x4d/0xf0
[ 4223.822638]  default_do_nmi+0x49/0x100
[ 4223.822638]  exc_nmi+0x134/0x180
[ 4223.822639]  end_repeat_nmi+0x16/0x67
[ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822641]  ? qi_submit_sync+0x2c0/0x490
[ 4223.822642]  ? qi_submit_sync+0x2c0/0x490
[ 4223.822642]  </NMI>
[ 4223.822642]  qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822642]  __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822643]  dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822643]  intel_iommu_release_device+0x1f/0x30
[ 4223.822643]  iommu_release_device+0x33/0x60
[ 4223.822643]  iommu_bus_notifier+0x7f/0x90
[ 4223.822644]  blocking_notifier_call_chain+0x60/0x90
[ 4223.822644]  device_del+0x2e5/0x420
[ 4223.822644]  pci_remove_bus_device+0x70/0x110
[ 4223.822644]  pciehp_unconfigure_device+0x7c/0x130
[ 4223.822644]  pciehp_disable_slot+0x6b/0x100
[ 4223.822645]  pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822645]  pciehp_ist+0x176/0x180
[ 4223.822645]  ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822645]  irq_thread_fn+0x19/0x50
[ 4223.822646]  irq_thread+0x104/0x190
[ 4223.822646]  ? irq_forced_thread_fn+0x90/0x90
[ 4223.822646]  ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822646]  kthread+0x114/0x130
[ 4223.822647]  ? __kthread_cancel_work+0x40/0x40
[ 4223.822647]  ret_from_fork+0x1f/0x30
[ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

This issue can be triggered by any ordinary surprise-removal hotplug
operation, such as:

1. pulling the EP (endpoint device) out directly.
2. turning off the EP's power.
3. bringing the link down.
etc.

This patch aims to work for both regular safe removal and surprise
removal. The hot-unplug handling can fix the ATS Invalidation hang by
calling pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() to
check the target device's state, which avoids sending a meaningless ATS
Invalidation request to the IOMMU when the device is gone (see
IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1).

For safe removal, the device is not removed until the whole software
handling process is done, so the hard lockup caused by an overlong ATS
Invalidation timeout wait is not triggered. In the safe removal path,
pciehp_unconfigure_device() checks the 'presence' parameter and does not
set the device state to pci_channel_io_perm_failure, so
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() returns
false and the function proceeds as before.

For surprise removal, pciehp_unconfigure_device() sets the device state to
pci_channel_io_perm_failure, which means the device is already gone
(disconnected). pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() then returns true, and the function
returns early instead of blindly sending an ATS Invalidation request to
the disconnected device. This avoids triggering an ITE fault, which would
block all other invalidation requests from being handled. Furthermore,
retrying the timed-out request could trigger a hard lockup.

safe removal (present) & surprise removal (not present)

pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_disable_slot()
       remove_board()
         pciehp_unconfigure_device(presence) {
           if (!presence)
             pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
         }
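
The role of the 'presence' parameter in the call chain above can be
sketched as a user-space model. All mock_* names are hypothetical and the
bus is reduced to a flat array; the sketch only shows that devices are
marked disconnected when the card is not present, which is the state the
invalidation path later checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced set of channel states, with values following the kernel's enum. */
enum mock_channel_state {
	MOCK_IO_NORMAL = 1,
	MOCK_IO_PERM_FAILURE = 3,
};

struct mock_pci_dev {
	enum mock_channel_state error_state;
};

/* Hypothetical flat bus: the real pci_walk_bus() also descends into
 * subordinate buses. */
struct mock_bus {
	struct mock_pci_dev *devs;
	size_t ndevs;
};

typedef int (*mock_walk_cb)(struct mock_pci_dev *dev, void *data);

/* Models pci_dev_set_disconnected(): mark the device permanently failed. */
static int mock_dev_set_disconnected(struct mock_pci_dev *dev, void *unused)
{
	(void)unused;
	dev->error_state = MOCK_IO_PERM_FAILURE;
	return 0;
}

/* Calls cb on every device; a non-zero return stops the walk. */
static void mock_walk_bus(struct mock_bus *bus, mock_walk_cb cb, void *data)
{
	assert(bus != NULL && cb != NULL);

	for (size_t i = 0; i < bus->ndevs; i++)
		if (cb(&bus->devs[i], data))
			break;
}

/* Models the presence check: only surprise removal marks devices gone. */
static void mock_unconfigure_device(struct mock_bus *bus, bool presence)
{
	if (!presence)
		mock_walk_bus(bus, mock_dev_set_disconnected, NULL);
	/* Driver unbind and device removal would follow here. */
}

/* Helper for inspecting the result of the walk. */
static size_t mock_count_disconnected(const struct mock_bus *bus)
{
	size_t n = 0;

	for (size_t i = 0; i < bus->ndevs; i++)
		if (bus->devs[i].error_state == MOCK_IO_PERM_FAILURE)
			n++;
	return n;
}
```

With presence set to true (safe removal) no device changes state; with
presence set to false (surprise removal) every device on the modeled bus
ends up in the permanently failed state before teardown continues.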

This patch works for regular safe removal and surprise removal of
ATS-capable endpoints on PCIe switch downstream ports.

Intel-SIG: commit 4fc82cd907ac iommu/vt-d: Don't issue ATS Invalidation
request when device is disconnected
Backport for SPR/EMR/GNR support.

Fixes: 6f7db75e1c ("iommu/vt-d: Add second level page table interface")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Link: https://lore.kernel.org/r/20240301080727.3529832-3-haifeng.zhao@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 4fc82cd907ac075648789cc3a00877778aa1838b)
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
2024-06-12 19:28:41 +08:00
..
intel iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected 2024-06-12 19:28:41 +08:00
Kconfig ioasid: Add /dev/ioasid for userspace 2024-06-11 21:14:18 +08:00
Makefile iommu: add domain argument to page response 2024-06-11 21:14:24 +08:00
amd_iommu.c iommu: Add capability IOMMU_CAP_VIOMMU_HINT 2024-06-11 21:16:09 +08:00
amd_iommu.h iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems 2019-08-23 10:26:48 +02:00
amd_iommu_debugfs.c iommu/amd: Add basic debugfs infrastructure for AMD IOMMU 2018-07-06 14:06:30 +02:00
amd_iommu_init.c iommu: Restore intel_iommu_strict and remove iommu_set_dma_strict() 2024-06-11 21:13:58 +08:00
amd_iommu_proto.h drm, iommu: Change type of pasid to u32 2024-06-11 21:07:28 +08:00
amd_iommu_quirks.c iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41 2019-10-30 10:24:03 +01:00
amd_iommu_types.h iommu/amd: Add support for IOMMU default DMA mode build options 2024-06-11 21:13:55 +08:00
amd_iommu_v2.c drm, iommu: Change type of pasid to u32 2024-06-11 21:07:28 +08:00
arm-smmu-impl.c iommu/arm-smmu: Make private implementation details static 2019-08-20 10:58:03 +01:00
arm-smmu-v3.c iommu/arm-smmu: Drop IOVA cookie management 2024-06-11 21:08:20 +08:00
arm-smmu.c iommu/arm-smmu: Drop IOVA cookie management 2024-06-11 21:08:20 +08:00
arm-smmu.h Merge branches 'for-joerg/arm-smmu/smmu-v2' and 'for-joerg/arm-smmu/smmu-v3' into for-joerg/arm-smmu/updates 2019-08-23 15:05:45 +01:00
dma-iommu.c iommu/vtd: Fix vIOMMU GIOVA by avoiding reserved IOASIDs 2024-06-11 21:16:57 +08:00
exynos-iommu.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
fsl_pamu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266 2019-06-05 17:30:28 +02:00
fsl_pamu.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266 2019-06-05 17:30:28 +02:00
fsl_pamu_domain.c iommu: remove the unused domain_window_disable method 2024-06-11 21:13:48 +08:00
fsl_pamu_domain.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266 2019-06-05 17:30:28 +02:00
hyperv-iommu.c x86: Kill all traces of irq_remapping_get_irq_domain() 2024-06-11 21:07:37 +08:00
io-pgfault.c iommu: Fix implicit declaration of function 'mmap_read_lock' 2024-06-11 21:08:12 +08:00
io-pgtable-arm-v7s.c Merge branch 'arm/smmu' into arm/mediatek 2019-08-30 16:12:10 +02:00
io-pgtable-arm.c iommu/io-pgtable-arm: Support all Mali configurations 2019-10-01 12:16:47 +01:00
io-pgtable.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 2019-06-19 17:09:07 +02:00
ioasid.c ioasid: Reject null notifier call registration 2024-06-11 21:17:21 +08:00
ioasid_user.c Add back quota to avoid no space issue when trying to open /dev/ioasid in the second time. 2024-06-11 21:14:33 +08:00
iommu-debugfs.c iommu: Fix IOMMU debugfs fallout 2019-02-26 11:15:58 +01:00
iommu-sva-lib.c iommu/ioasid: Redefine IOASID set and allocation APIs 2024-06-11 21:14:12 +08:00
iommu-sva-lib.h Revert "iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit" 2024-06-11 21:14:10 +08:00
iommu-sysfs.c drivers/iommu: Export core IOMMU API symbols to permit modular drivers 2024-06-11 21:06:30 +08:00
iommu-traces.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
iommu.c iommu/vt-d: Only enable SLT if guest is passthrough mode 2024-06-11 21:18:02 +08:00
iova.c iommu: Allow enabling non-strict mode dynamically 2024-06-11 21:14:00 +08:00
ipmmu-vmsa.c iommu/ipmmu-vmsa: Remove dev_err() on platform_get_irq() failure 2019-10-30 10:16:37 +01:00
irq_remapping.c x86: Kill all traces of irq_remapping_get_irq_domain() 2024-06-11 21:07:37 +08:00
irq_remapping.h x86: Kill all traces of irq_remapping_get_irq_domain() 2024-06-11 21:07:37 +08:00
msm_iommu.c Merge branch 'for-joerg/batched-unmap' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into core 2019-08-20 11:09:43 +02:00
msm_iommu.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 267 2019-06-05 17:30:29 +02:00
msm_iommu_hw-8xxx.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 267 2019-06-05 17:30:29 +02:00
mtk_iommu.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
mtk_iommu.h tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
mtk_iommu_v1.c Devicetree updates for v5.4: 2019-09-19 13:48:37 -07:00
of_iommu.c iommu: Remove unused of_get_dma_window() 2024-06-11 21:13:43 +08:00
omap-iommu-debug.c ock: sync codes to ock 5.4.119-20.0009.21 2024-06-11 20:27:38 +08:00
omap-iommu.c iommu: Fix some W=1 warnings 2024-06-11 21:12:36 +08:00
omap-iommu.h iommu/omap: add support for late attachment of iommu devices 2019-08-09 17:37:10 +02:00
omap-iopgtable.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
qcom_iommu.c iommu/arm-smmu: Drop IOVA cookie management 2024-06-11 21:08:20 +08:00
rockchip-iommu.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
s390-iommu.c iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync() 2019-07-29 17:22:52 +01:00
tegra-gart.c iommu: Switch gather->end to the inclusive end 2024-06-11 21:17:24 +08:00
tegra-smmu.c tkernel: add base tlinux kernel interfaces 2024-06-11 20:09:33 +08:00
virtio-iommu.c iommu: Add capability IOMMU_CAP_VIOMMU_HINT 2024-06-11 21:16:09 +08:00