There is a spelling mistake in a comment, fix it.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-5-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There are several spelling mistakes, as follows:
thru ==> through
presense ==> presence
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-4-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There is a spelling mistake in a comment, fix it.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-3-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
There are several spelling mistakes, as follows:
userpsace ==> userspace
Accouting ==> Accounting
exlude ==> exclude
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210326083528.1329-2-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Before opregion version 2.0, VBT data is stored in opregion mailbox #4.
From opregion v2.0 onward, when the VBT data exceeds 6KB and no longer
fits in mailbox #4, an extended VBT region placed next to the opregion
holds the VBT data instead, so the total size to expose is the opregion
size plus the extended VBT region size.
Opregion v2.0 with a physical host VBT address is not supported: such an
address would not be practically available to the end user, and the
guest cannot directly access a host physical address.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
Message-Id: <20210325170953.24549-1-fred.gao@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This blank line is unnecessary, so remove it.
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Message-Id: <1615808073-178604-1-git-send-email-wangzhou1@hisilicon.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Compiling the nvlink code relies on SPAPR_TCE_IOMMU, otherwise there
are compile errors:
drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration]
ret = mm_iommu_put(data->mm, data->mem);
PPC only defines these functions when that config option is set.
Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only
IOMMU that could have satisfied IOMMU_API on POWERNV.
Fixes: 179209fa12 ("vfio: IOMMU_API should be selected")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When vfio_pin_pages_remote() returns with a partial batch consisting of
a single VM_PFNMAP pfn, a subsequent call will unfortunately try
restoring it from batch->pages, resulting in vfio mapping the wrong page
and unbalancing the page refcount.
Prevent the function from returning with this kind of partial batch to
avoid the issue. There's no explicit check for a VM_PFNMAP pfn because
it's awkward to do so, so infer it from characteristics of the batch
instead. This may result in occasional false positives but keeps the
code simpler.
Fixes: 4d83de6da2 ("vfio/type1: Batch page pinning")
Link: https://lkml.kernel.org/r/20210323133254.33ed9161@omen.home.shazbot.org/
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Message-Id: <20210325010552.185481-1-daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vaddr_get_pfns() now returns the positive number of pfns successfully
gotten instead of zero. vfio_pin_page_external() might return 1 to
vfio_iommu_type1_pin_pages(), which will treat it as an error, if
vaddr_get_pfns() is successful but vfio_pin_page_external() doesn't
reach vfio_lock_acct().
Fix it up in vfio_pin_page_external(). Found by inspection.
Fixes: be16c1fd99 ("vfio/type1: Change success value of vaddr_get_pfn()")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Message-Id: <20210308172452.38864-1-daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO_IOMMU_TYPE1 does not compile with !MMU:
../drivers/vfio/vfio_iommu_type1.c: In function 'follow_fault_pfn':
../drivers/vfio/vfio_iommu_type1.c:536:22: error: implicit declaration of function 'pte_write'; did you mean 'vfs_write'? [-Werror=implicit-function-declaration]
So require it.
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <0-v1-02cb5500df6e+78-vfio_no_mmu_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
CONFIG_VFIO_AMBA has a light use of AMBA, adding some inline fallbacks
when AMBA is disabled will allow it to be compiled under COMPILE_TEST and
make VFIO easier to maintain.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <3-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
x86 can build platform bus code too, so vfio-platform and all the platform
reset implementations compile successfully on x86.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <2-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
As IOMMU_API is a kconfig symbol without a description (e.g. it does not show
in the menu), the correct operator is select, not 'depends on'. Using 'depends on'
for this kind of symbol means VFIO is not selectable unless some other
random kconfig has already enabled IOMMU_API for it.
Fixes: cba3345cc4 ("vfio: VFIO core")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <1-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Some ILP32 architectures support mapping a 32-bit vaddr within a 64-bit
iova space. The unmap-all code uses 32-bit SIZE_MAX as an upper bound on
the extent of the mappings within iova space, so mappings above 4G cannot
be found and unmapped. Use U64_MAX instead, and use u64 for size variables.
This also fixes a static analysis bug found by the kernel test robot running
smatch for ILP32.
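For illustration, a tiny standalone C program (not driver code) showing why a
32-bit SIZE_MAX bound cannot cover IOVAs above 4G while a U64_MAX bound can:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t iova = 0x100000000ULL;   /* a mapping starting above 4G */
		uint32_t bound32 = UINT32_MAX;    /* stands in for 32-bit SIZE_MAX */
		uint64_t bound64 = UINT64_MAX;    /* stands in for U64_MAX */

		/* The 32-bit bound misses the mapping entirely... */
		printf("32-bit bound covers iova: %s\n", iova <= bound32 ? "yes" : "no");
		/* ...while the 64-bit bound covers the whole iova space. */
		printf("64-bit bound covers iova: %s\n", iova <= bound64 ? "yes" : "no");
		return 0;
	}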
Fixes: 0f53afa12b ("vfio/type1: unmap cleanup")
Fixes: c196509953 ("vfio/type1: implement unmap all")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Message-Id: <1614281102-230747-1-git-send-email-steven.sistare@oracle.com>
Link: https://lore.kernel.org/linux-mm/20210222141043.GW2222@kadam
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pinning one 4K page at a time is inefficient, so do it in batches of 512
instead. This is just an optimization with no functional change
intended, and in particular the driver still calls iommu_map() with the
largest physically contiguous range possible.
Add two fields in vfio_batch to remember where to start between calls to
vfio_pin_pages_remote(), and use vfio_batch_unpin() to handle remaining
pages in the batch in case of error.
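As a rough sketch of the batching idea (names here are illustrative, not the
driver's actual identifiers): the batch carries the pinned pages plus a cursor,
so a later call can resume where the previous one stopped consuming.

	struct pfn_batch {
		struct page **pages;	/* up to 512 pinned pages */
		int size;		/* number of valid pages in the batch */
		int offset;		/* index of the next page to hand out */
	};

	static struct page *batch_next(struct pfn_batch *batch)
	{
		if (batch->offset == batch->size)
			return NULL;	/* empty: caller repins to refill */
		return batch->pages[batch->offset++];
	}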
qemu pins pages for guests around 8% faster on my test system, a
two-node Broadwell server with 128G memory per node. The qemu process
was bound to one node with its allocations constrained there as well.
                        base              test
 guest             ----------------  ----------------
 mem (GB)  speedup  avg sec   (std)   avg sec   (std)
        1     7.4%     0.61  (0.00)      0.56  (0.00)
        2     8.3%     0.93  (0.00)      0.85  (0.00)
        4     8.4%     1.46  (0.00)      1.34  (0.00)
        8     8.6%     2.54  (0.01)      2.32  (0.00)
       16     8.3%     4.66  (0.00)      4.27  (0.01)
       32     8.3%     8.94  (0.01)      8.20  (0.01)
       64     8.2%    17.47  (0.01)     16.04  (0.03)
      120     8.5%    32.45  (0.13)     29.69  (0.01)
perf diff confirms less time spent in pup. Here are the top ten
functions:
  Baseline  Delta Abs  Symbol
    78.63%     +6.64%  clear_page_erms
     1.50%     -1.50%  __gup_longterm_locked
     1.27%     -0.78%  __get_user_pages
                +0.76%  kvm_zap_rmapp.constprop.0
     0.54%     -0.53%  vmacache_find
     0.55%     -0.51%  get_pfnblock_flags_mask
     0.48%     -0.48%  __get_user_pages_remote
                +0.39%  slot_rmap_walk_next
                +0.32%  vfio_pin_map_dma
                +0.26%  kvm_handle_hva_range
  ...
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Get ready to pin more pages at once with struct vfio_batch, which
represents a batch of pinned pages.
The struct has a fallback page pointer to avoid two unlikely scenarios:
pointlessly allocating a page if disable_hugepages is enabled or failing
the whole pinning operation if the kernel can't allocate memory.
vaddr_get_pfn() becomes vaddr_get_pfns() to prepare for handling
multiple pages, though for now only one page is stored in the pages
array.
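A hedged sketch of the fallback scheme (shape only, not copied from the tree;
disable_hugepages refers to the existing module parameter):

	struct vfio_batch_sketch {
		struct page **pages;		/* array of pinned pages */
		struct page *fallback_page;	/* single slot if allocation fails */
		int capacity;			/* length of the pages array */
	};

	static void batch_init_sketch(struct vfio_batch_sketch *batch)
	{
		if (unlikely(disable_hugepages))
			goto fallback;

		batch->pages = (struct page **)__get_free_page(GFP_KERNEL);
		if (batch->pages) {
			batch->capacity = PAGE_SIZE / sizeof(struct page *);
			return;
		}
	fallback:
		/* Degrade gracefully to one page per call instead of failing. */
		batch->pages = &batch->fallback_page;
		batch->capacity = 1;
	}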
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vaddr_get_pfn() simply returns 0 on success. Have it report the number
of pfns successfully gotten instead, whether from page pinning or
follow_fault_pfn(), which will be used later when batching pinning.
Change the last check in vfio_pin_pages_remote() for consistency with
the other two.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
follow_pfn() doesn't make sure that we're using the correct page
protections, get the pte with follow_pte() so that we can test
protections and get the pfn from the pte.
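For illustration, the shape of the check this enables (a sketch with local
names write_fault, pfn and ret; not the exact upstream hunk):

	/* With the pte in hand, a write mapping can be refused unless the
	 * mapping is actually writable, and the pfn is taken from the pte. */
	if (write_fault && !pte_write(pte))
		ret = -EFAULT;
	else
		*pfn = pte_pfn(pte);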
Fixes: 5cbf3264bc ("vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When running on an s390 system, always expose the capabilities for
configuration of zPCI devices. When running on a different platform,
continue as usual.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
linux/kthread.h is included more than once, remove the one that isn't
necessary.
Fixes: 898b9eaeb3 ("vfio/type1: block on invalid vaddr")
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In case allocation fails, we must behave correctly and exit with error.
Fixes: e6b817d4b8 ("vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO")
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
All amba drivers return 0 in their remove callback. Together with the
driver core ignoring the return value anyhow, it doesn't make sense to
return a value here.
Change the remove prototype to return void, which makes it explicit that
returning an error value doesn't work as expected. This simplifies changing
the core remove callback to return void, too.
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # for drivers/memory
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> # for hwtracing/coresight
Acked-By: Vinod Koul <vkoul@kernel.org> # for dmaengine
Acked-by: Guenter Roeck <linux@roeck-us.net> # for watchdog
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Takashi Iwai <tiwai@suse.de> # for sound
Acked-by: Vladimir Zapolskiy <vz@mleia.com> # for memory/pl172
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20210126165835.687514-5-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
vfio_platform_remove_common() cannot return NULL in
vfio_amba_remove() as the latter is only called if vfio_amba_probe()
returned success.
Diagnosed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210126165835.687514-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Zdev static functions do not use vdev argument. Remove it.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The PCI user accessors return negative errno values on error.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: drop Fixes tag, pcibios_err_to_errno() behaves correctly for -errno]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
With this counter, we never need to traverse all groups to update
pinned_scope of vfio_iommu.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_sanity_check_pfn_list() is used to check whether pfn_list and
notifier are empty when removing the external domain, so it makes the
wrong assumption that only the external domain will use the pinning
interface.
Now we apply the pfn_list check when a vfio_dma is removed and apply
the notifier check when all domains are removed.
Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If a group with non-pinned-page dirty scope is detached with dirty
logging enabled, we should fully populate the dirty bitmaps at the
time it's removed since we don't know the extent of its previous DMA,
nor will the group be present to trigger the full bitmap when the user
retrieves the dirty bitmap.
Fixes: d6a4c18566 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Block translation of host virtual address while an iova range has an
invalid vaddr.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Implement a notify callback that remembers if the container's file
descriptor has been closed.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Define a vfio_iommu_driver_ops notify callback, for sending events to
the driver. Drivers are not required to provide the callback, and
may ignore any events. The handling of events is driver specific.
Define the CONTAINER_CLOSE event, called when the container's file
descriptor is closed. This event signifies that no further state changes
will occur via container ioctl's.
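A hedged sketch of what this interface shape implies (enum and member names
follow the description above; check the header for the exact spellings):

	enum vfio_iommu_notify_op {
		VFIO_IOMMU_CONTAINER_CLOSE = 0,
	};

	struct vfio_iommu_driver_ops {
		/* ... existing callbacks ... */
		void (*notify)(void *iommu_data, enum vfio_iommu_notify_op op);
	};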
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Implement VFIO_DMA_UNMAP_FLAG_VADDR, VFIO_DMA_MAP_FLAG_VADDR, and
VFIO_UPDATE_VADDR. This is a partial implementation. Blocking is
added in a subsequent patch.
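A hedged userspace sketch of how these flags are meant to be exercised (struct
fields per the existing type1 uapi; container_fd, iova, size and new_vaddr are
placeholders; includes and error handling omitted):

	/* Invalidate the vaddr for an existing mapping before the process
	 * remaps the backing memory. */
	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.flags = VFIO_DMA_UNMAP_FLAG_VADDR,
		.iova  = iova,
		.size  = size,
	};
	ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);

	/* Later, supply the new vaddr for the same iova range. */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_VADDR,
		.vaddr = new_vaddr,
		.iova  = iova,
		.size  = size,
	};
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);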
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modify the iteration in vfio_dma_do_unmap so it does not depend on deletion
of each dma entry. Add a variant of vfio_find_dma that returns the entry
with the lowest iova in the search range to initialize the iteration. No
externally visible change, but this behavior is needed in the subsequent
update-vaddr patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Minor changes in vfio_dma_do_unmap to improve readability, which also
simplify the subsequent unmap-all patch. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accounting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
Merge tag 'vfio-v5.11-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Fix uninitialized list walk in error path (Eric Auger)
- Use io_remap_pfn_range() (Jason Gunthorpe)
- Allow fallback support for NVLink on POWER8 (Alexey Kardashevskiy)
- Enable mdev request interrupt with CCW support (Eric Farman)
- Enable interface to iommu_domain from vfio_group (Lu Baolu)
* tag 'vfio-v5.11-rc1' of git://github.com/awilliam/linux-vfio:
vfio/type1: Add vfio_group_iommu_domain()
vfio-ccw: Wire in the request callback
vfio-mdev: Wire in a request handler for mdev parent
vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
vfio-pci: Use io_remap_pfn_range() for PCI IO memory
vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
Add an API for getting the iommu_domain from a vfio group. This could be used
by physical device drivers which rely on the vfio/mdev framework for
mediated device user-level access. A typical use case looks like this:
	int pasid;
	struct vfio_group *vfio_group;
	struct iommu_domain *iommu_domain;
	struct device *dev = mdev_dev(mdev);
	struct device *iommu_device = mdev_get_iommu_device(dev);

	if (!iommu_device ||
	    !iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
		return -EINVAL;

	vfio_group = vfio_group_get_external_user_from_dev(dev);
	if (IS_ERR_OR_NULL(vfio_group))
		return -EFAULT;

	iommu_domain = vfio_group_iommu_domain(vfio_group);
	if (IS_ERR_OR_NULL(iommu_domain)) {
		vfio_group_put_external_user(vfio_group);
		return -EFAULT;
	}

	pasid = iommu_aux_get_pasid(iommu_domain, iommu_device);
	if (pasid < 0) {
		vfio_group_put_external_user(vfio_group);
		return -EFAULT;
	}

	/* Program device context with pasid value. */
	...
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Switch to use new platform_get_mem_or_io() instead of home grown analogue.
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20201209203642.27648-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While performing some destructive tests with vfio-ccw, where the
paths to a device are forcibly removed and thus the device itself
is unreachable, it is rather easy to end up in an endless loop in
vfio_del_group_dev() due to the lack of a request callback for the
associated device.
In this example, one MDEV (77c) is used by a guest, while another
(77b) is not. The symptom is that the iommu is detached from the
mdev for 77b, but not 77c, until that guest is shutdown:
[ 238.794867] vfio_ccw 0.0.077b: MDEV: Unregistering
[ 238.794996] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: Removing from iommu group 2
[ 238.795001] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: MDEV: detaching iommu
[ 238.795036] vfio_ccw 0.0.077c: MDEV: Unregistering
...silence...
Let's wire in the request call back to the mdev device, so that a
device being physically removed from the host can be (gracefully?)
handled by the parent device at the time the device is removed.
Add a message when registering the device if a driver doesn't
provide this callback, to give a clue that this same loop
may be encountered in a similar situation, and another message when
it does occur, instead of the awkward silence noted above.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We execute certain NPU2 setup code (such as mapping an LPID to a device
in NPU2) unconditionally if an Nvlink bridge is detected. However this
cannot succeed on POWER8NVL machines, as the init helpers return an error
other than ENODEV, which means the device is there and setup failed, so
vfio_pci_enable() fails and passthrough is not possible.
This changes the two NPU2 related init helpers to return -ENODEV if
there is no "memory-region" device tree property as this is
the distinction between NPU and NPU2.
Tested on
- POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
NVIDIA GV100 10de:1db1 driver 418.39
- POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
NVIDIA P100 10de:15f9 driver 396.47
Fixes: 7f92891778 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
commit f8f6ae5d07 ("mm: always have io_remap_pfn_range() set
pgprot_decrypted()") allows drivers using mmap to put PCI memory mapped
BAR space into userspace to work correctly on AMD SME systems that default
to all memory encrypted.
Since vfio_pci_mmap_fault() is working with PCI memory mapped BAR space it
should be calling io_remap_pfn_range() otherwise it will not work on SME
systems.
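As a sketch of the intended call (the pfn below is a placeholder; the real
handler derives it from the BAR physical address):

	/* io_remap_pfn_range() takes the same arguments as remap_pfn_range()
	 * but also applies pgprot_decrypted() where that matters (e.g. SME). */
	ret = io_remap_pfn_range(vma, vma->vm_start, pfn,
				 vma->vm_end - vma->vm_start,
				 vma->vm_page_prot);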
Fixes: 11c4cd07ba ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
In case an error occurs in vfio_pci_enable() before the call to
vfio_pci_probe_mmaps(), vfio_pci_disable() will try to iterate
on an uninitialized list and cause a kernel panic.
Let's move the initialization to vfio_pci_probe() to fix the
issue.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: 05f0c03fba ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive")
CC: Stable <stable@vger.kernel.org> # v4.7+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Don't allow the events to accumulate in the eventfd counter, drain them
as they are handled.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20201027135523.646811-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Bypass the IGD initialization when -ENODEV is returned, which should be
the case if the opregion is not available for IGD or within a discrete
graphics device's option ROM, or the host/LPC bridge is not found.
The use of -ENODEV here means no special device resources were found
that need special care for VFIO, but we still allow normal device
resource access.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Xiong Zhang <xiong.y.zhang@intel.com>
Cc: Hang Yuan <hang.yuan@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Fred Gao <fred.gao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
pm_runtime_get_sync() will increment the pm usage counter even when it
fails. Forgetting to call pm_runtime_put() will result in a
reference leak in vfio_platform_open(), so we should fix it.
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The ioeventfd is called under spinlock with interrupts disabled,
therefore if the memory lock is contended defer code that might
sleep to a thread context.
Fixes: bc93b9ae01 ("vfio-pci: Avoid recursive read-lock usage")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209253#c1
Reported-by: Ian Pilcher <arequipeno@gmail.com>
Tested-by: Ian Pilcher <arequipeno@gmail.com>
Tested-by: Justin Gatzen <justin.gatzen@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fixed compiler warning:
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c:16:5: warning: no previous
prototype for function 'vfio_fsl_mc_irqs_allocate' [-Wmissing-prototypes]
^
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c:16:1: note: declare 'static'
if the function is not intended to be used outside of this translation unit
int vfio_fsl_mc_irqs_allocate(struct vfio_fsl_mc_device *vdev)
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
My static analysis tool complains that the "index" can be negative.
There are some checks in do_mmap() which try to prevent underflows but
I don't know if they are sufficient for this situation. Either way,
making "index" unsigned is harmless so let's do it just to be safe.
Fixes: 6724728968 ("vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The copy_to_user() function returns the number of bytes remaining to be
copied, but this code should return -EFAULT.
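For reference, the usual idiom (variable names illustrative):

	if (copy_to_user((void __user *)arg, &info, minsz))
		return -EFAULT;	/* don't propagate the residual byte count */

	return 0;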
Fixes: df747bcd5b ("vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When attaching a new group to the container, let's use the new helper
vfio_iommu_find_iommu_group() to check if it's already attached. There
is no functional change.
Also take this chance to add a missing blank line.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- New fsl-mc vfio bus driver supporting userspace drivers of objects
within NXP's DPAA2 architecture (Diana Craciun)
- Support for exposing zPCI information on s390 (Matthew Rosato)
- Fixes for "detached" VFs on s390 (Matthew Rosato)
- Fixes for pin-pages and dma-rw accesses (Yan Zhao)
- Cleanups and optimize vconfig regen (Zenghui Yu)
- Fix duplicate irq-bypass token registration (Alex Williamson)
* tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages
vfio/pci: Clear token on bypass registration failure
vfio/fsl-mc: fix the return of the uninitialized variable ret
vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger
vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit
MAINTAINERS: Add entry for s390 vfio-pci
vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO
vfio/fsl-mc: Add support for device reset
vfio/fsl-mc: Add read/write support for fsl-mc devices
vfio/fsl-mc: trigger an interrupt via eventfd
vfio/fsl-mc: Add irq infrastructure for fsl-mc devices
vfio/fsl-mc: Added lock support in preparation for interrupt handling
vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions
vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO
s390/pci: track whether util_str is valid in the zpci_dev
s390/pci: stash version in the zpci_dev
vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
...
pfn is not added to pfn_list when vfio_add_to_pfn_list fails.
vfio_unpin_page_external will exit directly without calling
vfio_iova_put_vfio_pfn. This will lead to a memory leak.
Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Xiaoyang Xu <xuxiaoyang2@huawei.com>
[aw: simplified logic, add Fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The eventfd context is used as our irqbypass token, therefore if an
eventfd is re-used, our token is the same. The irqbypass code will
return an -EBUSY in this case, but we'll still attempt to unregister
the producer, where if that duplicate token still exists, results in
removing the wrong object. Clear the token of failed producers so
that they harmlessly fall out when unregistered.
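A hedged sketch of the fix (names approximate the vfio-pci MSI setup path and
may not match the file exactly):

	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
	if (unlikely(ret)) {
		/* Clear the token so a later unregister can't match (and
		 * remove) a different producer that shares this eventfd. */
		vdev->ctx[vector].producer.token = NULL;
	}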
Fixes: 6d7425f109 ("vfio: Register/unregister irq_bypass_producer")
Reported-by: guomin chen <guomin_chen@sina.com>
Tested-by: guomin chen <guomin_chen@sina.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
On the success path, the vfio_fsl_mc_reflck_attach function may return
an uninitialized variable. Fix the problem by initializing the return
variable to 0.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: f2ba7e8c94 ("vfio/fsl-mc: Added lock support in preparation for interrupt handling")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The preceding patches have ensured that core dumping properly takes the
mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
its users.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Static analysis discovered that some code in vfio_fsl_mc_set_irq_trigger
is dead code. Fix it by changing the order of the conditions.
Fixes: cc0ee20bd9 ("vfio/fsl-mc: trigger an interrupt via eventfd")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The FSL_MC_BUS, on which the VFIO-FSL-MC driver depends, can be
compiled on other architectures as well (not only ARM64), including
32-bit architectures.
Include linux/io-64-nonatomic-hi-lo.h to make writeq/readq used
in the driver available on 32bit platforms.
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI.
When this s390-only feature is configured we add capabilities to the
VFIO_DEVICE_GET_INFO ioctl that describe features of the associated
zPCI device and its underlying hardware.
This patch is based on work previously done by Pierre Morel.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently only resetting the DPRC container is supported, which
will reset all the objects inside it. Resetting individual
objects is possible from userspace by issuing commands
to the MC firmware.
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The software uses a memory-mapped I/O command interface (MC portals) to
communicate with the MC hardware. This command interface is used to
discover, enumerate, configure and remove DPAA2 objects. The DPAA2
objects use MSIs, so the command interface needs to be emulated
such that the correct MSI is configured in the hardware (the guest
has the virtual MSIs).
This patch is adding read/write support for fsl-mc devices. The mc
commands are emulated by the userspace. The host is just passing
the correct command to the hardware.
The current patch also limits userspace to writing a complete
64-byte command at once and reading the 64-byte response with one ioctl.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch allows setting an eventfd for fsl-mc device interrupts
and also triggering the interrupt eventfd from userspace for testing.
All fsl-mc device interrupts are MSIs. The MSIs are allocated from
the MSI domain only once per DPRC and used by all the DPAA2 objects.
The interrupts are managed by the DPRC in a pool of interrupts. Each
device requests interrupts from this pool. The pool is allocated
when the first virtual device sets up its interrupts.
The pool of interrupts is protected by a lock.
The DPRC has an interrupt of its own which indicates if the DPRC
contents have changed. However, currently, the contents of a DPRC
assigned to the guest cannot be changed at runtime, so this interrupt
is not configured.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch adds the skeleton for interrupt support
for fsl-mc devices. The interrupts are not yet functional,
the functionality will be added by subsequent patches.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Only the DPRC object allocates interrupts from the MSI
interrupt domain. The interrupts are managed by the DPRC in
a pool of interrupts. The access to this pool of interrupts
has to be protected with a lock.
This patch extends the current lock implementation to have a
lock per DPRC.
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Allow userspace to mmap device regions for direct access of
fsl-mc devices.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Expose to userspace information about the memory regions.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Allow userspace to get fsl-mc device info (number of regions
and irqs).
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The DPRC (Data Path Resource Container) device is a bus device and has
child devices attached to it. When the vfio-fsl-mc driver is probed
the DPRC is scanned and the child devices discovered and initialized.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
DPAA2 (Data Path Acceleration Architecture) consists of
mechanisms for processing Ethernet packets, queue management,
accelerators, etc.
The Management Complex (mc) is a hardware entity that manages the DPAA2
hardware resources. It provides an object-based abstraction for software
drivers to use the DPAA2 hardware. The MC mediates operations such as
create, discover, destroy of DPAA2 objects.
The MC provides memory-mapped I/O command interfaces (MC portals) which
DPAA2 software drivers use to operate on DPAA2 objects.
A DPRC is a container object that holds other types of DPAA2 objects.
Each object in the DPRC is a Linux device and bound to a driver.
The MC-bus driver is a platform driver (different from PCI or platform
bus). The DPRC driver does runtime management of a bus instance. It
performs the initial scan of the DPRC and handles changes in the DPRC
configuration (adding/removing objects).
All objects inside a container share the same hardware isolation
context, meaning that only an entire DPRC can be assigned to
a virtual machine.
When a container is assigned to a virtual machine, all the objects
within that container are assigned to that virtual machine.
The DPRC container assigned to the virtual machine is not allowed
to change contents (add/remove objects) by the guest. The restriction
is set by the host and enforced by the mc hardware.
The DPAA2 objects can be directly assigned to the guest. However
the MC portals (the memory mapped command interface to the MC) need
to be emulated because there are commands that configure the
interrupts and the isolation IDs which are virtual in the guest.
Example:
echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind
The dprc.2 is bound to the VFIO driver and all the objects within
dprc.2 are going to be bound to the VFIO driver.
This patch adds the infrastructure for VFIO support for fsl-mc
devices. Subsequent patches will add support for binding and securely
assigning these devices using VFIO.
More details about the DPAA2 objects can be found here:
Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com>
Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The count of dirtied pages is determined not only by the count of copied
pages, but also by the start offset.
E.g. if offset = PAGE_SIZE - 1 and *copied = 2, the dirty page count
is 2, instead of 1 or 0.
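Spelled out as arithmetic (illustrative snippet using the example values above):

	/* Pages touched by a copy of `copied` bytes starting at `offset`. */
	unsigned long first  = offset >> PAGE_SHIFT;
	unsigned long last   = (offset + copied - 1) >> PAGE_SHIFT;
	unsigned long npages = last - first + 1;

	/* offset = PAGE_SIZE - 1, copied = 2  ->  first = 0, last = 1, npages = 2 */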
Fixes: d6a4c18566 ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When an error occurs, we need to put the vfio group after a successful get.
Fixes: 95fc87b441 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
While it is true that devices with is_virtfn=1 will have a Memory Space
Enable bit that is hard-wired to 0, this is not the only case where we
see this behavior -- For example some bare-metal hypervisors lack
Memory Space Enable bit emulation for devices not setting is_virtfn
(s390). Fix this by instead checking for the newly-added
no_command_memory bit which directly denotes the need for
PCI_COMMAND_MEMORY emulation in vfio.
Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commit 492855939b ("vfio/type1: Limit DMA mappings per container")
added the ability to limit the number of memory backed DMA mappings.
However on s390x, when lazy mapping is in use, we use a very large
number of concurrent mappings. Let's provide the current allowable
number of DMA mappings to userspace via the IOMMU info chain so that
userspace can take appropriate mitigation.
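The reported capability is expected to look roughly like this (a sketch; see
the vfio uapi header for the authoritative layout):

	struct vfio_iommu_type1_info_dma_avail {
		struct vfio_info_cap_header header;
		__u32 avail;	/* number of additional DMA mappings allowed */
	};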
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Page pinning is used both to translate and pin device mappings for DMA
purpose, as well as to indicate to the IOMMU backend to limit the dirty
page scope to those pages that have been pinned, in the case of an IOMMU
backed device.
To support this, the vfio_pin_pages() interface limits itself to only
singleton groups such that the IOMMU backend can consider dirty page
scope only at the group level. Implement the same requirement for the
vfio_group_pin_pages() interface.
Fixes: 95fc87b441 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now we regenerate vconfig for all the BARs via vfio_bar_fixup() every
time any offset of any of them is read. Though BARs aren't re-read
regularly, the regeneration can be avoided if no BARs have been written
since they were last read, in which case vdev->bardirty is false.
Let's return immediately in vfio_bar_fixup() if bardirty is false.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
It was added by commit 137e553135 ("vfio/pci: Add sriov_configure
support") but duplicates a forward declaration earlier in the file.
Remove it.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
To keep naming consistent we should stick with *iotlb*. This patch
renames a few remaining functions.
Signed-off-by: Tom Murphy <murphyt7@tcd.ie>
Link: https://lore.kernel.org/r/20200817210051.13546-1-murphyt7@tcd.ie
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The vfio_iommu_replay() function does not currently unwind on error,
yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma
structure to indicate IOMMU mapping. The IOMMU mappings are torn down
when the domain is destroyed, but the other actions go on to cause
trouble later. For example, the iommu->domain_list can be empty if we
only have a non-IOMMU backed mdev attached. We don't currently check
if the list is empty before getting the first entry in the list, which
leads to a bogus domain pointer. If a vfio_dma entry is erroneously
marked as iommu_mapped, we'll attempt to use that bogus pointer to
retrieve the existing physical page addresses.
This is the scenario that uncovered this issue, attempting to hot-add
a vfio-pci device to a container with an existing mdev device and DMA
mappings, one of which could not be pinned, causing a failure adding
the new group to the existing container and setting the conditions
for a subsequent attempt to explode.
To resolve this, we can first check if the domain_list is empty so
that we can reject replay of a bogus domain, should we ever encounter
this inconsistent state again in the future. The real fix though is
to add the necessary unwind support, which means cleaning up the
current pinning if an IOMMU mapping fails, then walking back through
the r-b tree of DMA entries, reading from the IOMMU which ranges are
mapped, and unmapping and unpinning those ranges. To be able to do
this, we also defer marking the DMA entry as IOMMU mapped until all
entries are processed, in order to allow the unwind to know the
disposition of each entry.
Fixes: a54eb55045 ("vfio iommu type1: Add support for mediated devices")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
A down_read on memory_lock is held when performing read/write accesses
to MMIO BAR space, including across the copy_to/from_user() callouts
which may fault. If the user buffer for these copies resides in an
mmap of device MMIO space, the mmap fault handler will acquire a
recursive read-lock on memory_lock. Avoid this by reducing the lock
granularity. Sequential accesses requiring multiple ioread/iowrite
cycles are expected to be rare, therefore typical accesses should not
see additional overhead.
VGA MMIO accesses are expected to be non-fatal regardless of the PCI
memory enable bit to allow legacy probing, this behavior remains with
a comment added. ioeventfds are now included in memory access testing,
with writes dropped while memory space is disabled.
Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more. Remove that parameter in the whole gup
stack.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch refactors vfio_iommu_type1_ioctl() to use a switch instead of
if-else, and each command gets a helper function.
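A hedged sketch of the resulting dispatch shape (helper names are plausible
but should be checked against the tree):

	static long vfio_iommu_type1_ioctl(void *iommu_data,
					   unsigned int cmd, unsigned long arg)
	{
		struct vfio_iommu *iommu = iommu_data;

		switch (cmd) {
		case VFIO_CHECK_EXTENSION:
			return vfio_iommu_type1_check_extension(iommu, arg);
		case VFIO_IOMMU_GET_INFO:
			return vfio_iommu_type1_get_info(iommu, arg);
		case VFIO_IOMMU_MAP_DMA:
			return vfio_iommu_type1_map_dma(iommu, arg);
		case VFIO_IOMMU_UNMAP_DMA:
			return vfio_iommu_type1_unmap_dma(iommu, arg);
		default:
			return -ENOTTY;
		}
	}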
Cc: Kevin Tian <kevin.tian@intel.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The current generation of Intel® QuickAssist Technology devices
are not designed to run in an untrusted environment because of the
following issues reported in the document "Intel® QuickAssist Technology
(Intel® QAT) Software for Linux" (document number 336211-014):
QATE-39220 - GEN - Intel® QAT API submissions with bad addresses that
trigger DMA to invalid or unmapped addresses can cause a
platform hang
QATE-7495 - GEN - An incorrectly formatted request to Intel® QAT can
hang the entire Intel® QAT Endpoint
The document is downloadable from https://01.org/intel-quickassist-technology
at the following link:
https://01.org/sites/default/files/downloads/336211-014-qatforlinux-releasenotes-hwv1.7_0.pdf
This patch adds the following QAT devices to the denylist: DH895XCC,
C3XXX and C62X.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Add denylist of devices that by default are not probed by vfio-pci.
Devices in this list may be susceptible to untrusted applications, even
if the IOMMU is enabled. To be accessed via vfio-pci, the user has to
explicitly disable the denylist.
The denylist can be disabled via the module parameter disable_denylist.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Fiona Trahe <fiona.trahe@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
No need to release and immediately re-acquire igate while clearing
out the eventfd ctxs.
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.
There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.
[1] Documentation/core-api/pin_user_pages.rst
[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Commit c5e6688752 ("vfio/type1: Add conditional rescheduling")
missed a "cond_resched()" in vfio_iommu_map if iommu map failed.
This is a very tiny optimization and the case can hardly happen.
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Intel document 333717-008, "Intel® Ethernet Controller X550
Specification Update", version 2.7, dated June 2020, includes errata
#22, added in version 2.1, May 2016, indicating X550 NICs suffer from
the same implementation deficiency as the 700-series NICs:
"The Interrupt Status bit in the Status register of the PCIe
configuration space is not implemented and is not set as described
in the PCIe specification."
Without the interrupt status bit, vfio-pci cannot determine when
these devices signal INTx. They are therefore added to the nointx
quirk.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
SR-IOV VFs do not implement the memory enable bit of the command
register, therefore this bit is not set in config space after
pci_enable_device(). This leads to an unintended difference
between PF and VF in hand-off state to the user. We can correct
this by setting the initial value of the memory enable bit in our
virtualized config space. There's really no need, however, to
ever fault a user on a VF, as this would only indicate an
error in the user's management of the enable bit, versus a PF
where the same access could trigger hardware faults.
Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The next use of the device will generate an underflow from the
stale reference.
Cc: Qian Cai <cai@lca.pw>
Fixes: 1518ac272e ("vfio/pci: fix memory leaks of eventfd ctx")
Reported-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).
Also give the functions a kthread_ prefix to better document the use case.
[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cover the newly merged use_mm/unuse_mm caller in vfio
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200416053158.586887-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>