There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system. There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines. However, there are still those users
that want userspace drivers even under those conditions. The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has. In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.
This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver. This
should make it very clear that this mode is not safe. Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode. Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver
will taint the kernel and should therefore not be considered
supported. This patch includes no-iommu support for the vfio-pci bus
driver only.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
- VFIO platform bus driver support (Baptiste Reynal, Antonios Motakis, testing and review by Eric Auger)
- Split VFIO irqfd support to separate module (Alex Williamson)
- vfio-pci VGA arbiter client (Alex Williamson)
- New vfio-pci.ids= module option (Alex Williamson)
- vfio-pci D3 power state support for idle devices (Alex Williamson)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVLXApAAoJECObm247sIsiw4cP/AzBLqBXlxeCLgWNRJ4F6Dwz
PrjSK4xBRnPV/eMbIksWW6Wc6D4EaSUASMUUb1JLnF8roKWbRg2a5wdI4clrSYL0
mV8j4Uaw+9XkV0nWrVeiW0Yw8tu4pIQRvC6FiEJ6/pE9rHsP8wr8uzKzC7d4AvyJ
tb/awTF4PN9rZVIs68sIJi0JJjFXHuUTCHx6ns/723F1FMRS8ONAxEvIyFJOqaYh
DNDZs3UalstXYFHlV/0Zs/TqMXQJoSSOPkWx41P+kMsn00tKge88WAmybyQRjjoQ
paFjz7ox3PETUolojCFS72kMPApBCRT2q2HXXrPV74/GY2Or2UV5F1qo4gUsclyh
HqdE4OF9LcexCAOopc907Tped2SrHoiHpZu2aJWKJz+qEsejxgsAgnf1pxJikRQX
Eu7LJxJcYlddREN60ONgCUvsRq9ayNopuIDqD47Zhic6e5y8ujPjjz1e4yin+oxI
5WxMzcEdeVKC72vg2abUTHBri+l3GEWcdHk6YeK4fe95g11+gSBga8XmufgmOcGJ
VUl/umBQWzfxk0wHJtBLVIgleifs4sq+b5jxuXOboko8Z80q12zLlpcBXs8IX5Wa
wgzFvPPg8Ecsw6goCEW1xkeHfMEWcmWhI8QEWCtoZfSYLKK5I/hpuCUZBSqWgPVE
Wm2rcOkNBmqu2HpHBPPu
=8g7y
-----END PGP SIGNATURE-----
Merge tag 'vfio-v4.1-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- VFIO platform bus driver support (Baptiste Reynal, Antonios Motakis,
testing and review by Eric Auger)
- Split VFIO irqfd support to separate module (Alex Williamson)
- vfio-pci VGA arbiter client (Alex Williamson)
- New vfio-pci.ids= module option (Alex Williamson)
- vfio-pci D3 power state support for idle devices (Alex Williamson)
* tag 'vfio-v4.1-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
vfio-pci: Fix use after free
vfio-pci: Move idle devices to D3hot power state
vfio-pci: Remove warning if try-reset fails
vfio-pci: Allow PCI IDs to be specified as module options
vfio-pci: Add VGA arbiter client
vfio-pci: Add module option to disable VGA region access
vgaarb: Stub vga_set_legacy_decoding()
vfio: Split virqfd into a separate module for vfio bus drivers
vfio: virqfd_lock can be static
vfio: put off the allocation of "minor" in vfio_create_group
vfio/platform: implement IRQ masking/unmasking via an eventfd
vfio: initialize the virqfd workqueue in VFIO generic code
vfio: move eventfd support code for VFIO_PCI to a separate file
vfio: pass an opaque pointer on virqfd initialization
vfio: add local lock for virqfd instead of depending on VFIO PCI
vfio: virqfd: rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit
vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export
vfio/platform: support for level sensitive interrupts
vfio/platform: trigger an interrupt via eventfd
vfio/platform: initial interrupts support code
...
An unintended consequence of commit 42ac9bd18d ("vfio: initialize
the virqfd workqueue in VFIO generic code") is that the vfio module
is renamed to vfio_core so that it can include both vfio and virqfd.
That's a user visible change that may break module loading scritps
and it imposes eventfd support as a dependency on the core vfio code,
which it's really not. virqfd is intended to be provided as a service
to vfio bus drivers, so instead of wrapping it into vfio.ko, we can
make it a stand-alone module toggled by vfio bus drivers. This has
the additional benefit of removing initialization and exit from the
core vfio code.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The virqfd functionality that is used by VFIO_PCI to implement interrupt
masking and unmasking via an eventfd, is generic enough and can be reused
by another driver. Move it to a separate file in order to allow the code
to be shared.
Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
When a request is made to unbind a device from a vfio bus driver,
we need to wait for the device to become unused, ie. for userspace
to release the device. However, we have a long standing TODO in
the code to do something proactive to make that happen. To enable
this, we add a request callback on the vfio bus driver struct,
which is intended to signal the user through the vfio device
interface to release the device. Instead of passively waiting for
the device to become unused, we can now pester the user to give
it up.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The existing vfio_pci_open() fails upon error returned from
vfio_spapr_pci_eeh_open(), which breaks POWER7's P5IOC2 PHB
support which this patch brings back.
The patch fixes the issue by dropping the return value of
vfio_spapr_pci_eeh_open().
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The VFIO related components could be built as dynamic modules.
Unfortunately, CONFIG_EEH can't be configured to "m". The patch
fixes the build errors when configuring VFIO related components
as dynamic modules as follows:
CC [M] drivers/vfio/vfio_iommu_spapr_tce.o
In file included from drivers/vfio/vfio.c:33:0:
include/linux/vfio.h:101:43: warning: ‘struct pci_dev’ declared \
inside parameter list [enabled by default]
:
WRAP arch/powerpc/boot/zImage.pseries
WRAP arch/powerpc/boot/zImage.maple
WRAP arch/powerpc/boot/zImage.pmac
WRAP arch/powerpc/boot/zImage.epapr
MODPOST 1818 modules
ERROR: ".vfio_spapr_iommu_eeh_ioctl" [drivers/vfio/vfio_iommu_spapr_tce.ko]\
undefined!
ERROR: ".vfio_spapr_pci_eeh_open" [drivers/vfio/pci/vfio-pci.ko] undefined!
ERROR: ".vfio_spapr_pci_eeh_release" [drivers/vfio/pci/vfio-pci.ko] undefined!
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The macro offsetofend() introduces unnecessary temporary variable
"tmp". The patch avoids that and saves a bit memory in stack.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This lets us check extensions, particularly VFIO_DMA_CC_IOMMU using
the external user interface, allowing KVM to probe IOMMU coherency.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO is designed to be used via ioctls on file descriptors
returned by VFIO.
However in some situations support for an external user is required.
The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
use the existing VFIO groups for exclusive access in real/virtual mode
on a host to avoid passing map/unmap requests to the user space which
would made things pretty slow.
The protocol includes:
1. do normal VFIO init operation:
- opening a new container;
- attaching group(s) to it;
- setting an IOMMU driver for a container.
When IOMMU is set for a container, all groups in it are
considered ready to use by an external user.
2. User space passes a group fd to an external user.
The external user calls vfio_group_get_external_user()
to verify that:
- the group is initialized;
- IOMMU is set for it.
If both checks passed, vfio_group_get_external_user()
increments the container user counter to prevent
the VFIO group from disposal before KVM exits.
3. The external user calls vfio_external_user_iommu_id()
to know an IOMMU ID. PPC64 KVM uses it to link logical bus
number (LIOBN) with IOMMU ID.
4. When the external KVM finishes, it calls
vfio_group_put_external_user() to release the VFIO group.
This call decrements the container user counter.
Everything gets released.
The "vfio: Limit group opens" patch is also required for the consistency.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
- Added vfio_device_get_from_dev() as wrapper to get
reference to vfio_device from struct device.
- Added vfio_device_data() as a wrapper to get device_data from
vfio_device.
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Add PCI device support for VFIO. PCI devices expose regions
for accessing config space, I/O port space, and MMIO areas
of the device. PCI config access is virtualized in the kernel,
allowing us to ensure the integrity of the system, by preventing
various accesses while reducing duplicate support across various
userspace drivers. I/O port supports read/write access while
MMIO also supports mmap of sufficiently sized regions. Support
for INTx, MSI, and MSI-X interrupts are provided using eventfds to
userspace.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel
VT-d hardware, but is potentially usable by anything supporting
similar mapping functionality. We arbitrarily call this a Type1
backend for lack of a better name. This backend has no IOVA
or host memory mapping restrictions for the user and is optimized
for relatively static mappings. Mapped areas are pinned into system
memory.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO is a secure user level driver for use with both virtual machines
and user level drivers. VFIO makes use of IOMMU groups to ensure the
isolation of devices in use, allowing unprivileged user access. It's
intended that VFIO will replace KVM device assignment and UIO drivers
(in cases where the target platform includes a sufficiently capable
IOMMU).
New in this version of VFIO is support for IOMMU groups managed
through the IOMMU core as well as a rework of the API, removing the
group merge interface. We now go back to a model more similar to
original VFIO with UIOMMU support where the file descriptor obtained
from /dev/vfio/vfio allows access to the IOMMU, but only after a
group is added, avoiding the previous privilege issues with this type
of model. IOMMU support is also now fully modular as IOMMUs have
vastly different interface requirements on different platforms. VFIO
users are able to query and initialize the IOMMU model of their
choice.
Please see the follow-on Documentation commit for further description
and usage example.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>