Largely hugepage support for vfio/type1 iommu and surrounding cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR2uNvAAoJECObm247sIsiJRYQAJK15MfXgJq2PtBABNvFUAOG
nqUvLgBgM5Ow1NI0Rzh9jkNohNqCvXDFGaWXXnsaX83hIpi59GFK31W2E3SiFCj3
xISA9SUnm7Kjt9LAF6HTNz805zBkshIOk4MCx6HlezVWSRlWwT3rZzI4dI2fMvl8
iPRk1Ion3QSQui99HWfXv/rtezAIzgZqsFqPC6DjWRfN7LcdEtKtcQwnrSb5GGY9
3TIRY9IRYTSfJ2yjSz5f5258JxoDG5sR8dTMkgG2Gm92iGvGcPGpzQWPzVc4t+TO
PdTqtv9ftEyAJKsYTFjPIod8XbzJBa1FSPadVAIfwF0JCDcsSFjoWGp+RzMQQSF8
MK3VsnQ/pqJfs2nJHDQbWbKu0qWYPntvOCdojZ4679ceDTd0t515npfYeDQuX8yU
fAA5rB46mDXjyxikTP574NdnkcGjbAj7EOCp7s+WTsVPGQQ3mId/3fQw0Wg7bE6v
jaJqdRj70SNTRHs8DFLQhvSZgpef4RzepE4sRBZqzY4vWd4riNcAC3Got+F2rQy3
X4hcHHU/5LGLoGMxOJQmuBfKVM8RAgikq6w2RfttVMLeKCknKtJ29OnotKilvILh
W8nAOGxRnkmONFfHakNJtLl5tQJ4FQXc2cG8OeIIhHgheJjUxL72/zv8bBxOo7rY
jUBjtZ5riQXc/ck4FEGI
=9+Jh
-----END PGP SIGNATURE-----
Merge tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio
Pull vfio updates from Alex Williamson:
"Largely hugepage support for vfio/type1 iommu and surrounding cleanups
and fixes"
* tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio:
vfio/type1: Fix leak on error path
vfio: Limit group opens
vfio/type1: Fix missed frees and zero sized removes
vfio: fix documentation
vfio: Provide module option to disable vfio_iommu_type1 hugepage support
vfio: hugepage support for vfio_iommu_type1
vfio: Convert type1 iommu to use rbtree
We currently send all mappings to the iommu in PAGE_SIZE chunks,
which prevents the iommu from enabling support for larger page sizes.
We still need to pin pages, which means we step through them in
PAGE_SIZE chunks, but we can batch up contiguous physical memory
chunks to allow the iommu the opportunity to use larger pages. The
approach here is a bit different that the one currently used for
legacy KVM device assignment. Rather than looking at the vma page
size and using that as the maximum size to pass to the iommu, we
instead simply look at whether the next page is physically
contiguous. This means we might ask the iommu to map a 4MB region,
while legacy KVM might limit itself to a maximum of 2MB.
Splitting our mapping path also allows us to be smarter about locked
memory because we can more easily unwind if the user attempts to
exceed the limit. Therefore, rather than assuming that a mapping
will result in locked memory, we test each page as it is pinned to
determine whether it locks RAM vs an mmap'd MMIO region. This should
result in better locking granularity and less locked page fudge
factors in userspace.
The unmap path uses the same algorithm as legacy KVM. We don't want
to track the pfn for each mapping ourselves, but we need the pfn in
order to unpin pages. We therefore ask the iommu for the iova to
physical address translation, ask it to unpin a page, and see how many
pages were actually unpinned. iommus supporting large pages will
often return something bigger than a page here, which we know will be
physically contiguous and we can unpin a batch of pfns. iommus not
supporting large mappings won't see an improvement in batching here as
they only unmap a page at a time.
With this change, we also make a clarification to the API for mapping
and unmapping DMA. We can only guarantee unmaps at the same
granularity as used for the original mapping. In other words,
unmapping a subregion of a previous mapping is not guaranteed and may
result in a larger or smaller unmapping than requested. The size
field in the unmapping structure is updated to reflect this.
Previously this was unmodified on mapping, always returning the the
requested unmap size. This is now updated to return the actual unmap
size on success, allowing userspace to appropriately track mappings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.
The platform dependent part includes IOMMU initialization
and handling. This implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWER
guest).
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when
an error occurs in the vfio_pci_device
- Register pci_error_handler for the vfio_pci driver
- When the device encounters an error, the error handler registered by
the vfio_pci driver gets invoked by the AER infrastructure
- In the error handler, signal the eventfd registered for the device.
- This results in the qemu eventfd handler getting invoked and
appropriate action taken for the guest.
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0
and MMIO address 0xa0000. As these are non-overlapping, we can ignore
the I/O port vs MMIO difference and expose them both in a single
region. We make use of the VGA arbiter around each access to
configure chipset access as necessary.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>