This adds create/remove window ioctls to create and remove DMA windows.
sPAPR defines a Dynamic DMA windows capability which allows
para-virtualized guests to create additional DMA windows on a PCI bus.
The existing linux kernels use this new window to map the entire guest
memory and switch to the direct DMA operations saving time on map/unmap
requests which would normally happen in a big amounts.
This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
Up to 2 windows are supported now by the hardware and by this driver.
This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
information such as a number of supported windows and maximum number
levels of TCE tables.
DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature
as we still want to support v2 on platforms which cannot do DDW for
the sake of TCE acceleration in KVM (coming soon).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would requite
additional tracking of accounted pages due to the page size difference -
IOMMU uses 4K pages and system uses 4K or 64K pages.
Another issue is that actual pages pinning/unpinning happens on every
DMA map/unmap request. This does not affect the performance much now as
we spend way too much time now on switching context between
guest/userspace/host but this will start to matter when we add in-kernel
DMA map/unmap acceleration.
This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
2 new ioctls to register/unregister DMA memory -
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
which receive user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.
New IOMMU splits physical pages pinning and TCE table update
into 2 different operations. It requires:
1) guest pages to be registered first
2) consequent map/unmap requests to work only with pre-registered memory.
For the default single window case this means that the entire guest
(instead of 2GB) needs to be pinned before using VFIO.
When a huge DMA window is added, no additional pinning will be
required, otherwise it would be guest RAM + 2GB.
The new memory registration ioctls are not supported by
VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
will require memory to be preregistered in order to work.
The accounting is done per the user process.
This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
can do with v1 or v2 IOMMUs.
In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell from what
region the just cleared TCE was from.
This adds a userspace view of the TCE table into iommu_table struct.
It contains userspace address, one per TCE entry. The table is only
allocated when the ownership over an IOMMU group is taken which means
it is only used from outside of the powernv code (such as VFIO).
As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
DDW API), this creates a default DMA window for IODA2 for consistency.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch adds one more EEH sub-command (VFIO_EEH_PE_INJECT_ERR)
to inject the specified EEH error, which is represented by
(struct vfio_eeh_pe_err), to the indicated PE for testing purpose.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Largely hugepage support for vfio/type1 iommu and surrounding cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR2uNvAAoJECObm247sIsiJRYQAJK15MfXgJq2PtBABNvFUAOG
nqUvLgBgM5Ow1NI0Rzh9jkNohNqCvXDFGaWXXnsaX83hIpi59GFK31W2E3SiFCj3
xISA9SUnm7Kjt9LAF6HTNz805zBkshIOk4MCx6HlezVWSRlWwT3rZzI4dI2fMvl8
iPRk1Ion3QSQui99HWfXv/rtezAIzgZqsFqPC6DjWRfN7LcdEtKtcQwnrSb5GGY9
3TIRY9IRYTSfJ2yjSz5f5258JxoDG5sR8dTMkgG2Gm92iGvGcPGpzQWPzVc4t+TO
PdTqtv9ftEyAJKsYTFjPIod8XbzJBa1FSPadVAIfwF0JCDcsSFjoWGp+RzMQQSF8
MK3VsnQ/pqJfs2nJHDQbWbKu0qWYPntvOCdojZ4679ceDTd0t515npfYeDQuX8yU
fAA5rB46mDXjyxikTP574NdnkcGjbAj7EOCp7s+WTsVPGQQ3mId/3fQw0Wg7bE6v
jaJqdRj70SNTRHs8DFLQhvSZgpef4RzepE4sRBZqzY4vWd4riNcAC3Got+F2rQy3
X4hcHHU/5LGLoGMxOJQmuBfKVM8RAgikq6w2RfttVMLeKCknKtJ29OnotKilvILh
W8nAOGxRnkmONFfHakNJtLl5tQJ4FQXc2cG8OeIIhHgheJjUxL72/zv8bBxOo7rY
jUBjtZ5riQXc/ck4FEGI
=9+Jh
-----END PGP SIGNATURE-----
Merge tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio
Pull vfio updates from Alex Williamson:
"Largely hugepage support for vfio/type1 iommu and surrounding cleanups
and fixes"
* tag 'vfio-v3.11' of git://github.com/awilliam/linux-vfio:
vfio/type1: Fix leak on error path
vfio: Limit group opens
vfio/type1: Fix missed frees and zero sized removes
vfio: fix documentation
vfio: Provide module option to disable vfio_iommu_type1 hugepage support
vfio: hugepage support for vfio_iommu_type1
vfio: Convert type1 iommu to use rbtree
VFIO implements platform independent stuff such as
a PCI driver, BAR access (via read/write on a file descriptor
or direct mapping when possible) and IRQ signaling.
The platform dependent part includes IOMMU initialization
and handling. This implements an IOMMU driver for VFIO
which does mapping/unmapping pages for the guest IO and
provides information about DMA window (required by a POWER
guest).
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>